Re: Fuzzy Indexing Questions

From: Bill Moseley <moseley(at)>
Date: Thu May 08 2003 - 06:39:52 GMT
On Wed, May 07, 2003 at 05:18:58PM -0700, John Movius wrote:

> Does anyone have any stats on the relative size of a regular SWISH-e
> index vs. a fuzzy SWISH-e index?  I realize this could vary
> considerably.   

Here's another sample of about 10,000 entries, indexed once with stemming and once without:

    8170019 May  7 23:06 index.swish-e
    1519304 May  7 23:06 index.swish-e.prop

    8643319 May  7 23:09 index_no_stem.swish-e
    1519304 May  7 23:09 index_no_stem.swish-e.prop

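As you can see, not much different: the stemmed index is only about 5.5% smaller than the unstemmed one. One bummer is that the .prop file, which is the same size in both cases, is duplicated for each index.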
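A quick check of the arithmetic, using only the byte counts from the listing above:

```python
# Byte sizes copied from the directory listing above.
stemmed = 8170019
unstemmed = 8643319
prop = 1519304  # the .prop file is the same size for both indexes

saved = unstemmed - stemmed
print(f"stemmed index is {saved} bytes ({saved / unstemmed:.1%}) smaller")

# Keeping two separate indexes means storing the .prop file twice.
total_two_indexes = stemmed + unstemmed + 2 * prop
print(f"two separate indexes: {total_two_indexes} bytes total")
```

The duplicated .prop file accounts for 1,519,304 bytes of that total, which is more than three times the savings from stemming.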

It would not be too much of a hack to get swish to store both stemmed and unstemmed words in a single index. It could just use metanames internally to store the different versions of the same word.
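A minimal sketch of the idea, assuming a plain in-memory inverted index rather than swish-e's actual on-disk format. The metanames `_raw` and `_stem` and the `toy_stem` function are hypothetical: `toy_stem` is a crude suffix stripper standing in for a real stemmer, not the one swish-e uses. Each word is posted twice, once under each metaname, and a search picks which "view" to query:

```python
from collections import defaultdict

# Hypothetical internal metanames; swish-e does not define these.
RAW_META = "_raw"
STEM_META = "_stem"


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map (metaname, word) -> set of doc ids, posting each word in both forms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[(RAW_META, word)].add(doc_id)
            index[(STEM_META, toy_stem(word))].add(doc_id)
    return index


def search(index, word, stem=True):
    """Look a word up in either the stemmed or the unstemmed view."""
    if stem:
        return index.get((STEM_META, toy_stem(word.lower())), set())
    return index.get((RAW_META, word.lower()), set())


docs = {1: "indexing the archive", 2: "an index of words"}
idx = build_index(docs)
print(sorted(search(idx, "indexes")))                # stemmed lookup
print(sorted(search(idx, "indexing", stem=False)))   # exact-word lookup
```

The trade-off in this sketch is that the word postings roughly double, while per-document data (the equivalent of the .prop file) would only need to be stored once.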

Bill Moseley
