Re: Stemming - Varying Results

From: Bill Moseley <moseley(at)>
Date: Tue Oct 04 2005 - 20:40:14 GMT
On Tue, Oct 04, 2005 at 01:11:16PM -0700, Antonio Barrera wrote:
> I am using Stemming_en, a search for "Environmental" includes the 6.xml in
> the results, but not 398.xml.  A search for environment, returns 398.xml,
> but not 6.xml.  In the live version, Environment returns 22 hits,
> Environmental 30.  Shouldn't stemming result in the same number of hits?

Depends on the words and how the stemmer works.

moseley@bumby:~$ cat c
FuzzyIndexingMode stemming_en
moseley@bumby:~$ cat words

moseley@bumby:~$ swish-e -T indexed_words -c c -i words -v0
    Adding:[1:swishdefault(1)]   'environ'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'environment'   Pos:6  Stuct:0x9 ( BODY FILE )

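So the stemmer reduces the two words to different stems ("environment" becomes 'environ', while "environmental" becomes 'environment'), and a search for one form will not match a document that contains only the other.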
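To see why the stems differ, here is a minimal sketch of step 4 of the Porter stemming algorithm, assuming stemming_en behaves like a Porter-style stemmer for these words. It is not swish-e's code and not a full stemmer. The names `step4` and `measure` are hypothetical. For these two words the earlier Porter steps happen not to apply, so step 4 alone gives the final stem. Porter removes only the single longest matching suffix, and only if the remaining stem is long enough (measure > 1). So "environmental" loses just "al", while "environment" loses "ment":

```python
# Minimal sketch of step 4 of the Porter stemming algorithm only
# (NOT a full stemmer, NOT swish-e's implementation).
# Assumption: stemming_en behaves like a Porter-style stemmer here.

STEP4_SUFFIXES = sorted(
    ["al", "ance", "ence", "er", "ic", "able", "ible", "ant", "ement",
     "ment", "ent", "ion", "ou", "ism", "ate", "iti", "ous", "ive", "ize"],
    key=len,
    reverse=True,  # try the longest suffix first
)


def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: 'y' is a vowel when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel->consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if prev_vowel:
                m += 1
            prev_vowel = False
        else:
            prev_vowel = True
    return m


def step4(word: str) -> str:
    """Strip the single longest step-4 suffix if the stem has measure > 1."""
    for suffix in STEP4_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # 'ion' is only removed after 's' or 't'
            if suffix == "ion" and not stem.endswith(("s", "t")):
                return word
            return stem if measure(stem) > 1 else word
    return word


if __name__ == "__main__":
    for w in ("environment", "environmental"):
        print(w, "->", step4(w))
```

Running it prints `environment -> environ` and `environmental -> environment`. This matches the two index entries above. Because only one suffix is removed per pass, the two forms never collapse to the same stem.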

Bill Moseley

Received on Tue Oct 4 13:40:24 2005