Re: stemming and swish-2.0-beta1

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 28 2000 - 12:30:25 GMT
At 02:27 AM 06/28/00 -0700, Jose Manuel Ruiz wrote:
>Change the following lines in function operate in search.c (line number
>1048):

Great, thanks!  That searches as I'd expect.


>This will guarantee good performance.
>
>BTW, wildcards are treated like normal words. The only difference is in
>function getfileinfo:
>- Search for a normal word (no wildcard) is made using a fast hash approach
>- Search for a wildcard word is made using a sequential approach (words are
>sorted in the index file). So, it returns all the data for all the words,
>without using an "or" function, getting all the data at once. 
>For this reason the performance is better.
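
To make sure I follow the two lookup paths described above, here is a minimal sketch in Python. It is not the actual swish code: `INDEX`, `SORTED_WORDS`, `lookup_exact` and `lookup_wildcard` are hypothetical names, and an in-memory dict plus a sorted list stand in for the real index file.

```python
import bisect

# Hypothetical toy index: word -> list of file numbers containing it.
INDEX = {
    "ran": [4],
    "run": [1, 2],
    "running": [2, 3],
    "runs": [1],
    "rust": [5],
}

# Stands in for the words stored in sorted order in the index file.
SORTED_WORDS = sorted(INDEX)


def lookup_exact(word):
    """Normal word (no wildcard): a single hash lookup."""
    return sorted(INDEX.get(word, []))


def lookup_wildcard(prefix):
    """Wildcard word: jump to the first word >= prefix, then read the
    sorted words sequentially, collecting file numbers in one pass
    until a word no longer starts with the prefix."""
    files = set()
    i = bisect.bisect_left(SORTED_WORDS, prefix)
    while i < len(SORTED_WORDS) and SORTED_WORDS[i].startswith(prefix):
        files.update(INDEX[SORTED_WORDS[i]])
        i += 1
    return sorted(files)
```

The point of the sketch is that the wildcard path gathers the data for every matching word in a single sequential pass, so no separate "or" step over per-word result sets is needed.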

I'm trying to understand the difference with the old swish.

The old swish used to take a search like "search for run*", look up all the
words in the index that started with "run", and convert it into a search of
"search for (run or runs or running)", then use that for the query.  (I
don't remember how it then handled that query, though.)

Does 2.0 also first expand "search for run*" into "search for (run or runs
or running)"?
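
For reference, the kind of expansion I mean could be sketched like this in Python. This is only an illustration of the idea, not code from either swish version; `WORDS` and `expand_wildcards` are made-up names, and the word list is a toy one.

```python
import re

# Hypothetical sorted list of indexed words.
WORDS = sorted(["ran", "run", "running", "runs", "rust", "search"])


def expand_wildcards(query, words=WORDS):
    """Rewrite each 'prefix*' term as '(w1 or w2 or ...)' using every
    indexed word that starts with the prefix."""

    def expand(match):
        prefix = match.group(1)
        hits = [w for w in words if w.startswith(prefix)]
        if not hits:
            # No indexed word matches: leave the term as it was.
            return match.group(0)
        return "(" + " or ".join(hits) + ")"

    return re.sub(r"(\w+)\*", expand, query)
```

With this toy list, `expand_wildcards("search for run*")` gives `search for (run or running or runs)`; the words come out in index order rather than in the order I typed them above.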

As I remember (perhaps incorrectly), the old swish could quickly find the
start of the words beginning with the letter "r", but then it would walk the
sorted index word by word to find the words that started with "run".
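
Roughly, what I am remembering would look like the sketch below (Python again). It is a guess at the behavior, not the old swish source: `build_letter_offsets` and `walk_prefix` are hypothetical, and the prefix is assumed to be non-empty.

```python
# Hypothetical sorted word list standing in for the index.
WORDS = sorted(["apple", "ran", "rat", "run", "running", "runs", "rust", "search"])


def build_letter_offsets(words):
    """Record the position of the first word for each starting letter."""
    offsets = {}
    for pos, word in enumerate(words):
        offsets.setdefault(word[0], pos)
    return offsets


def walk_prefix(words, offsets, prefix):
    """Jump to the first word sharing the prefix's first letter, then
    walk word by word, keeping the words that start with the prefix.
    Assumes prefix is a non-empty string."""
    start = offsets.get(prefix[0])
    if start is None:
        return []
    matches = []
    for word in words[start:]:
        if word[0] != prefix[0]:
            break  # left the block of words for this letter
        if word.startswith(prefix):
            matches.append(word)
    return matches
```

The jump to the letter "r" is cheap, but everything after that is a linear walk through all the "r" words, which is the part I would expect to be slow on a large index.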

Thanks again,


Bill Moseley
mailto:moseley@hank.org
Received on Wed Jun 28 08:52:10 2000