I am playing around with a Stem() function from WAIS, and it works
well (for what it is). I think that stemming has to be applied at
index time so that the words in the index are properly stemmed
(actually "de-stemmed", leaving only the root word). Given that the
index contains only root words, the results of a search where the
search terms themselves were not de-stemmed would be lousy.

The words in the index and words that are searched both have to
be de-stemmed. If you put a checkbox like the one shown below
(asking if stemming is requested) then you should maintain two
indexes: one with stemming applied and one without. The version
of Swish-E that I am building enforces the rule that any search terms
applied to a de-stemmed index must themselves be de-stemmed.
(But an index can be created that is not de-stemmed, and the search
terms applied to it are left alone. Should de-stemming search terms,
regardless of the index's status, be allowed? Perhaps...)
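
A rough sketch of that rule, in Python rather than the C that Swish-E
is written in. Everything here is made up for illustration: toy_stem()
is a stand-in that only strips a trailing "s" (it is not the WAIS
Stem() function), and the Index class is not any Swish-E API. The point
is only that the index remembers whether it was de-stemmed, and search
terms get the same treatment.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


class Index:
    """Toy inverted index that records whether it was built with stemming."""

    def __init__(self, stemmed):
        self.stemmed = stemmed   # fixed when the index is created
        self.postings = {}       # word -> set of document ids

    def _normalize(self, word):
        return toy_stem(word) if self.stemmed else word.lower()

    def add(self, doc_id, text):
        for word in text.split():
            self.postings.setdefault(self._normalize(word), set()).add(doc_id)

    def search(self, term):
        # The rule: a search term gets exactly the treatment the index got.
        return self.postings.get(self._normalize(term), set())

    def search_raw(self, term):
        # What happens if the term is NOT de-stemmed before lookup.
        return self.postings.get(term.lower(), set())


docs = {1: "electric motors hum", 2: "one motor failed"}
stemmed, plain = Index(stemmed=True), Index(stemmed=False)
for doc_id, text in docs.items():
    stemmed.add(doc_id, text)
    plain.add(doc_id, text)

print(sorted(stemmed.search("motors")))      # [1, 2]
print(sorted(stemmed.search_raw("motors")))  # []
print(sorted(plain.search("motors")))        # [1]
```

The middle line is the "lousy" case: the de-stemmed index only holds
"motor", so an untouched "motors" finds nothing. Keeping both indexes
around is what a "Perform stemming" checkbox would have to select
between.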

Using wildcard searches seems like a partial solution, since there
are two sets of words that are of interest: the words the user wants
to find and the words that the author(s) of the corpus happened to
use. If I could be sure that all of my documents used "motor"
and not "motors", then there would be less of a problem. Since that
is not the case, I want to have more control. (This is a weak argument,
I know. Basically, automatic de-stemming is just "easier" to use,
in my opinion.)
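
For comparison, here is a small sketch of what a trailing-* wildcard
does against an unstemmed word list. It uses Python's standard fnmatch
module as a stand-in for the search engine's own wildcard matching, and
the word list is invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of an index built without stemming.
indexed_words = ["librarian", "libraries", "library", "libretto",
                 "motor", "motorcade", "motors"]


def wildcard_search(pattern, words):
    """Return the indexed words matching a shell-style pattern such as 'motor*'."""
    return [w for w in words if fnmatchcase(w, pattern)]


print(wildcard_search("librar*", indexed_words))
# ['librarian', 'libraries', 'library']
print(wildcard_search("motor*", indexed_words))
# ['motor', 'motorcade', 'motors']
```

The wildcard does pick up both "motor" and "motors", but only if the
user thinks to type it, and it also drags in "motorcade", which a
stemmer would presumably leave alone.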

I have a working NT version of Swish-E that has the Stem() function and also
the "document property" thing. It is a work in progress, but if anyone
wants to try it, let me know and I'll send it along.

Mark

At 04:14 PM 8/11/98 -0700, you wrote:
>On Tue, 11 Aug 1998, Paul J. Lucas wrote:
>
>> And I don't see why a one line instruction such as:
>>
>> Use * after a word for wildcards, e.g.
>> "librar*" to match any one of "library,"
>> "librarian, " or "libraries."
>>
>> isn't understandable even by Joe Sixpack.
>
> If you make stemming optional at *index* time, then the user
> doesn't have a choice and I don't like that. If the *search*
> component is capable of either stemming or not, then you need
> to add a checkbox to your HTML search form:
>
> [_] Perform stemming
>
> but then you have to explain what stemming is. My point is
> that either you explain how to use wildcards as I did above
> -or- you have to explain what stemming is and give the user the
> ability to turn it off.
>
> Moral: there's no such thing as a free lunch.
>
> - Paul
>
Received on Tue Aug 11 17:38:15 1998