I was just looking at source code for WAIS and they have a "Stem()"
function based on the "Porter" algorithm, which I assume is for English:

    Porter, M.F., "An Algorithm For Suffix Stripping,"
    Program 14 (3), July 1980, pp. 130-137.

Since swish does not have anything like this I'll take a look and see
how compatible the two algorithms are. If it works then the thing to
do would be to allow for *optional* stemming during the index and search.
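
As a rough illustration of what optional stemming could look like, here is a small sketch in Python. It is not swish or WAIS code. strip_plural() covers only the plural rules from step 1a of Porter's paper, not the whole algorithm, and normalize(), build_index(), search() and the sample file names are made up for the example. The point it tries to show is that the same on/off flag has to be applied both when indexing and when searching.

```python
def strip_plural(word):
    """Plural rules from step 1a of Porter's algorithm only (not the full stemmer)."""
    if word.endswith("sses"):
        return word[:-2]      # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]      # ponies -> poni
    if word.endswith("ss"):
        return word           # caress -> caress
    if word.endswith("s"):
        return word[:-1]      # motors -> motor
    return word


def normalize(word, stem):
    """Lowercase a word and, if stemming is switched on, strip plural suffixes."""
    word = word.lower()
    return strip_plural(word) if stem else word


def build_index(docs, stem=False):
    """Map each (optionally stemmed) word to the set of documents containing it."""
    index = {}
    for name, text in docs.items():
        for word in text.split():
            index.setdefault(normalize(word, stem), set()).add(name)
    return index


def search(index, word, stem=False):
    """Look up one word; the stem flag must match the one used to build the index."""
    return sorted(index.get(normalize(word, stem), set()))


# Hypothetical sample documents, for illustration only.
docs = {"a.txt": "electric motors", "b.txt": "one motor", "c.txt": "teeth"}
```

With stemming on, search(build_index(docs, stem=True), "motors", stem=True) finds both a.txt and b.txt. With it off, a query for "motor" finds only b.txt. Irregular forms such as "teeth" pass through unchanged, which is where lookup tables would have to come in.
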
At 09:19 AM 8/11/98 -0700, you wrote:
>At 08:48 AM 8/11/98 -0700, Paul Lucas wrote:
>>On Tue, 11 Aug 1998, Mark Gaulin wrote:
>>> Is there a good way to handle indexing & searching for plurals?
>>> I would like "motors" and "motor" to be the same. Tips?
>> This process is called "stemming": to find the "stem" of a word
>> and index based on that. If the speed of performing this
>> process by the Excite search engine is typical, it's a VERY
>> slow process. You also need lots of data (stemming tables) that
>> know about the human language you are stemming:
>>     houses -> house
>>     housing -> house
>>     teeth -> tooth
>> Personally, I don't like search engines I use to do stemming at
>> all. I suppose Joe Sixpack might like them since he isn't used
>> to thinking about things in the precise manner of programmer
>> types and he expects computers to be "smart"; however, he often
>> gets far more documents returned than he knows what to do with.
>> In contrast, when programmer types enter queries, they are
>> precise. For example, if I'm trying to find a document that
>> really only has the word "house" in it (and not "houses") then,
>> when I enter "house" that's what I *really* want the search
>> engine to look for and no more: if I wanted "house" or "houses"
>> then that's what I would have entered.
>> - Paul
>I think this is the focal point of one of the largest problems in search
>Most folks who use search engines aren't programmers. (They aren't Joe
>Sixpack either, but that's a different story.)
>Since computers are designed to be intelligent machines, it is reasonable
>to expect them to be able to do things like stemming *if you want them to*.
>Thus, most database software that is designed for general use has the
>option of turning stemming on or off.
>SWISH is one of the easier search engines to set up, so it tends to get
>installed in lots of places where the general public is expected to use it.
>Unfortunately, the general public is not sophisticated enough (and probably
>never will be) to understand the problems that can arise with SWISH.
>Programmers who want their systems to be useful need to understand these
>foibles, and to understand how to use the intelligence of these powerful
>machines to compensate for them.
>I believe that the ability to write user-friendly software is the true mark
>of expertise in programming.
Received on Tue Aug 11 10:28:30 1998