On Tue, 11 Aug 1998, Mark Gaulin wrote:
> Is there a good way to handle indexing & searching for plurals?
> I would like "motors" and "motor" to be the same. Tips?
This process is called "stemming": you find the "stem" of a word
and index based on that. If the Excite search engine's speed at
this is typical, it's a VERY slow process. You also need lots of
data (stemming tables) that know about the human language you
are stemming:
houses -> house
housing -> house
teeth -> tooth
...
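To make that concrete, here is a rough sketch in Python of
table-driven stemming at index time. Everything in it (STEM_TABLE,
stem, build_index, search, the sample documents) is made up for
illustration and is not how Excite or any particular engine does
it; a real table would need vastly more entries, and this toy
version ignores punctuation entirely.

```python
# Hypothetical, tiny stemming table. A real one would need many
# thousands of entries (or rules) for the language being indexed.
STEM_TABLE = {
    "houses": "house",
    "housing": "house",
    "teeth": "tooth",
    "motors": "motor",
}


def stem(word):
    """Return the stem of a word, or the lowercased word if it is not in the table."""
    w = word.lower()
    return STEM_TABLE.get(w, w)


def build_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index


def search(index, query):
    """Look up a single-word query by its stem."""
    return index.get(stem(query), set())


# Made-up sample documents, keyed by document id.
docs = {
    1: "electric motors for sale",
    2: "replace the motor",
    3: "houses near the lake",
}
index = build_index(docs)
```

With this, search(index, "motor") and search(index, "motors") both
hit the same index entry, because the query is stemmed the same way
the documents were. Note the cost: once the index is keyed on stems,
it can no longer tell "motor" apart from "motors".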
Personally, I'd rather the search engines I use did no stemming at
all. I suppose Joe Sixpack might like stemming since he isn't used
to thinking about things in the precise manner of programmer
types and he expects computers to be "smart"; however, he often
gets far more documents returned than he knows what to do with.
In contrast, when programmer types enter queries, they are
precise. For example, if I'm trying to find a document that
really only has the word "house" in it (and not "houses") then,
when I enter "house" that's what I *really* want the search
engine to look for and no more: if I wanted "house" or "houses"
then that's what I would have entered.
- Paul
Received on Tue Aug 11 08:46:52 1998