Re: [SWISH-E:429] Re: Indexing/Searching for Plurals

From: Frank Heasley <DrHeasley(at)not-real.chemistry.com>
Date: Tue Aug 11 1998 - 16:13:53 GMT
At 08:48 AM 8/11/98 -0700, Paul Lucas wrote:
>On Tue, 11 Aug 1998, Mark Gaulin wrote:
>
>> Is there a good way to handle indexing & searching for plurals?
>> I would like "motors" and "motor" to be the same. Tips?
>
>	This process is called "stemming": to find the "stem" of a word
>	and index based on that.  If the speed of performing this
>	process by the Excite search engine is typical, it's a VERY
>	slow process.  You also need lots of data (stemming tables) that
>	know about the human language you are stemming:
>
>		houses -> house
>		housing -> house
>		teeth -> tooth
>		...
>
>	Personally, I don't like search engines I use to do stemming at
>	all.  I suppose Joe Sixpack might like them since he isn't used
>	to thinking about things in the precise manner of programmer
>	types and he expects computers to be "smart"; however, he often
>	gets far more documents returned than he knows what to do with.
>
>	In contrast, when programmer types enter queries, they are
>	precise.  For example, if I'm trying to find a document that
>	really only has the word "house" in it (and not "houses") then,
>	when I enter "house" that's what I *really* want the search
>	engine to look for and no more: if I wanted "house" or "houses"
>	then that's what I would have entered.
>
>	- Paul
>
>
Hi Paul,

I think this is the focal point of one of the largest problems in search
engine design.

Most folks who use search engines aren't programmers.  (They aren't Joe
Sixpack either, but that's a different story.)

Since computers are designed to be intelligent machines, it is reasonable
to expect them to be able to do things like stemming *if you want them to*.

Thus, most database software that is designed for general use has the
option of turning stemming on or off.
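
As a rough illustration of such a switch, here is a minimal Python sketch
of a search with an optional stemming step.  Everything in it is an
assumption made for illustration: the names (IRREGULAR, stem, search), the
tiny lookup table, and the crude suffix rules are hypothetical and are not
how SWISH or any real stemmer works.  A real stemmer needs far larger
tables and better rules.

```python
# Hypothetical sketch only: a toy stemmer with an on/off switch.
# The table and suffix rules are illustrative, not taken from SWISH.
IRREGULAR = {"teeth": "tooth"}  # tiny stand-in for a real stemming table


def stem(word):
    """Very rough English stemmer: table lookup, then suffix stripping."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ing") and len(w) > 5:
        return w[:-3] + "e"   # housing -> house (wrong for many other words)
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]         # motors -> motor, houses -> house
    return w


def search(query, docs, stemming=True):
    """Return the indices of the documents that contain the query word."""
    normalize = stem if stemming else str.lower
    target = normalize(query)
    return [i for i, text in enumerate(docs)
            if target in {normalize(token) for token in text.split()}]


docs = ["electric motors for sale", "one motor only", "houses and housing"]
print(search("motor", docs, stemming=True))    # matches "motors" and "motor"
print(search("motor", docs, stemming=False))   # matches only the exact word
```

With stemming on, a query for "motor" also finds the document that only
says "motors"; with it off, the searcher gets exactly the word typed, which
is the behavior Paul prefers.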

SWISH is one of the easier search engines to set up, so it tends to get
installed in lots of places where the general public is expected to use it.

Unfortunately, the general public is not sophisticated enough (and probably
never will be) to understand the problems that can arise with SWISH.

Programmers who want their systems to be useful need to understand these
foibles, and how to use the intelligence of these powerful machines to
compensate for them.

I believe that the ability to write user-friendly software is the true mark
of expertise in programming.

Frank
Received on Tue Aug 11 09:16:03 1998