Re: [SWISH-E:425] Indexing/Searching for Plurals

From: Paul J. Lucas <pjl(at)not-real.ptolemy.arc.nasa.gov>
Date: Tue Aug 11 1998 - 15:37:15 GMT

On Tue, 11 Aug 1998, Mark Gaulin wrote:

> Is there a good way to handle indexing & searching for plurals?
> I would like "motors" and "motor" to be the same. Tips?

	This process is called "stemming": finding the "stem" of a word
	and indexing on that.  If the speed at which the Excite search
	engine does this is typical, it's a VERY slow process.  You also
	need lots of data (stemming tables) that know about the human
	language you are stemming:

		houses -> house
		housing -> house
		teeth -> tooth
		...
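
	A minimal sketch of the idea in Python, not how Excite or
	SWISH-E does it.  The IRREGULAR table and the single
	plural-stripping rule are assumptions made up for illustration;
	a real stemmer needs far more rules and data, and this one does
	not handle a case like "housing" at all.

```python
# Hypothetical stemming table for irregular forms (illustrative only).
IRREGULAR = {"teeth": "tooth", "mice": "mouse"}


def stem(word):
    """Return a crude stem: table lookup first, then one plural rule."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Naive rule: drop a trailing "s", but leave short words and "-ss" alone.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w
```

	With this, stem("motors") and stem("motor") both give "motor",
	so the two words would land under the same index entry.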

	Personally, I don't like the search engines I use to do any
	stemming at all.  I suppose Joe Sixpack might like it, since he
	isn't used to thinking as precisely as programmer types do and
	he expects computers to be "smart"; however, he often gets far
	more documents returned than he knows what to do with.

	In contrast, when programmer types enter queries, they are
	precise.  For example, if I'm trying to find a document that
	really only has the word "house" in it (and not "houses") then,
	when I enter "house" that's what I *really* want the search
	engine to look for and no more: if I wanted "house" or "houses"
	then that's what I would have entered.
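
	The difference between the two behaviours can be sketched with
	a toy index, again in Python.  The documents, the naive_stem
	rule and the function names are all hypothetical; the only point
	is that the same query returns more documents once words are
	stemmed at index and query time.

```python
from collections import defaultdict


def naive_stem(word):
    # Illustrative plural-stripping rule only, not a real stemmer.
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_index(docs, normalize):
    """Map each normalized word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return index


def search(index, query, normalize):
    """Look up a single-word query using the same normalization as the index."""
    return sorted(index.get(normalize(query), set()))


# Made-up sample documents.
DOCS = {1: "the house is red", 2: "two houses on a hill", 3: "motors and gears"}

exact_index = build_index(DOCS, str.lower)
stemmed_index = build_index(DOCS, naive_stem)
```

	Here search(exact_index, "house", str.lower) returns [1], while
	search(stemmed_index, "house", naive_stem) returns [1, 2].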

	- Paul

Received on Tue Aug 11 08:46:52 1998