RE: stemming

From: David Norris <dave(at)not-real.webaugur.com>
Date: Fri Nov 19 1999 - 04:36:17 GMT

> search for "rockies", but do contain "rock"?

This is probably correct.  The stemmer simply strips the suffix from a
word, so "Rockies" becomes "Rocki" and "Supplies" becomes "Suppli".
"Rock" has no suffix to strip and stays "Rock", which does not match
"Rocki".
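
To make the suffix stripping concrete, here is a minimal sketch in C.
It implements only the plural rules from step 1a of Porter's published
algorithm, which is consistent with the "Rocki" and "Suppli" results
above.  The names ends_with and strip_plural are made up for this
example, and it is an assumption that stemmer.c behaves the same way;
this is not the swish-e code.

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero if word ends with suffix. */
static int ends_with(const char *word, const char *suffix)
{
    size_t wl = strlen(word), sl = strlen(suffix);
    return wl >= sl && strcmp(word + wl - sl, suffix) == 0;
}

/* Toy plural stripper modelled on step 1a of Porter's algorithm.
 * Hypothetical helper for illustration, not swish-e's stemmer.
 * Modifies word in place; expects a lowercase, NUL-terminated word. */
void strip_plural(char *word)
{
    size_t len;

    assert(word != NULL);
    len = strlen(word);

    if (ends_with(word, "sses"))
        word[len - 2] = '\0';          /* sses -> ss */
    else if (ends_with(word, "ies"))
        word[len - 2] = '\0';          /* ies  -> i  */
    else if (ends_with(word, "ss"))
        ;                              /* ss   -> ss (unchanged) */
    else if (ends_with(word, "s"))
        word[len - 1] = '\0';          /* s    -> (removed) */
}
```

Calling strip_plural on "rockies" leaves "rocki", while "rock" is left
untouched, so the two stems never compare equal.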

> if I search for "rock" instead of "rockies",
> I get hits I did not get the first time.

This is possibly a bug.  The stemmer fails to match some words that
it really should.  I do not know a proper way around the problem,
though it should be possible to rewrite the stemmer to work correctly.
Someone suggested stemming each word repeatedly until it cannot be
stemmed any further.  That should work, but might produce a few
incorrect matches.
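
The "stem until it cannot be further stemmed" idea can be sketched
roughly as follows.  Everything here is hypothetical: the stem_fn
signature, the MAX_WORD limit, and the toy strip_one_s filter are
stand-ins chosen for the example, not anything taken from swish-e.
The sketch also assumes the filter eventually stops changing the word.

```c
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length */

/* Assumed signature for a one-pass, in-place word filter. */
typedef void (*stem_fn)(char *word);

/* Toy filter: drops a single trailing 's'.  Stand-in for a real
 * stemmer, used only to show the effect of repeated passes. */
void strip_one_s(char *word)
{
    size_t len = strlen(word);
    if (len > 1 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/* Apply stem repeatedly until the word stops changing.
 * Returns the number of passes that changed the word,
 * or -1 if the word is too long for the comparison buffer. */
int stem_to_fixed_point(char *word, stem_fn stem)
{
    char before[MAX_WORD];
    int passes = 0;

    for (;;) {
        if (strlen(word) >= MAX_WORD)
            return -1;
        strcpy(before, word);          /* safe: length checked above */
        stem(word);
        if (strcmp(before, word) == 0)
            return passes;             /* fixed point reached */
        passes++;
    }
}
```

With the toy filter, "rocks" settles on "rock" after one pass, but
"caress" is reduced to "care" after two.  That second case is the kind
of incorrect match the repeated approach can introduce.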

To make it easier to debug the stemmer, or any other word filter,
I have written a wrapper (extract it to your swish-e directory):
http://www.webaugur.com/wares/files/wordtest.tar.gz

The wordtest.mk makefile is configured to use stemmer.c by default.
To test a different filter, point the FILTERNAME macro in wordtest.c
at your function, then link wordtest.o against your word filter.

--
,David Norris
  The OpenSA Project - http://www.opensa.de/
  Dave's Web - http://www.webaugur.com/dave/
  ICQ Universal Internet Number - 412039
  E-Mail - dave@webaugur.com
Received on Thu Nov 18 20:40:45 1999