On Sun, Jan 28, 2007 at 11:49:20PM -0800, Antony Dovgal wrote:
> According to the documentation, SwishFuzzyWordError() return values are
> defined in src/stemmer.h file, and this is true.
> This actually makes it impossible to use these values, though, because
> stemmer.h is not a public header and is used only internally.
> Also, it's not really clear whether one should use this function or whether it's not recommended/deprecated/etc.
> The documentation of SwishFuzzyWordError() sheds almost no light:
> "Not all stemmers set this value correctly." - well, this means at least some of them
> DO return correct values. That's better than nothing.
> Maybe it's time to fix those returning incorrect values?
The stemming code in swish mixes "stemmers" from different sources,
so not every error code applies to every stemmer:
$ fgrep ' STEM_' *.c
soundex.c: fw->error = STEM_WORD_TOO_BIG;
soundex.c: fw->error = STEM_NOT_ALPHA;
soundex.c: fw->error = STEM_TOO_SMALL;
soundex.c: return STEM_OK; /* Hum, probably not right */
stemmer.c: fw->error = STEM_OK; /* default to OK */
stemmer.c: fw->error = STEM_TO_NOTHING;
> "But since SwishFuzzyWordList() will return a valid string regardless of the return value,
> you can often just ignore this setting. That's what I do." - how often should I ignore it? =)
> I mean, if the value of this function should be ignored, then the function itself is useless.
It's not important to swish -- swish just passes in words and if
there's a problem (like the word can't be stemmed) then it uses the
un-stemmed word for indexing and searching.
You might have some need for that error code outside of swish, though
(say to test and flag which words in a query were stemmed).
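To make that concrete, here is a rough, self-contained sketch of flagging which words were stemmed. It does not call the swish-e library: toy_stem(), stem_or_fallback(), stem_status and MAX_WORD are made-up names, the toy stemmer only strips a trailing 's', and the numeric order of the codes is my assumption (only the STEM_* names come from the source).

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical stand-in for the codes in src/stemmer.h. The names come
   from the grep output; the numeric values are an assumption. */
typedef enum {
    STEM_OK,
    STEM_NOT_ALPHA,
    STEM_TOO_SMALL,
    STEM_WORD_TOO_BIG,
    STEM_TO_NOTHING
} stem_status;

#define MAX_WORD 64   /* hypothetical buffer size */

/* Toy stemmer, NOT one of the swish stemmers: strips one trailing 's'.
   On any error it leaves `out` untouched. */
static stem_status toy_stem(const char *in, char *out, size_t outsz)
{
    size_t len = strlen(in);
    size_t i;

    if (len + 1 > outsz)
        return STEM_WORD_TOO_BIG;
    if (len < 3)
        return STEM_TOO_SMALL;
    for (i = 0; i < len; i++)
        if (!isalpha((unsigned char)in[i]))
            return STEM_NOT_ALPHA;

    memcpy(out, in, len + 1);
    if (out[len - 1] == 's')
        out[len - 1] = '\0';
    return STEM_OK;
}

/* Always leaves a usable word in `out` (outsz must be at least 1).
   Returns 1 if the stemmer accepted the word, 0 if we fell back to the
   un-stemmed word; the reason is stored in *status either way. */
int stem_or_fallback(const char *in, char *out, size_t outsz,
                     stem_status *status)
{
    *status = toy_stem(in, out, outsz);
    if (*status != STEM_OK) {
        /* Same policy as described above: use the word as given
           (truncated if it does not fit the buffer). */
        strncpy(out, in, outsz - 1);
        out[outsz - 1] = '\0';
        return 0;
    }
    return 1;
}
```

A caller would loop over the query words, pass each through stem_or_fallback(), and use the 0/1 result to mark the words that were not stemmed, without ever having to special-case the output string.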
It's been a long time since I looked at the Snowball API, but looking
at this bit of code:
    fi->stemmer->lang_stem(snowball); /* Stem the word */
    if ( 0 == snowball->l )
        fw->error = STEM_TO_NOTHING;
Shouldn't the return value of calling lang_stem() be tested? Or maybe
testing the length is fine. I'm not sure.
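If both checks turn out to be needed, the shape would be something like the sketch below. Everything in it is hypothetical: struct toy_env only models the length field `l` from the snippet, the toy_* functions stand in for a real stemmer, CHECK_CALL_FAILED is not an existing swish code, and "negative return means failure" is an assumption I have not verified against the Snowball API.

```c
#include <stddef.h>

/* Hypothetical stand-in for the Snowball environment; only the length
   field `l` used in the snippet above is modelled. */
struct toy_env {
    char buf[64];
    int  l;                 /* length of the current word */
};

typedef int (*toy_stem_fn)(struct toy_env *);

enum check_result {
    CHECK_OK,               /* call succeeded, result is non-empty        */
    CHECK_TO_NOTHING,       /* call succeeded, result is empty
                               (the STEM_TO_NOTHING case)                  */
    CHECK_CALL_FAILED       /* call itself failed: hypothetical code, not
                               one of the existing STEM_* values          */
};

/* Toy stemmers that exercise each branch. */
int toy_keep(struct toy_env *env)  { (void)env; return 1; }
int toy_empty(struct toy_env *env) { env->l = 0; return 1; }
int toy_fail(struct toy_env *env)  { (void)env; return -1; }

/* Test the return value first, then the resulting length.
   ASSUMPTION: a negative return value signals failure. */
enum check_result run_and_check(toy_stem_fn stem, struct toy_env *env)
{
    if (stem(env) < 0)
        return CHECK_CALL_FAILED;
    if (env->l == 0)
        return CHECK_TO_NOTHING;
    return CHECK_OK;
}
```

Checking the return value first matters because, if the call can fail, the length field may not mean anything afterwards.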
> Hence the question:
> Would you accept a patch exporting those constants to public (and changing the
> function prototype appropriately) or should I forget about SwishFuzzyWordError()?
> See diff against current CVS in attachment.
I think the patch makes sense. I'm not sure why the STEM_RETURNS
enum was not made public.
Received on Mon Jan 29 08:47:20 2007