Re: [swish-e] Snowball stemmers

From: Trygve Falch <trygu(at)>
Date: Thu Oct 04 2007 - 12:04:33 GMT
On Wed, 2007-10-03 at 20:56 -0500, Peter Karman wrote:

> IIRC, stemming happens after tokenization in 2.4.x, so it makes more sense to me 
> to port the old Russian stemmer to fit the newer Snowball API. Too much 
> lossiness otherwise, going from UTF-8 (parsing) to ISO-8859-x (tokenizing) to 
> UTF-8 (stemming) to ISO-8859-x (storage).
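
To illustrate the lossiness concern in the quoted text, here is a minimal sketch in Python rather than swish-e's own C code. The helper name round_trip and the sample word are hypothetical, and the snippet only shows what happens when text passes through a single-byte encoding and back; it is not how swish-e itself converts text.

```python
def round_trip(text: str, legacy_encoding: str) -> str:
    """Encode text into a single-byte legacy encoding and decode it back.

    Characters the legacy encoding cannot represent are replaced with '?',
    which is where information is lost.
    """
    raw = text.encode(legacy_encoding, errors="replace")
    return raw.decode(legacy_encoding)


word = "Привет"  # hypothetical sample word in Cyrillic

# ISO-8859-1 has no Cyrillic letters, so every character is lost.
print(round_trip(word, "iso-8859-1"))

# KOI8-R covers Russian, so the same word survives unchanged.
print(round_trip(word, "koi8_r"))
```

Under these assumptions, a stemmer that works directly on the 8-bit encoding used for tokenizing and storage avoids the extra conversions altogether.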

I had missed the Russian stemmer for KOI8, which, as far as I can tell,
is the same stemmer that was used in the old Snowball.

I have also added stemmers for two additional languages, Hungarian and
Romanian, and of course the Russian stemmer.

How would you like the patches? I have made them against SVN trunk.
Do you want to commit them, or should I get commit rights first?

Trygve Falch

Received on Thu Oct 4 08:04:34 2007