Trygve Falch wrote on 10/2/07 4:05 AM:
> There are two ways I
> could solve this: either introduce UTF-8 stemmers, with the changes
> needed in the swish-e code to accommodate that, or I could port the old
> Russian ISO stemmer to fit the new API.
> Any comments?
IIRC, stemming happens after tokenization in 2.4.x, so it makes more sense to me
to port the old Russian stemmer to fit the newer Snowball API. Otherwise there
is too much lossiness, going from UTF-8 (parsing) to ISO-8859-x (tokenizing) to
UTF-8 (stemming) to ISO-8859-x (storage).
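
To illustrate the lossiness I mean, here is a rough Python sketch of one hop of
that chain. This is not swish-e code; the function name round_trip and the
choice of ISO-8859-5 as the 8-bit charset are assumptions made only for the
example. Any character the 8-bit charset cannot represent is replaced on the
way through, and the original cannot be recovered afterwards.

```python
def round_trip(text: str, legacy: str = "iso-8859-5") -> str:
    """Push text through a UTF-8 -> 8-bit charset -> Unicode chain.

    Hypothetical helper for illustration; it only mimics the conversions
    described above and does not call any swish-e or Snowball code.
    """
    # Parsing stage: the document arrives as UTF-8 bytes.
    parsed = text.encode("utf-8").decode("utf-8")
    # Tokenizing stage: squeeze into the 8-bit charset. Characters with no
    # mapping are replaced with "?", which is where information is lost.
    legacy_bytes = parsed.encode(legacy, errors="replace")
    # Stemming stage: convert back to Unicode for a UTF-8 stemmer.
    return legacy_bytes.decode(legacy)


if __name__ == "__main__":
    # Pure Cyrillic survives, because ISO-8859-5 covers it.
    print(round_trip("книга"))
    # Mixed text does not: "é" has no ISO-8859-5 code point.
    print(round_trip("café книга"))
```

Text that fits entirely inside the 8-bit charset comes back unchanged, so the
damage only shows up on mixed-script or accented input, and each additional
conversion is another chance to hit it.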
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Wed Oct 3 21:56:09 2007