Re: [swish-e] Snowball stemmers

From: Trygve Falch <trygu(at)>
Date: Tue Oct 02 2007 - 09:05:22 GMT
On Thu Jan 12 2006 - 18:09:21 GMT, Bill Moseley wrote:
> On Thu, Jan 05, 2006 at 08:45:09AM -0800, Bruusgaard, Jan wrote:

> > But Snowball has changed their API after they started supporting UTF-8,
> > so it seems to be some work here for someone with C programming skills.

> I have not looked at the new API, but if it requires utf8 on input then
> we would need to use iconv to convert from swish-e's 8859-1 to utf8
> and back again.  Look also at parser.c to see how swish uses libxml2's
> method to convert utf8 to latin1.

I have made an initial patch against 2.4.5 to use the latest Snowball
API. The API has changed, as Jan points out, but the changes are not that
extensive.

They have introduced UTF-8 stemmers for a whole series of languages,
alongside ISO-8859-1 and ISO-8859-2 versions for most of them, but
unfortunately there are no ISO stemmers for Russian. My patch therefore
lacks Russian support. There are two ways I could solve this: either
introduce UTF-8 stemmers, with the changes needed in the swish-e code to
accommodate that, or port the old Russian ISO stemmer to fit the new API.

Any comments?

Trygve Falch

Users mailing list
Received on Tue Oct 2 05:05:30 2007