Re: Snowball stemmers

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jan 12 2006 - 18:09:21 GMT
On Thu, Jan 05, 2006 at 08:45:09AM -0800, Bruusgaard, Jan wrote:
> 
> Is it possible for someone to update the Snowball stemming in Swish-e?
> 
> I downloaded latest nightly build of Swish-e, and tried to change
> stem_no.c with a new stem_no.c from Snowball, because the Norwegian
> algorithm here has been improved.
> 
> But Snowball has changed their API after they started supporting UTF-8,
> so it seems to be some work here for someone with C programming skills.

Sorry for the delay in responding.  It would likely be a while before
I could take a look at this -- and I suspect the other developers
are busy, too.  If you could get an initial patch working then it
would likely find its way into Swish-e a lot sooner.

I have not looked at the new API, but if it requires UTF-8 on input then
we would need to use iconv to convert from Swish-e's ISO-8859-1 to UTF-8
before stemming, and back again afterward.  Look also at parser.c to see
how Swish-e uses libxml2's method to convert UTF-8 to Latin-1.
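
As a rough illustration of the iconv round trip, something along these
lines might work.  This is only a sketch: the function names
(convert_encoding, latin1_to_utf8, utf8_to_latin1) are made up for this
example and are not part of Swish-e or Snowball, and it assumes a POSIX
iconv that knows the "ISO-8859-1" and "UTF-8" encoding names.  The
stemmer call itself would go between the two conversions.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Swish-e): convert the NUL-terminated
 * string `src` from encoding `from` to encoding `to`, writing a
 * NUL-terminated result into `dst` (capacity `dst_size` bytes).
 * Returns 0 on success, -1 on any failure (unknown encoding, a
 * character that cannot be converted, or an output buffer too small). */
int convert_encoding(const char *from, const char *to,
                     const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);

    /* Note the argument order: target encoding first, source second. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;          /* iconv() wants char **, input is not modified */
    size_t in_left = strlen(src);
    char *out = dst;
    size_t out_left = dst_size - 1;  /* keep one byte for the terminating NUL */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return 0;
}

/* Swish-e's internal 8-bit text -> UTF-8, e.g. before calling a stemmer. */
int latin1_to_utf8(const char *src, char *dst, size_t dst_size)
{
    return convert_encoding("ISO-8859-1", "UTF-8", src, dst, dst_size);
}

/* UTF-8 stemmer output -> back to ISO-8859-1 for the index. */
int utf8_to_latin1(const char *src, char *dst, size_t dst_size)
{
    return convert_encoding("UTF-8", "ISO-8859-1", src, dst, dst_size);
}
```

Opening a conversion descriptor for every word would probably be too
slow for indexing; a real patch would likely open the two descriptors
once and reuse them.  Note also that the UTF-8 buffer can need up to
twice as many bytes as the Latin-1 input.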

Maybe someone else on the list with C skills would be interested in
helping?  Anyone?

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Thu Jan 12 10:09:24 2006