Hi Jose
I have been testing the Norwegian stemmer a bit more, but it does not seem to work correctly in every case.
Is there some kind of limit on word length?
In my swish-e config file I have set MaxWordLimit 40. I also downloaded the 2003-07-01 version.
As the examples below show, the stemmer should have removed the suffix "en".
In examples 3 and 4 the stemming works correctly: the number of hits is 461 in both. That is not the case in examples 1 and 2, which return 90 and 270 hits.
The Norwegian Snowball page shows the right stemming for the word "konsumprisindeksen".
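To make the expected behaviour concrete, here is a much reduced sketch of the rule involved, assuming the usual Snowball definition of the R1 region and a basic Norwegian vowel set. It handles only the suffix "en"; the function names are hypothetical, and this is neither the full Snowball algorithm nor the swish-e code.

```python
# Simplified illustration only: the real Norwegian Snowball stemmer
# handles many more suffixes. The vowel set below is an assumption.
VOWELS = set("aeiouyæåø")


def r1_start(word):
    """Return the index where R1 begins.

    R1 is the region after the first non-vowel that follows a vowel,
    moved right if needed so that at least 3 letters precede it.
    """
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return max(i + 1, 3)
    return len(word)  # no such position: R1 is empty


def strip_en(word):
    """Remove a trailing 'en' when the suffix lies entirely inside R1."""
    if word.endswith("en") and len(word) - 2 >= r1_start(word):
        return word[:-2]
    return word


for w in ("konsumprisindeksen", "boligtellingen"):
    print(w, "->", strip_en(w))
```

Under these assumptions both words lose the "en" suffix, so the length of the word alone should not matter to the rule itself.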
Jan
--- output ---
1
# SWISH format: 2.4.0-pr1
# Search words: konsumprisindeks
# Removed stopwords:
# Number of hits: 90
# Search time: 0.001 seconds
# Run time: 0.032 seconds
1000 http://www.ssb.no/emner/08/02/10/hkpi/index.html "Konsumprisindeks, harmonisert" 640
2
# Search words: konsumprisindeksen
# Number of hits: 270
1000 http://www.ssb.no/emner/08/02/10/kpi/index.html "Konsumprisindeksen - hovedside" 697
3
# Search words: boligtelling
# Number of hits: 461
1000 http://www.ssb.no/emner/02/01/fobbolig/index.html "Endelige tall fra boligtellingen. Folke- og boligtellingen 2001" 675
4
# Search words: boligtellingen
# Number of hits: 461
1000 http://www.ssb.no/emner/02/01/fobbolig/index.html "Endelige tall fra boligtellingen. Folke- og boligtellingen 2001" 675
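A comparison like the one above can be automated. The following is a minimal sketch that reads the "Search words" and "Number of hits" header lines from output in the format shown and flags word pairs whose hit counts differ. The helper name parse_results and the SAMPLE string are made up for illustration; the sample only repeats the header lines from the four examples.

```python
import re


def parse_results(text):
    """Map each 'Search words' value to the following 'Number of hits'."""
    hits = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"#\s*Search words:\s*(.+)", line)
        if m:
            current = m.group(1).strip()
            continue
        m = re.match(r"#\s*Number of hits:\s*(\d+)", line)
        if m and current is not None:
            hits[current] = int(m.group(1))
    return hits


# Header lines copied from the four examples above.
SAMPLE = """\
# Search words: konsumprisindeks
# Number of hits: 90
# Search words: konsumprisindeksen
# Number of hits: 270
# Search words: boligtelling
# Number of hits: 461
# Search words: boligtellingen
# Number of hits: 461
"""

hits = parse_results(SAMPLE)
pairs = [
    ("konsumprisindeks", "konsumprisindeksen"),
    ("boligtelling", "boligtellingen"),
]
for base, inflected in pairs:
    # If stemming works, both forms reduce to the same stem and match
    # the same documents, so the counts should be equal.
    status = "OK" if hits[base] == hits[inflected] else "MISMATCH"
    print(f"{base} / {inflected}: {hits[base]} vs {hits[inflected]} -> {status}")
```

Equal counts do not prove the stemmer is correct, but unequal counts for a word and its definite form are a quick sign that the two were not reduced to the same stem.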
-----Original Message-----
From: jmruiz@boe.es [mailto:jmruiz@boe.es]
Sent: 16 June 2003 18:28
To: Bruusgaard, Jan
Subject: Re: SV: [SWISH-E] Multilanguage stemmers - norwegian
Forgot to mention...
I have removed from CVS the annoying messages you
noticed. Once again, it was my fault. They were
a couple of debug messages. You can delete them
in index.c or update from CVS.
Jose
On 16 Jun 2003 at 16:38, Bruusgaard, Jan wrote:
> Hi.
>
> I installed the new version of swish-e, and it seems to work, but I am not
> sure whether it is the Norwegian stemmer I am using. I have to test a bit more.
>
> I had to use:
>
> UseStemming yes
> # Put yes to apply word stemming algorithm during indexing,
> # else no. See the manual for info about stemming.
>
> If I use:
>
> UseStemming no
>
> then "no" means no, not Norwegian, and no stemming is done.
>
>
> When indexing I also get a lot of these messages:
>
> (...)
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> (...)
>
>
> Jan
>
> -----Original Message-----
> From: jmruiz@boe.es [mailto:jmruiz@boe.es]
> Sent: 10 June 2003 18:19
> To: Multiple recipients of list
> Subject: [SWISH-E] Multilanguage stemmers
>
>
> Hi,
>
> The rest of the Snowball stemmers have been added to swish
> (no,se,dk,fi,ru?). See previous posts about this issue to see how to
> use them.
>
> Testers around the world are welcome ;)
>
> cu
> Jose
>
Received on Wed Jul 2 14:27:32 2003