Re: Problems indexing german umlauts

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri May 07 2004 - 17:39:09 GMT

On Fri, May 07, 2004 at 07:45:51AM -0700, Sven Schupp wrote:
> >>As an example, we have a word "Überbrückungsgeld".
> >>
> >>If I try to search for "Überbrückungsgeld" swish-e gives me no results. 
> >>But if I search for "überbrückungsgeld" it'll give me a list with all 
> >>hits. Surprisingly, all occurrences of this word have a big "Ü" as the
> >>first char!

moseley@bumby:~$ cat c
WordCharacters abcdefghijklmnopqrstuvwxyzÜü
BeginCharacters abcdefghijklmnopqrstuvwxyzÜü
EndCharacters abcdefghijklmnopqrstuvwxyzÜü

moseley@bumby:~$ cat uber

Überbrückungsgeld


moseley@bumby:~$ swish-e -c c -i uber -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'überbrückungsgeld'   Pos:5  Stuct:0x9 ( BODY FILE )

(notice that it was converted to lowercase)
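To make the lowercase conversion concrete, here is a rough sketch of what folding "Ü" to "ü" looks like at the byte level. It assumes the input is ISO-8859-1 (Latin-1), which is my assumption and not something stated above, and it is not swish-e's own code. The helper names ascii_lower and latin1_lower are made up for illustration.

```python
# Illustration only: this is not how swish-e is implemented.
# Assumption: the text is encoded as ISO-8859-1 (Latin-1).

word = "Überbrückungsgeld"
raw = word.encode("latin-1")


def ascii_lower(data: bytes) -> bytes:
    """Lowercase A-Z only; bytes >= 0x80 (such as 0xDC, 'Ü') are left alone."""
    return data.lower()  # bytes.lower() only touches ASCII letters


def latin1_lower(data: bytes) -> bytes:
    """Lowercase with full case rules, then re-encode as Latin-1."""
    return data.decode("latin-1").lower().encode("latin-1")


print(hex(raw[0]))                          # byte value of 'Ü' in Latin-1
print(ascii_lower(raw) == raw)              # ASCII-only folding changes nothing here
print(latin1_lower(raw).decode("latin-1"))  # the form shown in the indexed_words output
```

The point is only that "Ü" and "ü" are different bytes (0xDC and 0xFC in Latin-1), so whether the two spellings end up as the same index term depends on a case conversion that knows about those bytes.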

moseley@bumby:~$ echo $LANG 
en_US

(the same thing happens if I change my $LANG:)

moseley@bumby:~$ echo $LANG
de_DE

moseley@bumby:~$ swish-e -c c -i uber -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'überbrückungsgeld'   Pos:5  Stuct:0x9 ( BODY FILE )


(search uppercase:)

moseley@bumby:~$ swish-e -w Überbrückungsgeld
# SWISH format: 2.5.1
# Search words: Überbrückungsgeld
# Removed stopwords: 
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.055 seconds
1000 uber "uber" 19
.

(and lower:)

moseley@bumby:~$ swish-e -w überbrückungsgeld
# SWISH format: 2.5.1
# Search words: überbrückungsgeld
# Removed stopwords: 
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.049 seconds
1000 uber "uber" 19
.
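As a toy model of why both searches return the same hit: if terms are folded to lowercase when indexing and again when searching, the case of the query no longer matters. The sketch below uses a plain dictionary as the index. It is not swish-e's index format, and normalize, build_index and search are hypothetical names.

```python
# Toy model only: a dict-based index, not swish-e's index format or API.
from collections import defaultdict


def normalize(term: str) -> str:
    """Fold a term to lowercase so 'Ü...' and 'ü...' compare equal."""
    return term.lower()


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized word to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Normalize the query the same way as the indexed words, then look it up."""
    return sorted(index.get(normalize(query), set()))


index = build_index({"uber": "Überbrückungsgeld"})
print(search(index, "Überbrückungsgeld"))
print(search(index, "überbrückungsgeld"))
```

If only one side were folded (say the index but not the query), the uppercase search would miss, which is the kind of mismatch described in the original report.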

Can you reproduce the above?

-- 
Bill Moseley
moseley@hank.org
Received on Fri May 7 10:39:12 2004