Re: Indexing umlauts

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Dec 13 2005 - 01:39:50 GMT
On Mon, Dec 12, 2005 at 12:03:22PM -0800, Thomas Nyman wrote:
> I made a Word document for testing.
> The document contains the following two words:
> 
> Överskottslager
> 
> boy
> 
> When I run swish-e -c swish_se.conf -i test.doc -T indexed_words -v0
> 
> I get the following:
> 
>     Adding:[1:swishdocpath(11)]   'test'   Pos:1  Stuct:0x1 ( FILE )
>     Adding:[1:swishdocpath(11)]   'doc'   Pos:2  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'a'   Pos:1  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'verskottslager'   Pos:2  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'boy'   Pos:3  Stuct:0x1 ( FILE )

Odd, works for me.

moseley@bumby:~$ cat word
Överskottslager
boy
moseley@bumby:~$ swish-e -i word -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'överskottslager'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'boy'   Pos:6  Stuct:0x9 ( BODY FILE )

moseley@bumby:~$ cat c
TranslateCharacters :ascii7:
moseley@bumby:~$ swish-e -i word -T indexed_words -c c  -v0
    Adding:[1:swishdefault(1)]   'overskottslager'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'boy'   Pos:6  Stuct:0x9 ( BODY FILE )
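
For readers unfamiliar with what folding to 7-bit ASCII does to a word like this, here is a rough Python sketch. It only approximates the effect seen in the output above by stripping combining marks after Unicode decomposition; it is not Swish-e's actual translation table, and the function name fold_to_ascii is made up for illustration.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Strip diacritics and lowercase (an approximation, not Swish-e's table)."""
    # NFKD splits 'Ö' into 'O' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase to match the lowercase words shown in the indexer output.
    return stripped.lower()


print(fold_to_ascii("Överskottslager"))  # overskottslager
```

Characters with no decomposition (for example 'ø' or 'ß') pass through this sketch unchanged, so it is not a full replacement for a real translation table.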


Is it possible your config or source file is in a different encoding?
It doesn't seem likely, but I can't think of another reason it
wouldn't be working. I cut and pasted the words from your email, so it
seems like my file would be in the same encoding as yours.


moseley@bumby:~$ od -t x1c  word
0000000 d6 76 65 72 73 6b 6f 74 74 73 6c 61 67 65 72 0a
          Ö   v   e   r   s   k   o   t   t   s   l   a   g   e   r  \n
0000020 62 6f 79 0a
          b   o   y  \n
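
The dump above shows the Ö stored as the single byte d6, which is its ISO-8859-1 (Latin-1) value. In UTF-8 the same character would be the two bytes c3 96. As a minimal sketch for checking a file the same way without od, the Python below classifies raw bytes; the function name guess_encoding and its return labels are hypothetical, and the check can only tell "valid UTF-8" from "some other 8-bit encoding", not which 8-bit encoding it is.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or '8-bit' (a rough heuristic)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so probably a single-byte encoding such as Latin-1.
        return "8-bit"


latin1_bytes = "Överskottslager\n".encode("latin-1")
utf8_bytes = "Överskottslager\n".encode("utf-8")

print(latin1_bytes[:2].hex(" "))      # d6 76
print(utf8_bytes[:3].hex(" "))        # c3 96 76
print(guess_encoding(latin1_bytes))   # 8-bit
print(guess_encoding(utf8_bytes))     # utf-8
```

To check a real file, pass it the result of open("word", "rb").read(). If a UTF-8 file is read as if it were Latin-1, the Ö turns into two separate bytes, which might be one way a word could lose its first letter, though nothing in this thread confirms that is what happened here.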

-- 
Bill Moseley
moseley@hank.org

Received on Mon Dec 12 17:39:58 2005