Thanks for the example, Bill, but it didn't work.
> cat test.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE order SYSTEM "pedido.dtd">
<Idioma tipo="Castellano">
árbol ÁRBOL empeño. EMPEÑO.
</Idioma>
swish-e-2.4.3> LANG=es_ES swish-e -c swish-e.conf -T indexed_words
Adding automatic MetaName 'descripcion' found in file test.xml
Adding:[2:idioma(10)] 'empe' Pos:6 Stuct:0x1 ( FILE )
Adding:[2:descripcion(12)] 'empe' Pos:6 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'empe' Pos:7 Stuct:0x1 ( FILE )
Adding:[2:descripcion(12)] 'empe' Pos:7 Stuct:0x1 ( FILE
.
swish-e-2.4.3> LANG=es_ES swish-e -w PESTAÑA -H9 | grep Parsed
# Parsed Words: pesta
dsorian@linux:~/swish-e-2.4.3> LANG=es_ES swish-e -w pestaña
locale es es_ES@@@@@@@@@@@@
# SWISH format: 2.4.3
# Search words: pestaña
# Removed stopwords:
err: no results
Did I do something wrong? I used LANG=es_ES, but the result is worse:
swish-e now splits the words at the ñ. So the problem isn't in the
locale settings (I think). Any suggestions?
Thank you.
> On Wed, Feb 16, 2005 at 06:46:41AM -0800, dasoso@alumni.uv.es wrote:
> > 1.-Here are my locale settings. Could they be the reason
> > swish-e indexes ÁRBOL as Árbol?
>
> Yes, that's what I was suggesting.
>
> Swish-e is converting your text to 8859-1 encoding, but you are
> telling it to sort using UTF-8.
>
> Run swish like this:
>
> LANG=es_ES swish-e -c config
>
> Maybe a demonstration will make it clear:
>
> moseley@bumby:~$ LANG=es_ES.UTF-8 swish-e -i t.txt -T indexed_words -v0
> Adding:[1:swishdefault(1)] 'pestaÑa' Pos:5 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'Águila' Pos:6 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'águila' Pos:7 Stuct:0x9 ( BODY FILE )
> moseley@bumby:~$ LANG=es_ES swish-e -i t.txt -T indexed_words -v0
> Adding:[1:swishdefault(1)] 'pestaña' Pos:5 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'águila' Pos:6 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'águila' Pos:7 Stuct:0x9 ( BODY FILE )
>
> And you will need to search that way, too -- or at least be
> consistent, so that your locale setting is the same when indexing
> and when searching and tolower() operates the same way in both
> cases. But the bottom line is you don't want to tell tolower()
> that it's working with UTF-8 encoding when it's really working
> with 8859-1 encoding.
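
A minimal Python sketch of the mismatch described above. It is not
swish-e's code: the two helper functions are hypothetical names, and
the ASCII-only lowercasing is an assumption that stands in for what a
byte-wise tolower() does when the locale does not treat byte 0xD1
(Ñ in 8859-1) as an uppercase letter.

```python
# Illustration only: mimics the effect with Python bytes, not swish-e itself.

# "PESTAÑA" stored as ISO-8859-1 bytes; the Ñ is the single byte 0xD1.
LATIN1_WORD = "PESTA\u00d1A".encode("iso-8859-1")


def lower_as_latin1(data: bytes) -> bytes:
    """Lowercase bytes while correctly treating them as ISO-8859-1."""
    return data.decode("iso-8859-1").lower().encode("iso-8859-1")


def lower_ascii_only(data: bytes) -> bytes:
    """Lowercase only A-Z, leaving every byte above 0x7F untouched.

    Assumed stand-in for tolower() running under a locale whose
    encoding does not match the data.
    """
    return data.lower()  # bytes.lower() converts ASCII letters only


indexed = lower_as_latin1(LATIN1_WORD)    # matching locale
searched = lower_ascii_only(LATIN1_WORD)  # mismatched locale

print(indexed)              # b'pesta\xf1a'  (pestaña)
print(searched)             # b'pesta\xd1a'  (pestaÑa)
print(indexed == searched)  # False: the two forms never match
```

If one form ends up in the index and the other in the parsed query,
the search finds nothing, which is why the locale should be the same
for both steps.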
>
>
> moseley@bumby:~$ LANG=es_ES.UTF-8 swish-e -w PESTAÑA -H9 | grep Parsed
> # Parsed Words: pestaÑa
> moseley@bumby:~$ LANG=es_ES swish-e -w PESTAÑA -H9 | grep Parsed
> # Parsed Words: pestaña
>
>
> We could force LANG at program startup, but there's more than one
> valid setting (e.g. en_US, de_DE, es_ES), so we want people to be
> able to set that.
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org
Received on Sat Feb 19 13:09:03 2005