Hi all.
> man locale or use google. Your machine's locale determines how it
> sorts, displays money and thousands separator in numbers.
1. Here are my locale settings. Could they be the reason why swish-e
indexes ÁRBOL as Árbol?
x:~> locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
>
> TranslateCharacters is helpful mostly for English speakers where they
> might want to search for Niño but might type Nino instead. It's
> probably not what you need.
>
2. OK, but swish-e indexes ÁRBOL as Árbol and árbol as árbol. It
would be useful for me if TranslateCharacters worked and swish-e
indexed all those words as one word (arbol), because when I search
for arbol I would like Árbol, ÁRBOL, árbol... in the results
too.
How can I make it work?
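To make clear what I mean by "one word", here is a small Python sketch of the folding I am after. It is only an illustration, not swish-e code: the function name fold is made up for this mail, and it assumes the text has already been decoded to Unicode.

```python
import unicodedata

def fold(word: str) -> str:
    """Lowercase a word and strip accents (hypothetical helper, not swish-e)."""
    # NFD splits "á" into "a" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["arbol", "árbol", "ARBOL", "ÁRBOL"]
folded = {fold(w) for w in words}
print(folded)  # all four variants collapse to the single word 'arbol'
```

With this kind of folding applied at index time and at search time, the four spellings would end up as one index entry.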
Example:
cat prueba.html
<html>
<body>
arbol
árbol
ARBOL
ÁRBOL
</body>
</html>
cat test.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE order SYSTEM "pedido.dtd">
<Idioma tipo="Castellano">
<descripcion>
arbol
árbol
ÁRBOL
ARBOL
</descripcion>
</Idioma>
cat test2.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE order SYSTEM "pedido.dtd">
<Idioma tipo="Castellano">
<descripcion>
arbol
ARBOL
</descripcion>
</Idioma>
swish-e -c swish-e.conf -T indexed_words
Indexing Data Source: "File-System"
Indexing "/home/dsorian/parabuscar/kk/paraelmail"
Checking dir "/home/dsorian/parabuscar/kk/paraelmail"...
prueba.html - Using HTML parser -
Adding:[1:swishdefault(1)] 'arbol' Pos:1 Stuct:0x9 ( BODY
FILE )
Adding:[1:swishdefault(1)] 'árbol' Pos:2 Stuct:0x9 ( BODY
FILE )
Adding:[1:swishdefault(1)] 'arbol' Pos:3 Stuct:0x9 ( BODY
FILE )
Adding:[1:swishdefault(1)] 'Árbol' Pos:4 Stuct:0x9 ( BODY
FILE )
(4 words)
test.xml - Using XML2 parser -
**Adding automatic MetaName 'idioma' found in file /test.xml'
**Adding automatic MetaName 'idioma.tipo' found in file
ail/test.xml'
Adding:[2:idioma(10)] 'castellano' Pos:3 Stuct:0x1 ( FILE )
Adding:[2:idioma.tipo(11)] 'castellano' Pos:3 Stuct:0x1
( FILE )
**Adding automatic MetaName 'descripcion' found in file
'/home/dsorian/parabuscar/kk/paraelmail/test.xml'
Adding:[2:idioma(10)] 'árbol' Pos:6 Stuct:0x1 ( FILE )
Adding:[2:descripcion(12)] 'árbol' Pos:6 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'Árbol' Pos:7 Stuct:0x1 ( FILE )
Adding:[2:descripcion(12)] 'Árbol' Pos:7 Stuct:0x1 ( FILE )
(3 words)
test2.xml - Using XML2 parser -
Adding:[3:idioma(10)] 'castellano' Pos:3 Stuct:0x1 ( FILE )
Adding:[3:idioma.tipo(11)] 'castellano' Pos:3 Stuct:0x1
( FILE )
Adding:[3:idioma(10)] 'arbol' Pos:6 Stuct:0x1 ( FILE )
Adding:[3:descripcion(12)] 'arbol' Pos:6 Stuct:0x1 ( FILE )
Adding:[3:idioma(10)] 'arbol' Pos:7 Stuct:0x1 ( FILE )
Adding:[3:descripcion(12)] 'arbol' Pos:7 Stuct:0x1 ( FILE )
(3 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 4 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
4 unique words indexed.
4 properties sorted.
3 files indexed. 473 total bytes. 16 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
swish-e -k '*'
# SWISH format: 2.4.3
index.swish-e: arbol castellano Árbol árbol
I would like to translate Á and á to a. That would improve the
searches. In the next search I want to get both test.xml and
test2.xml in the results.
dsorian@linux:~/swish-e-2.4.3> swish-e -w "idioma=Árbol"
# SWISH format: 2.4.3
# Search words: idioma=Árbol
# Removed stopwords:
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.026 seconds
1000 /home/dsorian/parabuscar/kk/paraelmail/test.xml "test.xml" 217
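Put another way, this is the result I am hoping for, sketched in Python. Again this is only an illustration and not how swish-e works internally: fold, idioma_words and search are names invented for this mail, and the word lists are copied by hand from the descripcion elements of the two XML files above.

```python
import unicodedata

def fold(word: str) -> str:
    """Lowercase and strip accents (hypothetical helper, not swish-e)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Words found under <Idioma> in each XML file, copied from the examples.
idioma_words = {
    "test.xml": ["arbol", "árbol", "ÁRBOL", "ARBOL"],
    "test2.xml": ["arbol", "ARBOL"],
}

def search(query: str) -> list:
    """Return the files containing a word that folds to the same form."""
    target = fold(query)
    return sorted(
        name
        for name, words in idioma_words.items()
        if any(fold(w) == target for w in words)
    )

print(search("Árbol"))  # ['test.xml', 'test2.xml']
```

Here the query is folded the same way as the indexed words, so both files match instead of only test.xml.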
Thank you, and sorry for the long mail :)
Received on Wed Feb 16 06:52:13 2005