Hi,
I've been playing around with Swish-E and French texts for the last few days
and I've run into a problem that I can't solve.
I'm indexing XML files (via the -S prog option) like this one:
--------------- snip -----------------------------
Path-Name: /tvtitel/287358
Content-Length: 208
Last-Mtime: 1031911762
Document-Type: XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<xml>
<titel>L'instit : Le choix de Théo</titel><desc>francetélévision (France2)|1
Boulevard Victor, Immeuble Le Barjac|F75015|Paris|11-09-02
21:10||||</desc></xml>
--------------- snip -----------------------------
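For reference, a record like the one above could be produced by a small program along these lines. This is only a sketch, not the actual script used here: the function name prog_record is made up for illustration, and it assumes the usual layout of header lines, one blank line, then the body, with Content-Length counted in bytes of the encoded body.

```python
import sys
import time


def prog_record(path_name: str, xml_text: str, mtime: int) -> bytes:
    """Build one document record: header lines, a blank line, then the body.

    Hypothetical helper for illustration; not part of Swish-E.
    """
    # Encode the body to match the encoding declared in the XML prolog.
    body = xml_text.encode("iso-8859-1")
    headers = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        f"Last-Mtime: {mtime}\n"
        "Document-Type: XML\n"
        "\n"  # blank line ends the headers
    )
    return headers.encode("iso-8859-1") + body


if __name__ == "__main__":
    xml = (
        '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        "<xml><titel>L'instit : Le choix de Théo</titel></xml>\n"
    )
    # Write raw bytes so Python does not re-encode the output.
    sys.stdout.buffer.write(prog_record("/tvtitel/287358", xml, int(time.time())))
```

Writing through sys.stdout.buffer keeps the bytes exactly as encoded, so the Content-Length header and the body cannot drift apart because of the terminal's own encoding.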
In the config file I use the "TranslateCharacters :ascii7:" directive, which
should index "Théo" as "Theo" (as I understand it, this feature converts only
the index, not the stored text), so that I could search for "theo" and find
the above document.
When printing the keywords (with -k '*') I can't find the word "theo":
--------------- snip -----------------------------
... thac thaco the ti ticket ...
--------------- snip -----------------------------
When I search for "theo" I get no results; when I search for "th*" I get
this result:
--------------- snip -----------------------------
# SWISH format: 2.1-dev-26
# Search words: titel=(th*)
# Number of hits: 4
# Search time: 0.001 seconds
# Run time: 0.038 seconds
L'instit : Le choix de Théo
The Brian Benben Show
Thaïlande
Thé ou café
--------------- snip -----------------------------
To me it looks as if the conversion from UTF-8, which libxml uses internally,
back to ISO-8859-1 for the indexer doesn't work. But no error is reported
during indexing.
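The suspected mix-up can be reproduced outside Swish-E with a short Python sketch. It assumes the indexer treats every byte as one ISO-8859-1 character. The names strip_accents and as_seen_by_indexer are made up for illustration, and strip_accents is only a stand-in for accent folding, not Swish-E's own :ascii7: table.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical stand-in for accent folding; not Swish-E's own table."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def as_seen_by_indexer(word: str, convert_back: bool) -> str:
    """Return the text a byte-per-character Latin-1 indexer would see."""
    utf8_bytes = word.encode("utf-8")  # the form held internally as UTF-8
    if convert_back:
        # Intended path: decode UTF-8, then re-encode as ISO-8859-1.
        latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")
    else:
        # Suspected path: the UTF-8 bytes are handed over unchanged.
        latin1_bytes = utf8_bytes
    return latin1_bytes.decode("iso-8859-1")


for flag in (True, False):
    seen = as_seen_by_indexer("Théo", convert_back=flag)
    print(flag, repr(seen), repr(strip_accents(seen).lower()))
```

With the conversion applied, the word folds to "theo". With it skipped, the single "é" arrives as the two characters "Ã" and "©", so no "theo" can be indexed. If the translation table then folded "Ã" to "a" and "©" to "c" (an unverified assumption about that table), "Thé" and "Théo" would end up as "thac" and "thaco", which would match the keyword listing above.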
Any ideas?
Thanks,
thomas
Received on Fri Sep 13 10:23:56 2002