Hi Thomas,
Yep, you are right.
I also had this problem a few weeks ago.
I am not 100% sure, but the expat library seems to ignore the
ISO-8859-1 encoding declaration in the XML header.
Use libxml2 instead (XML2) and this problem should go away.
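To show the kind of mismatch I suspect (this is only my guess, I have
not checked the parser code): if the text ends up as UTF-8 but is read
as single-byte ISO-8859-1, then "é" turns into two other characters and
a translation table will never match it. A small Python sketch, with a
made-up function name:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# "é" (U+00E9) is the two bytes C3 A9 in UTF-8; read as ISO-8859-1
# those bytes are the two characters "Ã" and "©".
print(misread_utf8_as_latin1("Théo"))  # ThÃ©o
```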
By the way, for Spanish I use:
TranslateCharacters ÁáÉéÍíÓóÚú aaeeiioouu
This will index "José" as "Jose".
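Roughly, each character in the first string is replaced by the one at
the same position in the second string. Here is a small Python sketch
of that idea (only an illustration of the mapping, not how Swish-E does
it internally; the names are made up):

```python
# The two strings from the TranslateCharacters line above.
SOURCE = "ÁáÉéÍíÓóÚú"
TARGET = "aaeeiioouu"

# Position-by-position mapping: SOURCE[i] -> TARGET[i].
TABLE = str.maketrans(SOURCE, TARGET)


def translate_for_index(word: str) -> str:
    """Return the word as it would be stored after the character mapping."""
    return word.translate(TABLE)


print(translate_for_index("José"))  # Jose
```

Characters that are not in the first string are left unchanged.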
cu
Jose
On 13 Sep 2002, at 3:20, Thomas Seifert wrote:
> Hi,
>
> I've been playing around with Swish-E and French texts for the last
> few days and I've encountered a problem that I can't solve.
>
> I'm indexing XML-files (via -S prog parameter) like this one:
> --------------- snip -----------------------------
> Path-Name: /tvtitel/287358
> Content-Length: 208
> Last-Mtime: 1031911762
> Document-Type: XML
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <xml>
> <titel>L'instit : Le choix de Théo</titel><desc>francetélévision
> (France2)|1 Boulevard Victor, Immeuble Le Barjac|F75015|Paris|11-09-02
> 21:10||||</desc></xml>
> --------------- snip -----------------------------
>
> In the config file I use the "TranslateCharacters :ascii7:" directive,
> which should index "Théo" as "Theo" (as I understand it, this feature
> converts only the index, not the actual text), so that I can
> search for "theo" and find the above document.
>
> When printing the keywords (with -k '*') I can't find the word "theo":
> --------------- snip -----------------------------
> ... thac thaco the ti ticket ...
> --------------- snip -----------------------------
>
> When I search for "theo" I get no results; when I search for
> "th*" I get this result:
> --------------- snip -----------------------------
> # SWISH format: 2.1-dev-26
> # Search words: titel=(th*)
> # Number of hits: 4
> # Search time: 0.001 seconds
> # Run time: 0.038 seconds
> L'instit : Le choix de Théo
> The Brian Benben Show
> Thaïlande
> Thé ou café
> --------------- snip -----------------------------
>
> To me it looks like the conversion from UTF-8, which libxml uses
> internally, back to ISO-8859-1 for the indexer doesn't
> work. But no error is reported when indexing.
>
> Any ideas?
>
> thanks,
> thomas
>
Received on Fri Sep 13 15:19:03 2002