On Fri, 13 Sep 2002, Thomas Seifert wrote:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <xml>
> <titel>L'instit : Le choix de Théo</titel><desc>francetélévision (France2)|1
> Boulevard Victor, Immeuble Le Barjac|F75015|Paris|11-09-02
> 21:10||||</desc></xml>
> In the Config File I use the "TranslateCharacters :ascii7:" Parameter which
> should index "Théo" as "Theo" (as I understood with this feature only the
> Index is converted, not the actual text) so that i could search for "theo"
> and find the above document.
Can you switch to using the XML2 parser? This is a problem in the swish-e
XML parser -- it's not converting the UTF-8 back to an 8-bit character
set.
The plan is to add iconv to the parsers during 2.3 development, but currently
only the XML2 parser converts UTF-8 back to ISO-8859-1.
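For illustration, here is a small Python sketch of why "theo" never makes it into the index when the UTF-8 is not converted back. The accent folding below uses Unicode NFKD decomposition as a stand-in for TranslateCharacters :ascii7:. That is an assumption for the example, not swish-e's actual translation table.

```python
import unicodedata

def ascii7_fold(text: str) -> str:
    """Rough stand-in for TranslateCharacters :ascii7: (assumption: NFKD
    decomposition plus dropping combining marks, not swish-e's own table)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

title = "Théo"
utf8_bytes = title.encode("utf-8")          # b'Th\xc3\xa9o': 'é' becomes two bytes
latin1_bytes = title.encode("iso-8859-1")   # b'Th\xe9o': 'é' is a single byte

# An 8-bit indexer reading raw UTF-8 sees two unrelated characters for 'é'.
misread = utf8_bytes.decode("iso-8859-1")   # 'ThÃ©o'
# After converting back to ISO-8859-1, 'é' is intact.
restored = latin1_bytes.decode("iso-8859-1")

print("without conversion:", ascii7_fold(misread).lower())
print("with conversion:   ", ascii7_fold(restored).lower())
```

Only the converted text folds to "theo"; the misread bytes fold to something that a search for "theo" will never match.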
Here's another problem -- which might be a bug.
In HTML, if you write <em><strong>H</strong>ello</em>, you would expect
"Hello" to be indexed as a single word, so tags are not treated as word
breaks. But that means your XML will end up with the word
'theofrancetelevision' (from "Théo</titel><desc>francetélévision"), which is
probably not what you want.
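A toy Python tokenizer makes the trade-off concrete. The function and the tag-stripping regex are hypothetical and only mimic the behaviour described above. The XML sample is written already accent-folded, which assumes the :ascii7: translation has been applied.

```python
import re
from typing import List

TAG = re.compile(r"<[^>]+>")

def tokenize(markup: str, tag_breaks_word: bool) -> List[str]:
    """Hypothetical tokenizer: strip tags, optionally treating each tag as a word break."""
    text = TAG.sub(" " if tag_breaks_word else "", markup)
    return re.findall(r"\w+", text.lower())

html = "<em><strong>H</strong>ello</em>"
# Assumption: text already folded by :ascii7:, to match the word in the reply.
xml = "<titel>Le choix de Theo</titel><desc>francetelevision (France2)</desc>"

for breaks in (False, True):
    print(breaks, tokenize(html, breaks), tokenize(xml, breaks))
```

With tags ignored, the HTML gives "hello" but the XML gives "theofrancetelevision". With tags as breaks, the XML is right but the HTML splits into "h" and "ello". A single rule cannot satisfy both cases.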
If <titel> or <desc> is defined as a metaname or property, that won't happen.
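As a rough illustration of why field-scoped indexing avoids the merge, here is a hypothetical per-element indexer built on Python's standard ElementTree. It is not how swish-e implements metanames internally. It only shows that when each element's words are collected separately, text from adjacent elements cannot fuse into one word.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Sample record from the question, encoded the way its XML declaration says.
DOC = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
       "<xml><titel>L'instit : Le choix de Théo</titel>"
       "<desc>francetélévision (France2)</desc></xml>").encode("iso-8859-1")

def index_by_field(xml_bytes: bytes) -> Dict[str, List[str]]:
    """Hypothetical per-field indexer: words are collected per child element,
    so text from adjacent elements never merges into one word."""
    root = ET.fromstring(xml_bytes)
    fields: Dict[str, List[str]] = defaultdict(list)
    for elem in root:
        text = "".join(elem.itertext())
        fields[elem.tag].extend(re.findall(r"\w+", text.lower()))
    return dict(fields)

for name, words in index_by_field(DOC).items():
    print(name, words)
```

"théo" ends up under titel and "francetélévision" under desc, each as its own word.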
--
Bill Moseley moseley@hank.org
Received on Fri Sep 13 13:31:12 2002