-------- Original Message --------
Subject: Re: [SWISH-E] Re: libxml2 and non-ascii?
Date: Mon, 22 Nov 2004 11:51:15 +0100
From: Roman Chyla <email@example.com>
Thank you for the link. I played with the configuration, but I am afraid
the hints from the FAQ can't solve my problem with the Windows-1250 or
ISO-8859-2 encodings when using the libxml2 parser.
I also tried the "TranslateCharacters" option, but since UTF is 16-bit I
cannot map it to 8-bit characters (did I miss something?)
Perhaps there could be a new TranslateCharactersUTF directive for users
with libxml2 and non-8859-2 characters in docs?
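
To illustrate the kind of mapping I mean, here is a minimal sketch in Python. It is not a Swish-e feature and uses no Swish-e API; it only assumes the text can be pre-processed before indexing. The function name fold_to_ascii and the sample phrase are made up for the example. It decodes the same Czech text from Windows-1250 and ISO-8859-2 bytes and then folds the accented letters to their ASCII base letters using Unicode decomposition.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition.

    Letters such as 'č' decompose into a base letter plus a combining
    mark; removing the marks leaves the plain base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample Czech phrase (an assumption for illustration only).
phrase = "příliš žluťoučký"

# The same text as raw bytes in the two legacy 8-bit encodings.
cp1250_bytes = phrase.encode("cp1250")
latin2_bytes = phrase.encode("iso8859_2")

# The byte sequences differ (for example 'š' and 'ž' sit at different
# positions), but decoding each with its own codec gives the same text.
assert cp1250_bytes != latin2_bytes
assert cp1250_bytes.decode("cp1250") == latin2_bytes.decode("iso8859_2")

folded = fold_to_ascii(cp1250_bytes.decode("cp1250"))
print(folded)  # prilis zlutoucky
```

This only works for letters that decompose into a base letter plus marks; characters without such a decomposition pass through unchanged and would still need an explicit table.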
Bill Moseley wrote:
> On Fri, Nov 19, 2004 at 06:41:31AM -0800, Roman Chyla wrote:
>>I have noticed that when I use libxml2 on my indexed files, special
>>characters are stripped off (in my case Czech characters)
> Let us know if this doesn't answer your question:
>>Switching to DefaultContents HTML solved that problem - (together with
> The HTML parser is old and broken. But it knows nothing of encodings
> so it will just index 8-bit chars regardless of what they are. But
> that parser does make more mistakes than the libxml2 parser and many
> features are not supported in HTML that are in the HTML2 parser.
Received on Mon Nov 22 04:14:29 2004