Re: libxml2 and non-ascii?

From: Roman Chyla <chyla(at)>
Date: Mon Nov 22 2004 - 12:14:22 GMT
-------- Original Message --------
Subject: Re: [SWISH-E] Re: libxml2 and non-ascii?
Date: Mon, 22 Nov 2004 11:51:15 +0100
From: Roman Chyla <>


Thank you for the link. I played with the configuration, but I am afraid
the hints from the FAQ can't solve my problem with documents in
Windows-1250 or ISO-8859-2 encoding when using the libxml2 parser.

I also tried the "TranslateCharacters" option, but since the UTF is 16 bit I
cannot map it to 8-bit characters (did I miss something?).
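
To make the encoding question concrete, here is a small check in Python (not Swish-e code; the sample text and the helper name are only illustrations). libxml2 returns text as UTF-8, which is a variable-length encoding rather than a fixed 16-bit one: ASCII characters take one byte and the Czech accented letters take two. The sketch also shows which 8-bit encodings can represent the sample at all.

```python
# Illustrative Czech sample; these accented letters exist in ISO-8859-2
# and Windows-1250 but not in ISO-8859-1.
sample = "žluťoučký kůň"

# UTF-8 is variable length: 7 ASCII characters take 1 byte each,
# the 6 accented letters take 2 bytes each.
utf8_bytes = sample.encode("utf-8")


def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Which 8-bit encodings can hold the sample without losing characters?
report = {
    codec: encodable(sample, codec)
    for codec in ("iso-8859-1", "iso-8859-2", "cp1250")
}

print(len(sample), len(utf8_bytes))
print(report)
```

So the characters themselves do fit into an 8-bit encoding, just not into ISO-8859-1.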

Perhaps there could be a new TranslateCharactersUTF directive for users
with libxml2 and non-8859-2 characters in their docs?
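
A rough sketch of what such a translation step could do, written in Python rather than as Swish-e code. TranslateCharactersUTF does not exist; it is only the name proposed above, and fold_to_ascii is a hypothetical helper. The idea is to decompose each accented character into a base letter plus combining marks and then drop the marks, so the indexed word ends up as plain ASCII.

```python
import unicodedata


def fold_to_ascii(text):
    """Hypothetical helper: strip diacritics so the text fits in 7-bit ASCII.

    NFKD normalization splits an accented letter into its base letter plus
    combining marks; the marks are then removed. Characters that have no
    ASCII base letter are dropped by the final encode step.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.encode("ascii", "ignore").decode("ascii")


folded = fold_to_ascii("žluťoučký kůň")
print(folded)  # zlutoucky kun
```

The same folding would have to be applied to search queries as well, otherwise a query typed with accents would not match the folded index entries.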

best regards


Bill Moseley wrote:
> On Fri, Nov 19, 2004 at 06:41:31AM -0800, Roman Chyla wrote:
>>I have noticed, that when I use libxml2 on my indexed files, special 
>>characters are stripped off (in my case czech characters)
> Let us know if this doesn't answer your question:
>>Switching to DefaultContents HTML solved that problem - (together with 
>>TranslateCharacters directive)
> The HTML parser is old and broken.  But it knows nothing of encodings
> so it will just index 8-bit chars regardless of what they are.  But
> that parser does make more mistakes than the libxml2 parser and many
> features are not supported in HTML that are in the HTML2 parser.
Received on Mon Nov 22 04:14:29 2004