Re: libxml2 and non-ascii?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 19 2004 - 14:56:52 GMT
On Fri, Nov 19, 2004 at 06:41:31AM -0800, Roman Chyla wrote:
> Hi,
> 
> I have noticed, that when I use libxml2 on my indexed files, special 
> characters are stripped off (in my case czech characters)

Let us know if this doesn't answer your question:

http://swish-e.org/current/docs/SWISH-FAQ.html#How_do_I_index_non_English_words_

> Switching to DefaultContents HTML solved that problem - (together with 
> TranslateCharacters directive)

The HTML parser is old and broken.  It knows nothing of encodings,
so it just indexes 8-bit chars regardless of what they are.  But
that parser makes more mistakes than the libxml2 parser (HTML2), and
many features of the HTML2 parser are not supported in HTML.
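
A rough illustration of why characters can go missing, assuming (as I
understand the FAQ entry above) that libxml2 hands text over as UTF-8
and it is then reduced to 8-bit Latin-1 for indexing.  This is a Python
sketch, not Swish-e code, and latin1_survivors is a made-up helper
name:

```python
# Sketch only: shows which characters fit in 8-bit Latin-1 (ISO-8859-1).
# Assumption: the indexer keeps only characters representable in Latin-1.

def latin1_survivors(text: str) -> str:
    """Return text with every character that has no Latin-1 code dropped."""
    return text.encode("latin-1", errors="ignore").decode("latin-1")


if __name__ == "__main__":
    czech = "Příliš žluťoučký kůň"
    # Czech letters such as ř, š, ž, ť, č, ů, ň have no Latin-1 code point,
    # while í and ý do, so only part of each word survives.
    print(latin1_survivors(czech))
```

Characters that exist in Latin-1 pass through unchanged; the rest are
dropped, which would look like "stripped" characters in the index.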

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Nov 19 06:56:53 2004