Re: Fw: Re: 8-bit chars

From: John Angel <angel_john(at)not-real.hotmail.com>
Date: Thu Dec 11 2003 - 20:40:29 GMT
> > Is it possible to use
> > iconv(charset_of_the_document_being_indexed, utf-8)
> > instead of UTF8Toisolat1()?
>
> You mean convert from libxml2's internal utf-8 back to the encoding of
> the original document?  Probably -- I assume there's some way to have
> libxml2 tell you what it was encoding from.

Yes, that would be great.
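
To make the idea concrete, here is a minimal sketch of the conversion with
plain iconv(3). The helper name utf8_to_charset is hypothetical (it is not
part of swish-e or libxml2), and the target charset is simply passed in by
the caller. I assume it could be taken from whatever libxml2 recorded for the
parsed document, but that lookup is not shown here.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to
 * `to_charset` (e.g. "ISO-8859-1").
 * Returns a malloc'd, NUL-terminated string, or NULL on failure
 * (unknown charset, out of memory, or a conversion error).
 * The single terminating NUL byte is only meaningful for 8-bit targets. */
char *utf8_to_charset(const char *utf8, const char *to_charset)
{
    iconv_t cd = iconv_open(to_charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(utf8);
    size_t out_size = in_left * 4 + 8;   /* generous upper bound */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;         /* iconv() wants a non-const pointer */
    char *out_ptr = out;
    size_t out_left = out_size - 1;      /* keep room for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);  /* flush state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

With this, utf8_to_charset("caf\xc3\xa9", "ISO-8859-1") should give back the
four bytes "caf\xe9". What happens to characters that have no equivalent in
the target charset depends on the iconv implementation, so a real indexer
would need a policy for that case.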


> But that would not work if you have documents of different encodings.
> The index itself has to be one encoding.  That's why I was saying that
> iconv could be used with a configuration setting to say what 8-bit
> encoding to use.

Why wouldn't it work with different encodings? Wouldn't it work just as if
the document had been indexed with the HTML parser?


> > > What tolower does depends on the tolower
> > > function swish-e was linked with.
> >
> > setlocale(charset_of_the_document_being_indexed) on-the-fly?
>
> Well, you want tolower to work for the encoding that the index is
> encoded in.

Of course. So, is this possible?
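
What I have in mind is roughly the following sketch. The helper name
tolower_in_locale is hypothetical, and the locale name passed in is an
assumption: names such as "en_US.ISO-8859-1" differ between systems and must
be installed for setlocale() to accept them. Note also that setlocale()
changes the locale for the whole process, so this is not safe with threads.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercase one 8-bit character under the given
 * LC_CTYPE locale, then restore the previous locale.
 * Returns the lowercased value, or -1 if the locale cannot be set. */
int tolower_in_locale(unsigned char ch, const char *locale_name)
{
    /* Copy the current locale name; the string returned by setlocale()
     * may be overwritten by the next call. */
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        free(saved);
        return -1;
    }

    int lower = tolower(ch);             /* result depends on LC_CTYPE */

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);      /* put the old locale back */
        free(saved);
    }
    return lower;
}
```

In the "C" locale only A-Z are folded, so byte 0xC9 (E-acute in ISO-8859-1)
comes back unchanged. Under a Latin-1 locale, where one is installed, I would
expect it to become 0xE9.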
Received on Thu Dec 11 20:41:06 2003