Re: Fw: Re: 8-bit chars

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 11 2003 - 20:29:03 GMT
On Thu, Dec 11, 2003 at 11:23:00AM -0800, John Angel wrote:
> > You are free to modify parser.c to use iconv and convert back to
> > Windows-1250, as I suggested.  But that won't work for everyone else.
> 
> Is it possible to use iconv(charset_of_the_document_being_indexed, utf-8)
> instead of UTF8Toisolat1()?

You mean convert from libxml2's internal utf-8 back to the encoding of
the original document?  Probably -- I assume there's some way to have
libxml2 tell you what it was encoding from.

But that would not work if you have documents of different encodings.
The index itself has to be one encoding.  That's why I was saying that
iconv could be used with a configuration setting to say what 8-bit
encoding to use.
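
As a rough sketch of that idea, assuming the target encoding name comes from a configuration setting: the helper below converts a UTF-8 string (what libxml2 hands back) to one fixed 8-bit encoding with iconv. The function name utf8_to_index_encoding and its parameters are hypothetical and are not part of swish-e or parser.c.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of swish-e: convert a NUL-terminated UTF-8
 * string to the single 8-bit encoding chosen for the index (for example a
 * value read from a config setting). Returns the number of bytes written,
 * not counting the terminating NUL, or -1 on any failure. */
long utf8_to_index_encoding(const char *index_encoding,
                            const char *utf8, char *out, size_t out_size)
{
    assert(index_encoding && utf8 && out && out_size > 0);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open(index_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                      /* encoding name not supported */

    char *in_ptr = (char *)utf8;        /* iconv does not modify the input */
    size_t in_left = strlen(utf8);
    char *out_ptr = out;
    size_t out_left = out_size - 1;     /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left); /* flush state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                      /* bad input or buffer too small */

    *out_ptr = '\0';
    return (long)(out_ptr - out);
}
```

Called as utf8_to_index_encoding("ISO-8859-1", text, buf, sizeof buf), it would take the place of the UTF8Toisolat1() step. Which encoding names are accepted depends on the iconv implementation, and some platforms need -liconv at link time.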

> > What tolower does depends on the tolower
> > function swish-e was linked with.
> 
> setlocale(charset_of_the_document_being_indexed) on-the-fly?

Well, you want tolower to work for the encoding that the index is
encoded in.
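
A minimal sketch of that, assuming the locale is chosen once to match the index encoding rather than switched per document. The function name lowercase_in_locale is made up for illustration, and locale names such as "cs_CZ.ISO-8859-2" are platform-specific and may not be installed.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper, not part of swish-e: select the LC_CTYPE locale that
 * matches the index's 8-bit encoding, then lowercase the buffer in place.
 * Returns 0 on success, -1 if the locale is not available on this system. */
int lowercase_in_locale(const char *locale_name, char *s)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* tolower() expects an unsigned char value (or EOF); passing a plain
     * char with the high bit set may be undefined on signed-char platforms. */
    for (unsigned char *p = (unsigned char *)s; *p != '\0'; ++p)
        *p = (unsigned char)tolower(*p);

    return 0;
}
```

In the "C" locale only A-Z are folded and bytes above 0x7F pass through unchanged, which is why the locale has to agree with the encoding the index is stored in.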

-- 
Bill Moseley
moseley@hank.org
Received on Thu Dec 11 20:29:12 2003