Re: Fw: Re: 8-bit chars

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 11 2003 - 18:18:16 GMT
On Thu, Dec 11, 2003 at 07:09:07AM -0800, John Angel wrote:

> The first thing we should do is provide full 8-bit support. Not 7-bit as it
> is now.

It's not 7-bit.  It's 8859-1, not ASCII.  If you use the HTML parser then 
it's basically 8-bit clean with the exception that tolower is used on 
those 8-bit clean characters.  What tolower does depends on the tolower 
function swish-e was linked with.
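
As a rough illustration of why the linked tolower matters (a Python sketch, 
not swish-e code; both helper names are made up for the example): an 
ASCII-only lowercase leaves the 8859-1 byte 0xC9 untouched, while a 
Latin-1-aware lowercase maps it to 0xE9.

```python
def ascii_only_lower(data: bytes) -> bytes:
    # bytes.lower() only changes ASCII A-Z, similar to tolower in the "C" locale
    return data.lower()


def latin1_lower(data: bytes) -> bytes:
    # Treat the bytes as ISO-8859-1, lowercase them, and encode back
    return data.decode("iso-8859-1").lower().encode("iso-8859-1")


word = b"\xc9COLE"  # "ECOLE" with an acute E (0xC9) in ISO-8859-1
print(ascii_only_lower(word))  # 0xC9 stays as it is
print(latin1_lower(word))      # 0xC9 becomes 0xE9
```

If the indexer and the search side lowercase differently, a word indexed 
one way may not match the same word in a query.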

> If just one code tweak (iconv instead of UTF8Toisolat1) will give full 8-bit
> support, that should be done immediately.

Whose 8-bit are you talking about?  My 8-bits are 8859-1 and it works 
fine.  You are using a Windows-1250 encoding which has characters that 
do not map to 8859-1.
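
To make the mismatch concrete, here is a small Python check (only an 
illustration; the helper name encodable is invented here).  U+010D, 
c with caron, exists in Windows-1250 but has no slot in 8859-1, while 
U+00E9, e with acute, exists in both.

```python
def encodable(text: str, encoding: str) -> bool:
    # True if every character in text has a code point in the target encoding
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


c_caron = "\u010d"  # in Windows-1250, not in ISO-8859-1
e_acute = "\u00e9"  # in both encodings

for ch in (c_caron, e_acute):
    print(repr(ch), encodable(ch, "cp1250"), encodable(ch, "iso-8859-1"))
```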

You are free to modify parser.c to use iconv and convert back to 
Windows-1250, as I suggested.  But that won't work for everyone else.

> Full utf-8 support is not a joke and certainly requires a lot more things to
> do and test. Should be done carefully, step by step.

Yes, we should spend a lot of time on it.


> > What action should swish-e take when converting utf-8 on input and
> > there's a conversion failure?
> 
> Conversion cannot fail, because we fully support utf-8 with search script.
> It will receive input charset as the parameter and convert (or not)
> accordingly. If the input charset is utf-8 - we use iconv() to convert it to
> 8-bit; if input charset is 8-bit - we don't convert chars at all. I hope I
> didn't miss any detail.

I don't follow.  You can't convert utf-8 to "8-bit", you have to convert 
to an encoding like 8859-1 or Windows-1250.  Those are 8-bit encodings, 
but obviously you can't convert every utf-8 char.

If the index contains words encoded in the 8859-1 character set (or
Windows-1250) and someone submits a query in utf-8 with characters that
don't map to 8859-1, that's a conversion failure.
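
A sketch of that failure case in Python (not swish-e code; convert_query 
and its default encoding are assumptions made for the example).  It 
returns None when the query cannot be represented in the index encoding, 
which is exactly the point where some policy has to be chosen.

```python
from typing import Optional


def convert_query(query_utf8: bytes,
                  index_encoding: str = "iso-8859-1") -> Optional[bytes]:
    """Re-encode a utf-8 query for the index, or return None on failure."""
    try:
        return query_utf8.decode("utf-8").encode(index_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Invalid utf-8 input, or a character with no mapping in the index encoding
        return None


ok = convert_query("caf\u00e9".encode("utf-8"))      # e with acute maps to 8859-1
failed = convert_query("\u010daj".encode("utf-8"))   # c with caron does not
print(ok, failed)
```

Returning None is only one possible policy; the caller could instead 
reject the query, drop the word, or substitute a placeholder character.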


-- 
Bill Moseley
moseley@hank.org
Received on Thu Dec 11 18:18:26 2003