Re: Using ISO-8859-2

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Dec 23 2002 - 12:11:34 GMT
On Mon, 23 Dec 2002, Bojan Stefancic wrote:

> Newbie question:
> 
> Does Swish-e already convert UTF-8 into ISO-8859-2?

No, that "soon" below has not happened yet.

One fix is to modify parser.c, replacing the code that converts to
8859-1 with code that uses iconv, and to add a configuration directive
that specifies the target encoding.  The target would still need to be an
8-bit character set, because all of swish-e's searching and sorting is 8-bit.
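A minimal sketch of the iconv approach, assuming a POSIX iconv(3) implementation that knows the requested charset name (e.g. "ISO-8859-2"). The function name utf8_to_8bit is hypothetical and not part of swish-e; how it would be wired into parser.c, and the configuration directive that would supply to_charset, are not shown.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string into a
 * caller-chosen 8-bit charset using POSIX iconv(3).
 * Returns the number of bytes written (excluding the terminator), or -1 if
 * the charset is unknown, a character has no mapping in the target set,
 * or the output buffer is too small.
 */
static long utf8_to_8bit(const char *to_charset, const char *in,
                         char *out, size_t out_size)
{
    assert(to_charset != NULL && in != NULL && out != NULL && out_size > 0);

    iconv_t cd = iconv_open(to_charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                      /* charset not supported here */

    char *inp = (char *)in;             /* iconv() takes char **, not const */
    size_t in_left = strlen(in);
    char *outp = out;
    size_t out_left = out_size - 1;     /* reserve room for the NUL */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &out_left);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                      /* EILSEQ, EINVAL or E2BIG */

    *outp = '\0';
    return (long)(outp - out);
}
```

For example, the UTF-8 bytes C4 8C ("Č") come out as the single ISO-8859-2 byte C8, so each character fits in one byte and the existing 8-bit searching and sorting code can still operate on it.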

The better fix is to store everything as UTF-8 internally and replace all
the code that looks at characters.
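To illustrate why "the code that looks at characters" would have to change: byte-oriented code treats every byte as one character, which stops being true once text is stored as UTF-8. A minimal sketch (the name utf8_strlen is hypothetical, not part of swish-e) counts code points instead of bytes, assuming the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count code points in a valid UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte that
 * is NOT a continuation byte starts a new character.
 */
static size_t utf8_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 string "Čaš" (five bytes), strlen() reports 5 while utf8_strlen() reports 3; every routine that indexes, compares or case-folds by byte would need a similar rewrite.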

I'm waiting for a big block of time to open up so I can attempt the UTF-8
change, or for someone with experience with UTF-8 or wide characters
to volunteer to help.  I'm not sure which of those is more likely to
happen.

> 
> Bugs Report (http://swish-e.org/current/docs/SWISH-BUGS.html) states: 
> "The XML2 & HTML2 parsers (Libxml2) convert characters from UTF-8 to 8859-1 
> encodings before indexing and writing properties. Indexing non-8859-1 data 
> may result in invalid character mappings. 
> These issues will be resolved soon. "
> 
> Regards,
> Bojan

-- 
Bill Moseley moseley@hank.org
Received on Mon Dec 23 12:11:48 2002