On Mon, 23 Dec 2002, Bojan Stefancic wrote:
> Newbie question:
> Does Swish-e already convert UTF-8 into iso-8859-2 ?
No, that "soon" below has not happened yet.
One fix is to modify parser.c: replace the code that converts to
8859-1 with code that uses iconv, and add a configuration directive
to say which encoding to convert into. The target would still need to be
an 8-bit character set, because all of swish-e's searching and sorting is
8-bit.
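To make the iconv idea concrete, here is a minimal sketch of converting a UTF-8 string into a caller-chosen 8-bit character set. It is not swish-e code: the function name convert_from_utf8 and its parameters are made up for illustration, and the target name would in practice come from the (not yet existing) configuration directive. Only the standard iconv_open/iconv/iconv_close calls are used.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of swish-e): convert the NUL-terminated
 * UTF-8 string `in` into the character set named by `target`, for example
 * "ISO-8859-2". The result is written to `out` and NUL-terminated.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * conversion is unavailable, the input is invalid, or `out` is too small.
 */
static long convert_from_utf8(const char *target, const char *in,
                              char *out, size_t out_size)
{
    assert(target != NULL && in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    /* iconv_open takes the destination encoding first, then the source. */
    iconv_t cd = iconv_open(target, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in_ptr = (char *)in;        /* iconv's prototype wants char ** */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = out_size - 1;   /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        /* A NULL input flushes any pending shift state. */
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;

    *out_ptr = '\0';
    return (long)(out_ptr - out);
}
```

Calling convert_from_utf8("ISO-8859-2", text, buf, sizeof buf) would fill buf with the 8-bit form of text. How characters that have no equivalent in the target set are handled differs between iconv implementations, so a real patch would have to decide what to do with them.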
The better fix is to store everything as UTF-8 internally and replace all
the code that looks at characters.
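As a small illustration of why the character-handling code would have to change, here is a sketch (again not swish-e code; the name utf8_codepoints is made up) that counts characters in a UTF-8 string. Code that assumes one byte per character gives the wrong answer as soon as a multi-byte sequence appears.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does not match that pattern starts a new character.
 * Assumes the input is valid UTF-8; no validation is done here.
 */
static size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a string holding one two-byte character followed by an ASCII letter, strlen() reports 3 bytes while this function reports 2 characters. Word splitting, case folding and sorting would need the same kind of treatment.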
I'm waiting for a big block of time to open up so I can attempt the UTF-8
change, or for someone with some experience with UTF-8 or wide characters
to volunteer to help. I'm not sure which of those is more likely to
> Bugs Report (http://swish-e.org/current/docs/SWISH-BUGS.html) states:
> "The XML2 & HTML2 parsers (Libxml2) converts characters from UTF-8 to 8859-1
> encodings before indexing and writing properties. Indexing non-8859-1 data
> may result in invalid character mappings.
> These issues will be resolved soon. "
Bill Moseley email@example.com
Received on Mon Dec 23 12:11:48 2002