On Fri, 2004-02-20 at 22:33, Peter Karman wrote:
> Forgive me if I am misunderstanding. This sounds like a thread that went
> by in December 2003. Search the discussion archives.
Lots of good info in that thread, if I recall.
> I believe the end result was that because swish-e currently does not save
> data in UTF-8, it can't display any of the indexed data in that format.
> If by "display the output" you mean the contents of StoreDescription from
> the swish index, then that currently can't happen. I don't think the issue
> is with the perl locale or encodings settings for the CGI scripts, I think
> it's with the data in the index, *as it was indexed*.
HTML2 or XML2 recodes UTF-8 to ISO-8859-1 which will drop characters
that don't map.
HTML or XML may destroy multi-byte characters (since the indexer thinks
each byte is a character). However, single-byte encodings (ISO-8859-*)
should pass through so long as TranslateCharacters doesn't mangle them.
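
A rough way to see both failure modes, sketched in Python rather than with
SWISH-E itself. The function names are made up for illustration, and the
"drop what doesn't map" behaviour is an assumption about how the recode
step treats unmappable characters:

```python
# Illustration only: this mimics the two effects described above in plain
# Python. It does not call SWISH-E, and both helper names are hypothetical.

# English text followed by a three-letter Arabic word (written as escapes).
text = "search \u0628\u062d\u062b"


def recode_to_latin1(s: str) -> str:
    """Recode to ISO-8859-1, silently dropping characters with no mapping."""
    return s.encode("iso-8859-1", errors="ignore").decode("iso-8859-1")


def bytes_as_chars(s: str) -> str:
    """Treat every byte of the UTF-8 form as if it were one character."""
    return s.encode("utf-8").decode("iso-8859-1")


recoded = recode_to_latin1(text)   # the Arabic letters are gone
bytewise = bytes_as_chars(text)    # each Arabic letter became two characters

# A single-byte encoding survives the byte-per-character view unchanged.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
single_byte_ok = latin1_bytes.decode("iso-8859-1") == "caf\u00e9"
```

Here `recoded` keeps only the English part, while `bytewise` is longer than
the input because every two-byte Arabic letter is counted as two separate
characters.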
> Tim Freedom supposedly wrote on 2/20/04 1:51 PM:
> > I have lots of files that have both English and Arabic in
> > them (UTF-8), currently I can only index the English parts (again,
> > I'm willing to help with adding UTF-8 abilities :-)
Patches are welcome. :-)
> yet when I display the output it would be nice to default to UTF-8 to see
> both texts.
You mean for the stored description? That may or may not work depending
on how you have SWISH-E configured. I'd suggest testing it to make sure
multibyte characters aren't destroyed.
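
One possible check, as a sketch: index a small document containing a known
multibyte string, capture the raw bytes your search front end returns for
the stored description, and compare. How you obtain those bytes depends on
your setup, so the helper below (a hypothetical name) only does the
comparison:

```python
def multibyte_survived(original: str, stored: bytes) -> bool:
    """Return True if `stored` is valid UTF-8 and still contains `original`.

    `stored` stands for whatever bytes come back as the stored description;
    fetching them is left to the caller (assumption: you can capture them).
    """
    try:
        decoded = stored.decode("utf-8")
    except UnicodeDecodeError:
        # A broken multibyte sequence means the data was damaged on the way.
        return False
    return original in decoded


# Known test string: English plus a three-letter Arabic word.
sample = "index \u0628\u062d\u062b"

intact = sample.encode("utf-8")                     # passed through untouched
dropped = sample.encode("iso-8859-1", "ignore")     # unmappable chars dropped
truncated = sample.encode("utf-8")[:-1]             # last character cut in half
```

Only the `intact` case should pass; the other two model characters being
dropped during recoding and a multibyte sequence being cut apart.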
-- 
David Norris
http://www.webaugur.com/dave/
ICQ - 412039
Received on Fri Feb 20 14:57:44 2004