Re: problem preserving specific special characters/unicodes

From: Brad Miele <brad(at)not-real.auroraquanta.com>
Date: Thu Mar 11 2004 - 13:55:32 GMT
My workaround for these issues, which may or may not be possible in your
case, is to use swish-e to get the list of records that I want to return,
but then look up the content for those records in the database. I only
have to do it for certain indexes, and even though it slows down the
return, it makes all of the characters display nicely.
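
A minimal sketch of that two-step lookup, in Python with sqlite3. The
table name `records`, its columns, and the `hit_ids` list are all
hypothetical; `hit_ids` stands in for whatever record ids you parse out
of the swish-e results, and the swish-e call itself is not shown.

```python
import sqlite3

def fetch_records(conn, record_ids):
    """Return {id: content} for the given ids, read from the database
    instead of from the search index."""
    if not record_ids:
        return {}
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ",".join("?" for _ in record_ids)
    rows = conn.execute(
        f"SELECT id, content FROM records WHERE id IN ({placeholders})",
        list(record_ids),
    ).fetchall()
    return dict(rows)

# Hypothetical demo data: an in-memory table holding the original text,
# including characters that do not exist in ISO-8859-1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "Aurora\u2122 photo \u2013 \u00a9 2004"), (2, "plain text")],
)

# Stand-in for ids taken from the search results, in rank order.
hit_ids = [2, 1]
found = fetch_records(conn, hit_ids)
# Rebuild the list in search-rank order; SQL does not preserve it.
ordered = [found[i] for i in hit_ids if i in found]
```

The extra query per result page is the slowdown mentioned above; the
benefit is that the displayed text never passes through the index's
8-bit conversion.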

Brad
------------------------------------------------------------
 Brad Miele
 Technology Director
 AuroraPhotos.com
 (207) 828-8787 x110
 bmiele@auroraphotos.com

 I brake for chezlogs!


On Thu, 11 Mar 2004, Bill Moseley wrote:

>
> On Wed, Mar 10, 2004 at 11:33:52PM -0800, Prashant Badhe wrote:
> > Hi,
> >
> >     Can anybody give some idea about how to preserve some specific
> > characters such as the copyright symbol, en dash, em dash, smart quotes, etc.
> > that appear in our input XML files?
>
> Libxml2 converts to utf-8.  (Entities are also converted by libxml2.)
> Swish-e is only 8-bit so it has to convert utf-8 to an 8-bit encoding,
> which is currently hard-coded to 8859-1.  Characters that can't make
> that conversion are lost.
>
> My guess is your source is encoded in Windows 1252, which contains
> characters that do not map to 8859-1.  I thought copyright was ok,
> though.  Trademark will not convert, though.
>
>
> --
> Bill Moseley
> moseley@hank.org
>
>
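
The mapping Bill describes can be checked directly. The following is a
small Python sketch, not swish-e code: it only tests whether each
character named in the thread has a code point in ISO-8859-1 and in
Windows-1252, using Python's built-in codecs. The `samples` table and
the helper name are illustrative.

```python
# Characters mentioned in the thread, written as Unicode escapes.
samples = {
    "copyright": "\u00a9",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "left smart quote": "\u201c",
    "right smart quote": "\u201d",
    "trademark": "\u2122",
}

def encodable(ch, encoding):
    """True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Which characters survive a conversion to each 8-bit encoding?
in_latin1 = {name: encodable(ch, "iso-8859-1") for name, ch in samples.items()}
in_cp1252 = {name: encodable(ch, "cp1252") for name, ch in samples.items()}

for name in samples:
    print(f"{name}: 8859-1={in_latin1[name]} cp1252={in_cp1252[name]}")
```

Of these, only the copyright sign exists in 8859-1; the dashes, smart
quotes, and trademark sign exist in Windows-1252 but have no 8859-1
equivalent, which matches the loss described above.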
Received on Thu Mar 11 05:55:32 2004