Re: Detecting multibyte/wide characters?

From: J Robinson <jrobinson852(at)not-real.yahoo.com>
Date: Wed Mar 02 2005 - 13:55:50 GMT

Thanks for the response, Bill.

I'll have to poke around and see if I can't answer
your questions below.

(I sent this to the list last week but it didn't seem
to get out)

jrobinson

> --- Bill Moseley <moseley@hank.org> wrote:
> 
> > On Tue, Feb 22, 2005 at 06:11:38AM -0800, J Robinson wrote:
> > >  input conversion failed due to input error
> > >  Bytes: 0xB5 0x74 0xA3 0xBA
> > > 
> > 
> > That's from libxml2.  I suspect libxml2 aborts processing the input,
> > and I imagine that happens before any text is passed to swish, but
> > that may depend on where it happens in the input doc.
> > 
> > You would need to test a few different docs to find out.
> > 
> > Did you look at that document and see why and where that is happening?
> > Is your input file utf8?  Is that really an invalid character sequence?
> > 
> > I'm not sure if that error is printed by the parser.c error handler or
> > not.  If so you might be able to catch that error and abort processing.
> 
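
On the question of whether those bytes are really an invalid UTF-8
sequence, one quick way to check is a strict decode. This is only a
minimal sketch in Python, not anything swish or libxml2 does internally;
the helper name first_invalid_utf8 is made up for illustration. It
reports the offset of the first byte that fails to decode, which may
also help with finding where in a document the problem starts.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None


# The four bytes reported in the libxml2 error message quoted above.
reported = bytes([0xB5, 0x74, 0xA3, 0xBA])
print(first_invalid_utf8(reported))  # prints 0
```

The result is 0 because 0xB5 is a UTF-8 continuation byte and cannot
begin a character, so the sequence is not valid UTF-8 as it stands. The
same function could be run over a whole file read in binary mode to
locate the first bad offset.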


Received on Wed Mar 2 05:55:54 2005