
Re: Detecting multibyte/wide characters?

From: Bill Moseley <moseley(at)>
Date: Tue Feb 22 2005 - 14:39:34 GMT
On Tue, Feb 22, 2005 at 06:11:38AM -0800, J Robinson wrote:
>  input conversion failed due to input error
>  Bytes: 0xB5 0x74 0xA3 0xBA

That's from libxml2.  I suspect libxml2 aborts processing the input,
and I imagine that happens before any text is passed to swish, but that
may depend on where in the input document the bad bytes occur.

You would need to test a few different docs to find out.

Did you look at that document and see why and where that is happening?
Is your input file utf8?  Is that really an invalid character sequence?

I'm not sure whether that error is printed by the parser.c error handler
or not.  If it is, you might be able to catch that error and abort processing.

As for detecting it in advance, I guess you would have to somehow
validate the document before processing it.
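A minimal sketch of one way to do that check, assuming the input is supposed to be UTF-8: scan the raw bytes before handing the file to swish-e and report the offset of the first malformed sequence. The function name `utf8_first_invalid` is hypothetical; it is not part of libxml2 or swish-e.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte offset of the first invalid UTF-8
   sequence in buf, or (size_t)-1 if the whole buffer is well-formed UTF-8. */
size_t utf8_first_invalid(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point legal for this length */

        if (c < 0x80) {            /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return i;              /* stray continuation byte or bad lead byte */
        }

        if (len - i <= n)
            return i;              /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return i;          /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong encodings, values above U+10FFFF, and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;

        i += n + 1;
    }
    return (size_t)-1;
}
```

Run on the four bytes from the error message (0xB5 0x74 0xA3 0xBA), it returns 0: 0xB5 is a continuation byte with no lead byte in front of it, so that input is not valid UTF-8. The overlong and surrogate checks follow RFC 3629; a looser check on bit patterns alone would accept some sequences a strict decoder may still reject.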

Bill Moseley
Received on Tue Feb 22 06:39:35 2005