Re: input conversion failed

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Oct 26 2003 - 18:14:18 GMT
On Sun, Oct 26, 2003 at 09:46:14AM -0800, J Robinson wrote:
> It seems that Korean, Japanese, and other Asian pages
> are especially likely to cause the error (no surprise
> there). I found some publicly available examples:
> 
> http://www.openbsd.com/ko/donations.html
> input conversion failed due to input error
> Bytes: 0xB8 0x00 0x20 0xBE
> 
> But even some 'English' pages exhibit the error:
> 
> http://www.gnu.org/testimonials/supported.html
> input conversion failed due to input error
> Bytes: 0xC4 0x3C 0x2F 0x41

moseley@bumby:~$ od -t x1 supported.html  | grep -i c4
moseley@bumby:~$

What version of libxml2 do you have?  I don't see that error.

moseley@bumby:~$ xml2-config --version
2.5.11

I don't get the errors even with
http://www.openbsd.com/ko/donations.html.  If I set ParserWarnLevel I do
get a lot of these warnings:

 warning: Failed to convert internal UTF-8 to Latin-1.
 Replacing non ISO-8859-1 char with char ' '

> Any ideas on the best way to detect and ignore
> multi-byte content?

Libxml2 is supposed to detect the encoding and convert to UTF-8
internally.
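
If you want to check a page yourself before handing it to the parser,
here is a rough Python sketch (not anything libxml2 or swish-e does
internally). It tries to decode the raw bytes with whatever encoding you
think the page uses and reports the first byte that fails. The function
name and the sample bytes are made up for illustration.

```python
from typing import Optional, Tuple


def first_undecodable_byte(data: bytes,
                           encoding: str = "utf-8") -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that fails to decode.

    Returns None when the whole buffer decodes cleanly. The encoding
    argument is whatever you assume the page declares; this sketch does
    not try to detect it.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the codec rejected.
        return err.start, data[err.start]
    return None


if __name__ == "__main__":
    # Hypothetical sample: a stray 0xC4 followed by ASCII, similar in
    # shape to the "Bytes: 0xC4 0x3C 0x2F 0x41" report quoted above.
    sample = b"ab\xc4</A"
    bad = first_undecodable_byte(sample, "utf-8")
    if bad is None:
        print("decodes cleanly")
    else:
        offset, value = bad
        print("bad byte 0x%02X at offset %d" % (value, offset))
```

A page that fails here under its declared encoding is a likely
candidate for the "input conversion failed" error, so it could be
skipped or re-read with a different encoding.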


-- 
Bill Moseley
moseley@hank.org
Received on Sun Oct 26 18:26:33 2003