On Sun, Oct 26, 2003 at 09:46:14AM -0800, J Robinson wrote:
> It seems that Korean, japanese, and other asian pages
> are especially likely to cause the error (no surprise
> there). I found some publicly available examples:
>
> http://www.openbsd.com/ko/donations.html
> input conversion failed due to input error
> Bytes: 0xB8 0x00 0x20 0xBE
>
> But even some 'english' pages exhibit the error:
>
> http://www.gnu.org/testimonials/supported.html
> input conversion failed due to input error
> Bytes: 0xC4 0x3C 0x2F 0x41
moseley@bumby:~$ od -t x1 supported.html | grep -i c4
moseley@bumby:~$
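
If you want to look for the exact four bytes from the error message rather than a single byte, grepping od output is a bit fragile: od prints 16 bytes per line, so a line-based grep cannot match a sequence that straddles two lines. Here is a rough sketch in Python of a direct search. The function name find_byte_sequence and the sample data are made up for illustration; on a real page you would read the saved file in binary mode and pass its contents in.

```python
from typing import List


def find_byte_sequence(data: bytes, needle: bytes) -> List[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping matches are also found.
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical sample: the bytes reported in the error, embedded in markup.
needle = bytes([0xC4, 0x3C, 0x2F, 0x41])
sample = b"ab" + needle + b">"
print(find_byte_sequence(sample, needle))  # prints [2]
```

An empty list means the sequence is not in the data at all, which would point at a difference between the copy being indexed and the copy being inspected.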
What version of libxml2 do you have? I don't see that error.
moseley@bumby:~$ xml2-config --version
2.5.11
I don't get the errors even with
http://www.openbsd.com/ko/donations.html. If I set ParserWarnLevel,
though, I do get a lot of these:
warning: Failed to convert internal UTF-8 to Latin-1.
Replacing non ISO-8859-1 char with char ' '
> Any ideas on the best way to detect and ignore
> multi-byte content?
Libxml2 is supposed to detect the encoding and convert to UTF-8
internally.
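
If you do want to pre-screen documents yourself, one option is to try decoding the raw bytes in the encoding you expect and skip anything that fails. The following is only a minimal sketch in Python, not how libxml2 does its detection. The function name first_decode_error and the default of UTF-8 are assumptions for illustration.

```python
from typing import Optional, Tuple


def first_decode_error(data: bytes,
                       encoding: str = "utf-8") -> Optional[Tuple[int, bytes]]:
    """Return (offset, next four bytes) at the first undecodable byte.

    Returns None when data decodes cleanly in the given encoding.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        return exc.start, data[exc.start:exc.start + 4]
    return None


# Hypothetical sample: 0xC4 opens a two-byte UTF-8 sequence, but the next
# byte (0x3C, '<') is not a continuation byte, so decoding fails there.
print(first_decode_error(b"ab\xc4</A>"))  # prints (2, b'\xc4</A')
print(first_decode_error(b"plain ascii"))  # prints None
```

A document that returns None can be passed along as it is; anything else could be skipped or logged with the offset for later inspection.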
--
Bill Moseley
moseley@hank.org
Received on Sun Oct 26 18:26:33 2003