--- Bill Moseley <moseley@hank.org> wrote:
> On Sun, Oct 26, 2003 at 09:46:14AM -0800, J Robinson wrote:
> > It seems that Korean, Japanese, and other Asian pages
> > are especially likely to cause the error (no surprise
> > there). I found some publicly available examples:
> >
> > http://www.openbsd.com/ko/donations.html
> > input conversion failed due to input error
> > Bytes: 0xB8 0x00 0x20 0xBE
> >
> > But even some 'english' pages exhibit the error:
> >
> > http://www.gnu.org/testimonials/supported.html
> > input conversion failed due to input error
> > Bytes: 0xC4 0x3C 0x2F 0x41
>
> moseley@bumby:~$ od -t x1 supported.html | grep -i c4
> moseley@bumby:~$
>
> What version of libxml2 do you have? I don't see
> that error.
>
> moseley@bumby:~$ xml2-config --version
> 2.5.11
>
> I don't get the errors even with
> http://www.openbsd.com/ko/donations.html. If I set
> ParserWarnLevel I do get a lot of
>
> warning: Failed to convert internal UTF-8 to Latin-1.
> Replacing non ISO-8859-1 char with char ' '
>
I'm using 2.4.28, built from source on RH6.1:
% xml2-config --version
2.4.28
I'll try upgrading libxml2.
Still, it would be cool if SWISH-E showed the URI
with the error message and/or indicated that the error
came from libxml2. (I only found that out by googling.)
> > Any ideas on the best way to detect and ignore
> > multi-byte content?
>
> Libxml2 is supposed to detect the encoding and
> convert to UTF-8 internally.
Ah. Good to know.
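For checking a saved page by hand, here is a minimal sketch in Python that
does roughly what the od/grep test above does. It is an assumption-laden
illustration, not anything SWISH-E or libxml2 does internally: the function
names find_bytes and first_invalid_utf8 are made up for this example, and the
byte sequence is simply the one quoted in the error message for
supported.html. Note that a page failing the UTF-8 check is not necessarily
broken; it may just be in another encoding that libxml2 has to detect.

```python
import sys


def find_bytes(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def first_invalid_utf8(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if the whole input decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python checkbytes.py supported.html
    with open(sys.argv[1], "rb") as fh:
        page = fh.read()
    # The four bytes reported in the error for supported.html.
    reported = bytes([0xC4, 0x3C, 0x2F, 0x41])
    print("reported bytes found at offsets:", find_bytes(page, reported))
    print("first non-UTF-8 byte offset:", first_invalid_utf8(page))
```

An empty offset list from the first check would match what Bill saw: the
reported bytes are not in the file as fetched, which points at the library
version rather than the page.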
Best,
jrobinson
Received on Sun Oct 26 23:23:21 2003