Re: XML2 parser error?

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Jul 30 2004 - 15:17:01 GMT

From the HTML::Entities doc:

encode_entities( $string, $unsafe_chars )

This routine replaces unsafe characters in $string with their entity
representation. A second argument can be given to specify which
characters to consider unsafe (i.e., which to escape). The default set
of characters to encode are control chars, high-bit chars, and the <&>.


So why not pass this, Jonas:

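# encode_entities() wraps its second argument in [...] itself, so pass
# only the body of the character class; a leading ^ negates it.
# List whatever ASCII you DON'T want encoded (here, code points 0-32).
my $keep = '\x00-\x20';

$contents = "<xml>\n"
    . HTML::Entities::encode_entities($contents, "^$keep")
    . "\n</xml>";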
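
Note that leaving the low control characters unencoded does not make them legal XML. A minimal sketch of one way to deal with that, assuming you are happy to simply drop them before encoding: strip every code point below 32 except tab (9), LF (10) and CR (13), the three XML 1.0 allows. The helper name strip_illegal_xml_controls and the sample string are made up for illustration; they are not part of HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Entities: remove the control
# characters XML 1.0 does not allow, keeping tab (9), LF (10), CR (13).
sub strip_illegal_xml_controls {
    my ($text) = @_;
    # 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F are the illegal ones below 32.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    return $text;
}

# Made-up sample input containing two illegal control characters.
my $contents = "one\x01two\ttab\x0Bthree\n";
my $clean    = strip_illegal_xml_controls($contents);
```

The cleaned string can then go through encode_entities as usual, for example with '<>&"' as the second argument so that only the markup characters are escaped.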

Keith Ivey wrote on 7/30/04 9:53 AM:
> Jonas Wolf wrote:
> 
> 
>> As you can see, the &#n; sequences with n<32 break libxml2, and rightly
>> so. HTML::Entities should not generate these codes, as they are not valid
>> HTML or XML.
> 
> From what I can tell, three code points below 32 (&#9;, &#10;, and
> &#13;) are legal.
> But what should HTML::Entities do when presented with something that
> can't be represented? Die?
> 

-- 
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Fri Jul 30 08:17:13 2004