XML2 parser error?

From: Jonas Wolf <JOWOLF(at)not-real.uk.ibm.com>
Date: Wed Jul 28 2004 - 16:10:39 GMT
I use the prog option to generate XML documents for indexing with the 
XML2 parser. To make sure that the XML2 parser does not break, I run 
HTML::Entities::encode_entities on the text that I enclose in XML tags. I 
discovered that some of the documents I index contain "strange" ASCII 
characters, including control characters. encode_entities transforms these 
into numeric character references such as &#8;, which is valid XML syntax. 
For example, <xml>&#64;</xml> is valid XML and just contains the character 
@. Swish-e (or probably the XML2 parser) breaks down when it encounters a 
reference to a control character such as &#8;, even though it is perfectly 
legal. This is not a big problem for me, as I just filter these characters 
out afterwards. But in general, this could be considered a bug.
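
The filtering workaround mentioned above might look roughly like the sketch below. It is written in Python for illustration only; the pipeline described in this message uses Perl, and the names clean_for_xml and wrap_in_tag are hypothetical. One point worth checking against the XML 1.0 specification: its Char production allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and a numeric character reference has to resolve to one of those characters. Under that reading &#64; is fine but &#8; is not, so the parser may be following the specification rather than misbehaving. The sketch therefore drops the disallowed characters before escaping.

```python
import re
from xml.sax.saxutils import escape

# Matches any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text: str) -> str:
    """Hypothetical helper: drop characters XML 1.0 forbids, then escape
    the markup characters &, < and >."""
    stripped = _INVALID_XML_CHARS.sub("", text)
    return escape(stripped)


def wrap_in_tag(tag: str, text: str) -> str:
    """Hypothetical helper: enclose cleaned text in a simple element.
    Assumes `tag` is already a valid XML element name."""
    return f"<{tag}>{clean_for_xml(text)}</{tag}>"
```

Cleaning the raw text before it is escaped means no reference to a control character is ever generated, so nothing needs to be filtered out of the finished XML afterwards. Tab, line feed and carriage return are kept, because XML 1.0 permits them.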

Any takers?

Jonas
Received on Wed Jul 28 09:11:11 2004