I use the prog option to generate XML documents to be indexed, using the
XML2 parser. To make sure that the XML2 parser does not break, I do
HTML::Entities::encode_entities on the text that I enclose in XML tags. I
discovered that some of the documents that I index contain "strange" ASCII
characters, including control characters. encode_entities transforms these
to numeric character references of the form &#NN;, which is valid XML
syntax. For example, <xml>&#64;</xml> is valid XML and just contains the
character @. Swish-e
(or probably the XML2 parser) breaks down when it encounters this
character sequence, even though it is perfectly legal. This is not a big
problem for me, as I just filter these out afterwards. But in general,
this could be considered a bug.
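
For anyone hitting the same thing, here is a minimal sketch of the kind of
filtering I mean. It is written in Python rather than the Perl my prog script
uses, and the function names are made up for illustration. It assumes the
offending characters are the ones that XML 1.0 excludes from its Char range
(mostly control characters other than tab, newline and carriage return), and
it removes them before escaping rather than afterwards.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Drop characters that an XML 1.0 document may not contain."""
    return _INVALID_XML_CHARS.sub("", text)

def to_xml_text(text):
    """Filter first, then escape &, < and > for use as element content."""
    return escape(strip_invalid_xml_chars(text))

if __name__ == "__main__":
    # Hypothetical sample text containing two control characters.
    print("<xml>" + to_xml_text("a\x01b & c\x08 <d>") + "</xml>")
```

Filtering before escaping means the pattern only has to match raw
characters and not their escaped forms.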
Any takers?
Jonas
Received on Wed Jul 28 09:11:11 2004