I did some more testing, and indeed swish-e is doing everything correctly.
The XML parser recognises numeric character references (e.g. the one
for @) fine, but breaks down on references to characters below 32
(control characters), which is also correct behaviour. (As a side note,
this never generates an error message; it just stops indexing the
document at that point. Can you force error messages?) The problem
seems to be HTML::Entities::encode_entities, which generates these invalid
character sequences.
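
For anyone hitting the same thing, here is a rough sketch of the workaround I have in mind: strip the offending control characters before the text is entity-encoded and handed to the indexer. It is written in Python purely for illustration (it is not the Perl code in question), and the names strip_xml_illegal and is_well_formed are made up for this example. The character range is my reading of the XML 1.0 rules, which allow only tab, line feed and carriage return below 32.

```python
import re
import xml.etree.ElementTree as ET

# Code points below 32 that XML 1.0 does not allow, even when written as
# numeric character references. Tab (9), LF (10) and CR (13) are permitted.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_xml_illegal(text: str) -> str:
    """Remove control characters that XML 1.0 forbids (hypothetical helper)."""
    return _XML_ILLEGAL.sub("", text)


def is_well_formed(xml: str) -> bool:
    """Return True if Python's standard-library XML parser accepts the input."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    dirty = "user@example.org\x01 some text"
    # Clean the text first, then encode entities and index it.
    print(repr(strip_xml_illegal(dirty)))
    # A reference to "@" is fine; a reference to character 1 is rejected.
    print(is_well_formed("<doc>&#64;</doc>"))
    print(is_well_formed("<doc>&#1;</doc>"))
```

The same filtering could presumably be done in Perl with a one-line substitution before calling encode_entities, but I have not tested that.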
I might mention this elsewhere...
Jonas
Received on Thu Jul 29 02:23:06 2004