On Tue, Nov 09, 2004 at 10:19:05AM -0800, firstname.lastname@example.org wrote:
> > On Tue, Nov 09, 2004 at 06:52:24AM -0800, email@example.com wrote:
> > > Swish-e splits the words in ISO-8859. I like the way it works
> > > with UTF-8.
> > So I guess that means your source xml is encoded in UTF-8.
> Yes, but I noticed that my server has some files encoded in UTF-8 and
> others in ISO-8859, so I'll have files with ñ's indexed as n and
> others with the words split. Has anyone had this problem with XML
> files? How did you resolve it and index your XML files? I don't know
> what to do.
You might review http://xmlsoft.org/encoding.html ("How is it
implemented?" section). This part seems to be related to this:

    If there is no encoding declaration, then the input has to be in
    either UTF-8 or UTF-16, if it is not then at some point when
    processing the input, the converter/checker of UTF-8 form will
    raise an encoding error. You may end-up with a garbled document,
    or no document at all !

You may need to make sure your XML is well-formed and has the encoding
specified. You might be able to automate that process (maybe the
file(1) command can help figure out the encoding).
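
As a rough sketch of that automation, here is one way it might look in
Python (my choice of language, nothing swish-e requires). The function
name normalize_xml_to_utf8 is made up for illustration, and the code
assumes that any file which is not valid UTF-8 is ISO-8859-1, which you
would need to confirm for your own server. The "does it decode as
UTF-8?" test is only a heuristic, much like what file(1) does.

```python
import re

# An XML declaration that already names an encoding, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_WITH_ENCODING = re.compile(rb'^\s*<\?xml[^>]*\bencoding\s*=', re.IGNORECASE)


def normalize_xml_to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the document as UTF-8 with an explicit encoding declaration.

    Hypothetical helper: 'fallback' is an assumption about what the
    non-UTF-8 files on the server really contain.
    """
    if _DECL_WITH_ENCODING.match(data):
        # The file already declares its encoding; leave it untouched.
        return data
    try:
        # "utf-8-sig" also strips a leading byte order mark if present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the fallback encoding.
        text = data.decode(fallback)
    # Drop any declaration that lacks an encoding, then add a complete one.
    text = re.sub(r'^\s*<\?xml[^>]*\?>\s*', '', text)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + text).encode("utf-8")
```

You would read each file in binary mode, pass the bytes through the
function, and write the result out before indexing. Files that already
declare an encoding are passed through unchanged, since the parser can
read those on its own.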
Received on Tue Nov 9 10:44:45 2004