Re: non-English characters in XML files

From: Bill Moseley <moseley(at)>
Date: Mon Nov 01 2004 - 15:10:42 GMT
On Mon, Nov 01, 2004 at 03:40:06AM -0800, wrote:
> My .conf file looks like this: 
> UndefinedXMLAttributes auto 
> UndefinedMetaTags auto 

Sure you want to do that?  Seems like you will be creating a lot of

To find out why you are getting no results, first use:

  swish-e -c config -i test.html test.xml -T indexed_words

and you will notice something odd.  Indexing stops in the middle of
the XML file.

Then to find out why the parser stopped processing the file turn on:

  ParserWarnLevel 9

in your config file.

1.xml:10: error: Input is not proper UTF-8, indicate encoding !
        <asignatura nombre="Diseño de bases de datos" codigo="4">
1.xml:10: error: Bytes: 0xF1 0x6F 0x20 0x64

Seems like the parser thinks you are using UTF-8, and the byte 0xF1
followed by 0x6F is not a valid UTF-8 sequence.
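
A quick way to confirm what the parser is complaining about is to decode
the reported bytes yourself. The sketch below is only an illustration: it
assumes the file was saved as ISO-8859-1 (0xF1 is "ñ" there, and also in
Windows-1252), and declare_encoding() is a hypothetical helper, not part of
swish-e, that follows the parser's hint to "indicate encoding" by adding an
XML declaration when the file has none.

```python
# Bytes reported by the parser for line 10 of 1.xml.
reported = bytes([0xF1, 0x6F, 0x20, 0x64])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def declare_encoding(xml_bytes: bytes, encoding: str = "ISO-8859-1") -> bytes:
    """Hypothetical helper: prepend an XML declaration naming the encoding.

    Files that already start with an XML declaration are returned unchanged;
    check by hand that such a declaration names the right encoding.
    """
    if xml_bytes.startswith(b"<?xml"):
        return xml_bytes
    header = f'<?xml version="1.0" encoding="{encoding}"?>\n'.encode("ascii")
    return header + xml_bytes


# The bytes are not UTF-8, but they read fine as ISO-8859-1 (assumed encoding).
utf8_ok = is_valid_utf8(reported)
latin1_text = reported.decode("iso-8859-1")

# The offending line as raw ISO-8859-1 bytes, with the declaration added.
sample = b'<asignatura nombre="Dise\xf1o de bases de datos" codigo="4">'
fixed = declare_encoding(sample)

print(utf8_ok)
print(latin1_text)
print(fixed.decode("iso-8859-1"))
```

If the file really is in some other encoding, pass that name instead;
converting the file to UTF-8 would be the other way to get past the error.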

> WordCharacters 0123456789abcdefghijklmnopqrstuvwxyz 

You may not want to modify WordCharacters quite yet.

Bill Moseley

Received on Mon Nov 1 07:10:42 2004