Re: non-English characters in XML files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Nov 01 2004 - 15:10:42 GMT
On Mon, Nov 01, 2004 at 03:40:06AM -0800, dasoso@alumni.uv.es wrote:
> My .conf file looks like this: 
>  
> UndefinedXMLAttributes auto 
> UndefinedMetaTags auto 

Sure you want to do that?  Seems like you will be creating a lot of
metanames.

To find out why you are getting no results, first use:

  swish-e -c config -i test.html test.xml -T indexed_words

and you will notice something odd.  Indexing stops in the middle of
the XML file.

Then, to find out why the parser stopped processing the file, turn on:

  ParserWarnLevel 9

in your config file.  Indexing again then reports:

1.xml:10: error: Input is not proper UTF-8, indicate encoding !
        <asignatura nombre="Diseño de bases de datos" codigo="4">
                                ^
1.xml:10: error: Bytes: 0xF1 0x6F 0x20 0x64

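Seems like the parser thinks you are using UTF-8.  Byte 0xF1 is "ñ" in
ISO-8859-1 (Latin-1), but it is not valid UTF-8 when followed by 0x6F,
and with no encoding declared in the file the parser assumes UTF-8.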
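
To see the same problem outside the indexer, here is a minimal Python
sketch.  It is not part of swish-e.  The helper name first_invalid_utf8 is
made up for illustration, and it assumes the file was saved as ISO-8859-1,
which is what the bytes in the error message suggest.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, next four bytes) of the first invalid UTF-8
    sequence in data, or None if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.start + 4]
    return None


# The line from the error message, encoded as ISO-8859-1 (assumed encoding).
line = '<asignatura nombre="Diseño de bases de datos" codigo="4">'.encode("iso-8859-1")

problem = first_invalid_utf8(line)
if problem is not None:
    offset, raw = problem
    print("invalid UTF-8 at byte", offset, ":", " ".join(f"0x{b:02X}" for b in raw))
    print("read as ISO-8859-1:", raw.decode("iso-8859-1"))

# Re-encoding the same text as UTF-8 yields bytes that decode without error.
utf8_line = line.decode("iso-8859-1").encode("utf-8")
assert first_invalid_utf8(utf8_line) is None
```

On the sample line this reports the same four bytes the parser complained
about.  The usual fixes are either to declare the encoding on the first
line of the XML file, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>,
or to convert the file to UTF-8 before indexing.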


> WordCharacters 0123456789abcdefghijklmnopqrstuvwxyz 

You may not want to modify WordCharacters quite yet.

-- 
Bill Moseley
moseley@hank.org

Received on Mon Nov 1 07:10:42 2004