On Wed, Mar 23, 2005 at 10:28:38AM -0800, Carmelo Carchedi wrote:
> I have a typical xml file like this in utf-8:
> maybe the problem is "accented characters".
> If I have accented characters in <testomassima> tag, I cannot find
> any word (with or without accent) in the xml file.
> Is it correct to index utf8 files?
It's fine. In fact all documents parsed by libxml2 are in utf8
internally and then converted to 8-bit encoding (namely 8859-1) at
The trick to debugging is to index a single file:
swish-e -i test.xml -c swish.config -T indexed_words
That -T indexed_words option will have swish display all the words
that are indexed. Those are the words that you can search for. Make
sure that the entire document is being indexed -- there are cases
where bad XML will make libxml2 abort processing in the middle of a
Then when searching do:
swish-e -w foo -H9 | grep Parsed
and that will show you the word(s) swish is searching for in the
The other thing is to set ParserWarnLevel 9 in your config file so that
libxml2 will report any errors in processing.
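
Before indexing, it can also help to confirm independently that the file really is valid UTF-8 and well-formed XML, since either problem could explain words going missing. The following is only a rough sketch: the function name check_xml is made up for illustration, and Python's standard-library parser is expat rather than libxml2, so the two may not agree on every edge case.

```python
import sys
import xml.etree.ElementTree as ET


def check_xml(path):
    """Return None if the file decodes as UTF-8 and parses as XML, else a message.

    Hypothetical helper, not part of swish-e. Uses Python's expat-based
    parser, which is an assumption standing in for libxml2.
    """
    with open(path, "rb") as fh:
        data = fh.read()

    # Step 1: the raw bytes must decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 at byte {exc.start}: {exc.reason}"

    # Step 2: the document must be well-formed XML.
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        line, col = exc.position
        return f"not well-formed at line {line}, column {col}: {exc}"

    return None


if __name__ == "__main__":
    # Usage: python check_xml.py test.xml [more.xml ...]
    for path in sys.argv[1:]:
        problem = check_xml(path)
        print(f"{path}: {problem or 'ok'}")
```

If this reports a problem at some line, that is a reasonable place to start looking when the -T indexed_words output stops partway through the file.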
> Is it better to convert the utf-8 file to another charset?
Received on Wed Mar 23 10:41:54 2005