Hi
Some of the data I am considering indexing includes Unicode
(Latin Extended-A) characters.
I've looked at the FAQ and done some tests. It seems that
if those files are XML or HTML, libxml2 will convert them to ISO-8859-1.
But in my tests, Latin characters with accents, e.g. AMACRON (ā),
are not indexed. I was hoping they would be converted
to the plain letter (a); stripping the accents off would make my
data conveniently searchable.
I note that the FAQ suggests full Unicode support is a way off, but
could stripping of Unicode accents be achieved with
reasonable effort?
By the way, I used swish-e.exe (Windows) compiled from the current
CVS to test this behaviour.
Greg Ford
Received on Tue Aug 12 14:10:27 2003