Roman Chyla scribbled on 3/31/05 7:02 AM:
>>By the way, could anyone on the list recommend a UTF-8 capable indexing tool?
>
>
> Lucene
Lucene does handle UTF-8.
> I believe that htdig can index UTF-8 too
ht://Dig does not. From its FAQ:
4.27. How can I get htdig to index Chinese, Japanese or Korean text?
You can't do that yet. Current versions of ht://Dig only support 8-bit
characters, so languages such as Chinese, Japanese and Korean, which require
16-bit characters, are not currently supported. The same goes for documents in
any language if the document is encoded in anything but simple 8-bit character
sets. Unicode and UTF-8 documents are not supported. There are long-range plans
to add support for these, but it's a huge task that no developer has taken up yet.
http://www.htdig.org/FAQ.html#q4.27
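Here is a concrete version of the FAQ's point. In UTF-8 a CJK character such as 中 takes three bytes, so a tool that reads one byte as one character splits it into three meaningless 8-bit characters. The sketch below is a minimal Python illustration, not ht://Dig's code. It assumes ISO-8859-1 (Latin-1) stands in for the "simple 8-bit character set".

```python
# Sketch only: illustrates the FAQ's point, not ht://Dig's actual code.
# Assumption: an "8-bit" tool treats each byte as one character of a
# single-byte charset; ISO-8859-1 (Latin-1) stands in for that charset here.

def as_8bit_chars(text: str) -> str:
    """Return what a byte-per-character tool would 'see' for UTF-8 input."""
    utf8_bytes = text.encode("utf-8")
    # Latin-1 maps every byte 0-255 to exactly one code point, so this
    # never fails, but multi-byte UTF-8 characters get split into pieces.
    return utf8_bytes.decode("latin-1")


def main() -> None:
    word = "中文"  # "Chinese": two CJK characters, 3 UTF-8 bytes each
    seen = as_8bit_chars(word)
    print(len(word), len(seen))  # 2 6
    # Plain ASCII is unaffected, which is why 8-bit tools work fine on it.
    print(as_8bit_chars("index") == "index")  # True
    # The bytes themselves are intact; only the interpretation is wrong.
    print(seen.encode("latin-1").decode("utf-8") == word)  # True


if __name__ == "__main__":
    main()
```

The 8-bit view contains six "characters" for two real ones, and none of them is 中 or 文. An index built on that view could not match a search for either word.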
That last sentence sounds just like Swish-e...
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Mar 31 07:05:36 2005