On 11/09/2007 03:51 PM, Bill Moseley wrote:
> On Thu, Nov 08, 2007 at 10:24:29PM -0600, Peter Karman wrote:
>>> I'm new to Swish-e and I think it's a great tool. Unfortunately I ran into a
>>> little problem. I indexed a collection of xml files which are encoded in
>>> Windows-1251. Then I wrote a small cgi script and I started sending queries to
>>> Swish-e. All was great except one thing. Swish-e treats a perfectly normal
>>> Windows-1251 character, '?', as a word delimiter, but it is not one. I'd
>>> appreciate any help.
>>> I'm on Windows XP SP2 and I have installed Swish-e 2.4.5.
>>> Best regards,
>>> Nikola
>>>
>> You likely need to adjust your WordCharacters setting to include the relevant
>> 1251 characters. By default it is Latin1 (iso-8859-1).
>
> Peter, is there a way to tell libxml2 that the content is 1251?
>
iirc, libxml2 looks at the Content-Type <meta> tag if it is html. If xml, it
uses the encoding given in the <?xml ...?> declaration.
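
For the xml case, here is a minimal Python sketch of what "declared in the
document" means. It is not libxml2's code: the function name declared_encoding
and the regular expression are made up for illustration, and it only handles
ASCII-compatible encodings such as Windows-1251.

```python
import re

# Matches encoding="..." inside a leading <?xml ...?> declaration.
# Hypothetical helper pattern, not taken from libxml2.
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else None


# A tiny Windows-1251 document that declares its own encoding.
doc = '<?xml version="1.0" encoding="windows-1251"?><t>\u0436</t>'.encode("cp1251")
enc = declared_encoding(doc[:200])
text = doc.decode(enc)  # decodes correctly because the declaration names cp1251
```

If the xml files lack the encoding attribute, adding it to the declaration is
probably the simplest thing to try first.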
fwiw, libswish3 checks the LANG and LC_CTYPE env vars and falls back on that if
the encoding is not declared in the document. libxml2 doesn't do that for you,
iirc.
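
Roughly, that fallback can be sketched in Python as below. This is only an
illustration of the idea, not libswish3's actual code: the function names, the
order in which the two variables are checked, and the iso-8859-1 default are
all assumptions.

```python
import codecs
import os


def fallback_encoding(environ=os.environ, default="iso-8859-1"):
    """Guess an encoding from LC_CTYPE, then LANG (e.g. 'bg_BG.CP1251').

    The lookup order and the default are assumptions made for this sketch.
    """
    for var in ("LC_CTYPE", "LANG"):
        value = environ.get(var, "")
        # Locale strings look like language_TERRITORY.codeset@modifier.
        codeset = value.partition(".")[2].partition("@")[0]
        if not codeset:
            continue
        try:
            # Normalise to a codec name Python knows, e.g. 'cp1251'.
            return codecs.lookup(codeset).name
        except LookupError:
            continue  # unknown codeset, try the next variable
    return default


def pick_encoding(declared, environ=os.environ):
    """Prefer the encoding declared in the document, else use the environment."""
    return declared or fallback_encoding(environ)
```

With LC_CTYPE=bg_BG.CP1251 set and no declaration in the document,
pick_encoding(None) would return a Windows-1251 codec name.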
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Fri Nov 9 17:05:14 2007