Bill Moseley wrote on 5/12/09 9:38 PM:
> On Tue, May 12, 2009 at 07:05:01PM +0300, Fatih Aytaç wrote:
>> I am indexing iso-8859-9 encoded html files. Swish-e can make searches with
>> iso-8859-9 special chars. But cannot match the lowercase letters with the
>> uppercase ones.
>> Searching for "ALL", "all", and "All" gives me the same results. But searching
>> for "ALİ" and "ali" gives different results. The lowercase of "İ" is "i" in
>> Turkish. How can I get correct lowercase/uppercase matching of Turkish
>> characters so that swish-e gives the same results for the words "ALİ" and
>> "ali".
>
> Swish uses tolower(), IIRC, which should respect locale settings.
> Have you tried setting your locale? I suspect you would want to do
> that when indexing.
>
Bill is correct. strtolower() in swstring.c is the function that does the
lowercasing, and it calls tolower() on each byte. Here's a small example
program to show how the locale changes the result:
---------------------snip----------------------------
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>   /* tolower() */
#include <locale.h>

/* strtolower() from swstring.c in swish-e 2.4.x */
char *
strtolower(char *s)
{
    unsigned char *p = (unsigned char *) s;

    while (*p)
    {
        *p = tolower((unsigned char) *p);
        p++;
    }
    return s;
}

int
main( int argc, char *argv[] )
{
    int i;
    char *str;
    char *loc;

    /* pick up the locale from the environment (LC_ALL, LC_CTYPE, LANG) */
    loc = setlocale(LC_CTYPE, "");
    printf("locale = %s\n", loc ? loc : "(setlocale failed)");

    for(i=1; i<argc; i++) {
        str = argv[i];
        printf("%s -> ", str);
        printf("%s\n", strtolower(str));
    }
    return 0;
}
---------------------snip----------------------------
[karpet@ira:~/tmp]$ LC_ALL=tr_TR.ISO8859-9 ./strtolower ÏIAÀÁÂ
locale = tr_TR.ISO8859-9
ÏIAÀÁÂ -> ïıaàáâ
[karpet@ira:~/tmp]$ LC_ALL=en_US.ISO8859-1 ./strtolower ÏIAÀÁÂ
locale = en_US.ISO8859-1
ÏIAÀÁÂ -> ïiaàáâ
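
The only difference between the two runs is the plain capital I, which the
Turkish locale maps to the dotless ı. If you would rather pick the locale by
name in the program than through the environment, something like the following
might do. This is only a sketch: lower_in_locale() is a name I made up for
illustration, it is not part of swish-e, and it assumes the locale you name is
installed on your system.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Lowercase s in place under the named LC_CTYPE locale.
 * Returns 0 on success, or -1 if setlocale() rejects the name,
 * in which case s is left untouched.
 * lower_in_locale() is a hypothetical helper, not a swish-e function.
 */
int
lower_in_locale(const char *locale_name, char *s)
{
    unsigned char *p = (unsigned char *) s;

    /* setlocale() returns NULL when the request cannot be honored */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* same byte-by-byte loop as strtolower() in swstring.c */
    for (; *p; p++)
        *p = (unsigned char) tolower(*p);

    return 0;
}
```

Calling lower_in_locale("tr_TR.ISO8859-9", word) on an iso-8859-9 encoded
"ALİ" should then give "ali", assuming that locale is available. Run
`locale -a` to see which locale names your system actually has.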
Note that setting LC_CTYPE is recommended over setting LC_ALL [0].
[0]http://mail.nl.linux.org/linux-utf8/2001-09/msg00030.html
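
As I understand it, the point of preferring LC_CTYPE is that it changes only
character classification and case mapping, while LC_ALL overrides every
category at once (number formats, dates, collation, messages). A rough sketch
of the difference from inside a program, where set_ctype_only() and
print_categories() are names invented for this example:

```c
#include <locale.h>
#include <stdio.h>

/*
 * Switch only the character-classification category to the named locale
 * and leave every other category alone.
 * Returns the locale string reported by setlocale(), or NULL on failure.
 * set_ctype_only() is a hypothetical name used for illustration.
 */
const char *
set_ctype_only(const char *locale_name)
{
    return setlocale(LC_CTYPE, locale_name);
}

/* Print the current setting of two categories, for comparison. */
void
print_categories(void)
{
    /* passing NULL queries the category without changing it */
    printf("LC_CTYPE   = %s\n", setlocale(LC_CTYPE, NULL));
    printf("LC_NUMERIC = %s\n", setlocale(LC_NUMERIC, NULL));
}
```

After set_ctype_only("tr_TR.ISO8859-9"), print_categories() should still
report LC_NUMERIC as "C", since a program starts in the "C" locale and only
LC_CTYPE was touched. From the shell the equivalent is to export LC_CTYPE
instead of LC_ALL, keeping in mind that LC_ALL wins if both are set.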
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Wed May 13 00:23:34 2009