Bill,
thank you for your fast help.
On Mon, Jan 31, 2005 at 09:25:58AM -0800, Bill Moseley wrote:
> > we are using swish-e 2.4.3.
> > We feed swish-e by putting XML files into it.
> > One XML tag, which is used for sorting the search results,
> > contains German umlauts.
>
> Properties are pre-sorted at indexing time. The function that does
> this is called Compare_Properties() in docprop.c. For string
> properties flagged as "case:compare" it uses the library function
> strncmp(), which does not take LC_COLLATE into consideration. For
> strings marked as "case:ignore" it uses strncasecmp() which does check
> LC_COLLATE.
> If your property is flagged as case:ignore then check your locale
> (LC_COLLATE) setting.
> There's a strcoll() function to replace strcmp(), but the code would
> need to be rewritten since the strings are not null terminated.
>
> You can check your property's case setting by running
> swish-e -f myindex -T index_metanames
> Use PropertyNamesIgnoreCase to set properties to ignore case.
We are using the default "case:ignore" for properties.
We tested strcasecmp, the unbounded variant of strncasecmp (see the
test program below). Under SuSE Linux 9.x this function does not take
the value of LC_COLLATE into consideration.
Would it be possible for you or other swish-e developers to
change the swish-e source so that it will use strcoll?
We need correctly sorted results.
Right now we get A..O..U..ZÄÖÜ instead of AÄ..OÖ..UÜ..Z.
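A minimal sketch of what such a change might look like, to address the
point that the property strings are not null terminated. The helper
name collate_bounded() is hypothetical and not part of swish-e; it
assumes the strings contain no embedded NUL bytes and that the program
has already called setlocale(LC_COLLATE, "") once at startup. Note that
strcoll() is not a case-insensitive comparison, so "case:ignore"
properties would probably need extra handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not swish-e code): compare two byte ranges that
 * are NOT null terminated, using the current LC_COLLATE rules. */
static int collate_bounded(const char *a, size_t alen,
                           const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    /* strcoll() needs null-terminated input, so copy into temporaries. */
    char *ta = malloc(alen + 1);
    char *tb = malloc(blen + 1);
    int result;

    if (ta == NULL || tb == NULL) {
        /* Out of memory: fall back to a plain byte-wise comparison. */
        size_t n = alen < blen ? alen : blen;
        result = memcmp(a, b, n);
        if (result == 0)
            result = (alen > blen) - (alen < blen);
    } else {
        memcpy(ta, a, alen);
        ta[alen] = '\0';
        memcpy(tb, b, blen);
        tb[blen] = '\0';
        result = strcoll(ta, tb);
    }

    free(ta);
    free(tb);
    return result;
}
```

Allocating on every comparison is slow for a large sort; since the
properties are pre-sorted at indexing time, transforming each string
once with strxfrm() and comparing the results bytewise might be the
better fit, but that is only a guess from outside the code base.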
Thanks a lot in advance, Uwe Dierolf
--------------------------------------------------------------------------
Uwe Dierolf Tel 0721/608-6076
University Library of Karlsruhe Fax 0721/608-4886
Straße am Forum 76049 Karlsruhe / Germany
--------------------------------------------------------------------------
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() */

/* Print how strcasecmp, strcmp and strcoll order the two arguments. */
int main(int argc, char **argv) {
    /* Select German collation rules for the comparison functions. */
    setlocale(LC_COLLATE, "de_DE");
    if (argc != 3) {
        puts("need 2 arguments");
        return -1;
    }
    printf("strcasecmp: %d\n"
           "    strcmp: %d\n"
           "   strcoll: %d\n",
           strcasecmp(argv[1], argv[2]),
           strcmp(argv[1], argv[2]),
           strcoll(argv[1], argv[2]));
    return 0;
}
sortlocale a b           sortlocale ä b
strcasecmp: -1           strcasecmp: 130
    strcmp: -1               strcmp: 130
   strcoll: -1              strcoll: -1

sortlocale a A           sortlocale u ü
strcasecmp: 0            strcasecmp: -135
    strcmp: 32               strcmp: -135
   strcoll: -2              strcoll: -9

sortlocale a ä           sortlocale ü v
strcasecmp: -131         strcasecmp: 134
    strcmp: -131             strcmp: 134
   strcoll: -9              strcoll: -1
Received on Wed Feb 2 06:46:40 2005