Bill Moseley writes:
> On Thu, Feb 03, 2005 at 06:31:03PM +0100, Andreas Seltenreich wrote:
>> Sadly, ISO C doesn't know strNcoll. Naively, I'd just copy and
>> zero-terminate the strings and feed them to strcoll, using static
>> memory to make the penalty bearable. But I wonder: will this have to
>> be implemented in a thread-safe way? Is it okay to introduce a new string
>> properties flag "case:locale" or similar to make it runtime
>> configurable?
> The point of making it configurable is so that you can fall back to the
> old strncasecmp() if you don't need it?
Even in the C locale there's still a penalty for using strcoll, so people
who don't need more than US-ASCII should IMHO not be forced to use
the locale-aware functions. The people over on postgresql.org did some
comparisons a while ago:
<http://groups.google.de/groups?selm=Pine.LNX.4.30.0111261852030.612-100000%40peter.localdomain>
(Sorry for the change of medium)
> Might be better to figure out where those strings are allocated and
> allocate another byte and make them null-terminated to start with.
Ok, I'm going to spend some time getting myself more familiar with the
code.
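Until then, the naive copy-and-terminate idea from my earlier mail would look roughly like the sketch below. The names strncoll_sketch and copy_terminated are made up for illustration and are not anything in the existing code. I'm assuming strncmp-like semantics (look at no more than n bytes of each string), and I use malloc instead of static memory, which avoids the thread-safety question at the cost of two allocations per comparison. Note that cutting a string at n bytes may split a multibyte character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy at most n bytes of s into a freshly
 * allocated, zero-terminated buffer. Returns NULL if malloc fails. */
static char *copy_terminated(const char *s, size_t n)
{
    /* Stop early if the string ends before n bytes, as strncmp would. */
    const char *nul = memchr(s, '\0', n);
    size_t len = nul ? (size_t)(nul - s) : n;
    char *buf = malloc(len + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, s, len);
    buf[len] = '\0';
    return buf;
}

/* Hypothetical strncoll replacement (ISO C has no such function):
 * compare at most n bytes of a and b using the current LC_COLLATE. */
int strncoll_sketch(const char *a, const char *b, size_t n)
{
    char *ca = copy_terminated(a, n);
    char *cb = copy_terminated(b, n);
    int result;

    if (ca != NULL && cb != NULL)
        result = strcoll(ca, cb);
    else
        result = strncmp(a, b, n);  /* out of memory: byte-wise fallback */

    free(ca);  /* free(NULL) is a no-op */
    free(cb);
    return result;
}
```

As with strcoll itself, only the sign of the result is meaningful.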
> Just one more thing that won't work when we move to utf-8. (how does
> utf-8 sort?? Do some languages sort to the top?)
strcoll works flawlessly with UTF-8 locales. Here's an example I ran
in a UTF-8 xterm (I used "file" to make sure I was actually typing
UTF-8):
$ echo ä > /tmp/test
$ file !$
/tmp/test: UTF-8 Unicode text
$ LC_CTYPE=de_DE.utf-8 LC_COLLATE=de_DE.utf-8 ./a.out ä Ä
strcasecmp: 32
strcmp: 32
strcoll: -6
$ LC_CTYPE=de_DE.utf-8 LC_COLLATE=de_DE.utf-8 ./a.out ä b
strcasecmp: 97
strcmp: 97
strcoll: -1
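The source of a.out isn't included here. A minimal sketch of what such a test program could look like, assuming it does nothing more than print the return values of the three functions, follows. The names cmp_result, compare_all and print_comparison are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp is POSIX, not ISO C */

/* Return values of the three comparison functions for one pair. */
struct cmp_result {
    int casecmp;
    int cmp;
    int coll;
};

/* Run all three comparisons on the same pair of strings. */
struct cmp_result compare_all(const char *a, const char *b)
{
    struct cmp_result r;

    r.casecmp = strcasecmp(a, b);  /* case folding per LC_CTYPE */
    r.cmp = strcmp(a, b);          /* plain byte-wise comparison */
    r.coll = strcoll(a, b);        /* collation per LC_COLLATE */
    return r;
}

/* Print the results in the same layout as the session above. */
void print_comparison(const char *a, const char *b)
{
    struct cmp_result r = compare_all(a, b);

    printf("strcasecmp: %d\n", r.casecmp);
    printf("strcmp: %d\n", r.cmp);
    printf("strcoll: %d\n", r.coll);
}
```

A main() would have to call setlocale(LC_ALL, "") from <locale.h> before print_comparison(argv[1], argv[2]) so that LC_CTYPE and LC_COLLATE are picked up from the environment. Without that call the program stays in the "C" locale and strcoll orders strings the same way strcmp does. The exact numbers depend on the libc; only the sign is meaningful.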
Thanks
Andreas
Received on Thu Feb 3 10:41:01 2005