Rainer Hofmann scribbled on 4/4/07 6:47 AM:
> the following situation gives me a headache:
> Windows clients (Lang=de, CP1252) put PDF-Files onto a fileserver (Linux
> Lang=en_us.utf) via Samba.
> Those files are periodically indexed by swish-e located on the server.
> Works pretty well so far.
> But if clients use non-ASCII characters like öäüßÄÖÜ in their file or
> directory names they run into trouble when searching for these files.
Sounds like a messy encoding problem. I assume Windows doesn't use UTF-8 for its
filesystem, and Swish-e converts UTF-8 to Latin1 (ISO-8859-1) where possible.
And who knows what Samba does wrt converting (or not) filenames from the Windows
fs encoding to the destination Linux fs encoding.
might address part of your issue.
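
To make the mismatch concrete, here is a small Python illustration (not
anything Swish-e or Samba does internally) of how one filename turns into
different bytes depending on the encoding. The filename is made up for the
example.

```python
# Illustration only: one filename, three encodings.
name = "Übersicht_Straße.pdf"  # hypothetical filename

utf8_bytes = name.encode("utf-8")      # what a UTF-8 Linux fs would store
cp1252_bytes = name.encode("cp1252")   # what a CP1252 Windows client would send
latin1_bytes = name.encode("latin-1")  # ISO-8859-1

# For German umlauts and ß, CP1252 and Latin1 produce identical bytes ...
print(cp1252_bytes == latin1_bytes)
# ... but UTF-8 uses two bytes for each of those characters.
print(utf8_bytes == latin1_bytes)

# Reading UTF-8 bytes as if they were Latin1 gives a garbled name,
# which then no longer matches what the user types into a search box.
mangled = utf8_bytes.decode("latin-1")
print(repr(mangled))
```

So a name that round-trips fine between CP1252 and Latin1 can still fail to
match once one side of the chain is UTF-8.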
The ideal is to do everything in UTF-8, since it can encode every Unicode
character and is ASCII compatible. But (as is oft repeated here) Swish-e
doesn't yet handle UTF-8 well. In the meantime, I'd suggest standardizing on
Latin1, since that seems like the least evil compromise. Convert your filenames
with convmv to Latin1, then index with Swish-e, and then your GUI will need to
map between Latin1 and the Windows encoding (CP1252?) if retrieving from the
Windows fs (instead of from Samba).
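
As a rough sketch of the mapping step, assuming the names on the server are
currently UTF-8 and the Windows side really is CP1252: the helper names below
are hypothetical, and this only shows the byte-level conversion, not how
convmv itself is implemented. It works on byte strings and does not rename
anything on disk.

```python
def utf8_name_to_latin1(raw: bytes) -> bytes:
    """Re-encode a UTF-8 filename as Latin1.

    Raises UnicodeEncodeError if the name contains a character
    that Latin1 cannot represent.
    """
    return raw.decode("utf-8").encode("latin-1")


def latin1_to_cp1252(raw: bytes) -> bytes:
    """Map a Latin1 filename to CP1252 (hypothetical GUI-side helper)."""
    return raw.decode("latin-1").encode("cp1252")


def cp1252_to_latin1(raw: bytes) -> bytes:
    """Map a CP1252 filename to Latin1.

    Fails for CP1252-only characters such as the euro sign (byte 0x80).
    """
    return raw.decode("cp1252").encode("latin-1")


# A UTF-8 name as it might sit on the Linux side (made-up example).
on_disk = "Größe.pdf".encode("utf-8")
for_index = utf8_name_to_latin1(on_disk)
for_windows = latin1_to_cp1252(for_index)
print(for_index, for_windows)
```

For plain umlauts the Latin1 and CP1252 bytes come out the same, so the GUI
mapping is mostly a no-op. It only matters for characters in the 0x80-0x9F
range, where CP1252 has printable characters (the euro sign, curly quotes) and
Latin1 has control codes.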
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Wed Apr 4 09:13:28 2007