Re: [swish-e] Problem with filenames/character sets

From: Bill Moseley <moseley(at)>
Date: Wed Apr 04 2007 - 13:02:49 GMT
On Wed, Apr 04, 2007 at 01:47:20PM +0200, Rainer Hofmann wrote:
> Hi,
> the following situation gives me a headache:
> Windows clients (Lang=de, CP1252) put PDF-Files onto a fileserver (Linux 
> Lang=en_us.utf) via Samba.
> Those files are periodically indexed by swish-e located on the server. 
> Works pretty well so far.
> But if clients use non-ASCII characters like  in their file or 
> directory names, they run into trouble when searching for these files.
> There is a swish-e installed on each Windows client. A little GUI is 
> used to do the query. Filenames are retrieved, but show wrong characters 
> in their names. If users want to open such a file it results in "file 
> not found".

You have too many steps there to say where it's failing.  Swish can
only handle 8-bit characters in file names (file names are not
passed through libxml2 as the content typically would be).

But the above listed chars would not be a problem:

moseley@bumby:~$ echo "hello"> .txt
moseley@bumby:~$ swish-e -i .txt -v0
moseley@bumby:~$ swish-e -w hello -H0
1000 .txt ".txt" 6

So if your GUI is showing the wrong chars then maybe it is not
decoding the output from swish into the charset your GUI is using.
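
As a rough illustration of that kind of mismatch, here is a minimal Python
sketch.  It assumes the server stores file names as UTF-8 (the server
locale mentioned above suggests this, but it is not confirmed) and that
the client interprets the bytes as CP1252.  The file name and the helper
function are hypothetical, made up for the example.

```python
# Hypothetical file name containing a non-ASCII character ("Käse.pdf").
# The actual names from the original report are not known.
name = "K\u00e4se.pdf"

# Assumption: the Linux server stores the name as UTF-8 bytes, and the
# search tool prints those bytes unchanged.
raw = name.encode("utf-8")


def decode_output(data: bytes, encoding: str) -> str:
    """Decode raw bytes from the search output using the given encoding."""
    return data.decode(encoding)


# A client that assumes CP1252 turns the two UTF-8 bytes of the umlaut
# into two unrelated characters, so the displayed name is wrong and no
# longer matches any file on disk.
wrong = decode_output(raw, "cp1252")

# Decoding with the encoding the server actually used recovers the name.
right = decode_output(raw, "utf-8")
```

If the displayed name looks like `wrong` rather than `right`, the client
is most likely decoding with the wrong charset, and the same mis-decoded
string would also explain a "file not found" when opening the file.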

If you look at the file names on the server do they look ok?  What I'm
wondering about is how Samba deals with different locales on different

Bill Moseley

