RE: Is Swish-e Unicode "Aware"

From: <Rainer.Scherg(at)>
Date: Fri Jun 01 2001 - 11:46:30 GMT
Currently swish cannot handle Unicode (UTF-16) files.
Even though we have written some first modules with Unicode support, it
will still take a few versions until swish can handle Unicode.

But this doesn't mean that you cannot index this file.
You need an external filter that maps such files
   to ISO 8859-1, ASCII, HTML, or XML.

Use e.g. the FileFilter directive or the "prog" method to include the filter.
(The latest development version of swish includes improved filter
 command-line handling.)
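
As a rough illustration of what such an external filter could look like,
here is a minimal Python sketch. It is not part of swish and not a tested
swish filter. It assumes the filter is called with the path of the file as
its first argument and that whatever it prints on standard output is what
gets indexed; check the documentation of your swish version for the real
calling convention. The names utf16_to_latin1 and main are made up for this
example.

```python
#!/usr/bin/env python3
"""Hypothetical filter sketch: convert a UTF-16 text file to ISO 8859-1."""
import sys


def utf16_to_latin1(data: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as ISO 8859-1.

    The "utf-16" codec uses the byte order mark to pick the byte order and
    falls back to the platform's native order when there is none.
    Characters that do not exist in ISO 8859-1 are replaced with "?".
    """
    text = data.decode("utf-16")
    return text.encode("latin-1", errors="replace")


def main(path: str) -> None:
    # Assumption: the indexer passes the file to convert as the first argument.
    with open(path, "rb") as handle:
        converted = utf16_to_latin1(handle.read())
    # Write raw bytes so no further encoding is applied on output.
    sys.stdout.buffer.write(converted)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The converted output no longer contains the null bytes that UTF-16 uses for
plain Latin characters, which is what triggers the "embedded null" warning.
Text outside ISO 8859-1 is lost with this approach.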

Currently I have no pointer to such filter software at hand.
If you find one, please be so kind as to notify us.

cu - rainer

> -----Original Message-----
> From: Steve McMillen []
> Sent: Friday, June 01, 2001 6:11 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Is Swish-e Unicode "Aware"
> So for now, I went and used the strings function to handle binary
> files.  That is, I added this line to my swish.conf:
>  FileFilter ".xls" strings
> Though I'd still like to know if this is expected behavior.
> However, now I have run into a new issue.  I have a Unicode text file on
> my website, and when swish-e tried to index that file, it complained:
> Warning: Possible embedded null in file
> '/www/html/specs/frankblack/SpecSettingsFiles/reference/encodingsession.txt'
> The file is in a Unicode format, I think UTF-16.
> Since I don't have many of these files, I can change the extension, but
> it would be nice if swish-e understood Unicode.  I guess I could find
> some Unicode-to-ASCII converter and use that in the FileFilter
> directive...
> I'd be happy to enter a bug report and supply repro (and sample files)
> if needed.
> thx,
> steve
Received on Fri Jun 1 11:46:41 2001