Re: TranslateCharacters - clarification required

From: David L Norris <dave(at)not-real.webaugur.com>
Date: Tue Feb 25 2003 - 14:46:07 GMT
On Tue, 2003-02-25 at 03:19, Tref Gare wrote:
> I'm sure swish-e is fully capable of indexing accented characters
> (latin-1) but for some reason my swish-e setup seems to be unable to
> manage it - 

In theory, it should work.

> specifically éè which it indexes as Ú & Þ respectively, then
> displays as box symbols or question marks.

Do you have (or could you put) an example document on the web somewhere
that exhibits this behavior?  If so, I'll try to make some sense of it.
My mail client translates to and from Unicode (UTF-8), so I wouldn't
trust an emailed document not to have been altered on my end.  The XML
parser fails when I try to parse your example, presumably because my
mail client has transcoded it.

Also, what are the locale settings on your Solaris and Windows systems? 
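
One thing worth checking, and this is only a guess on my part: in
Latin-1, é and è are the single bytes 0xE9 and 0xE8, and in DOS code
page 850 those same two bytes display as Ú and Þ.  A minimal Python
sketch (Python is just my choice for illustration; it has nothing to do
with swish-e itself) shows that mapping, and also how converting to
UTF-8 changes the bytes:

```python
# Illustration only: how the same text looks under different encodings.
accented = "\u00e9\u00e8"  # the characters é and è

# Latin-1 stores each character as one byte.
latin1_bytes = accented.encode("latin-1")

# Reading those Latin-1 bytes as code page 850 gives different characters.
as_cp850 = latin1_bytes.decode("cp850")

# UTF-8 stores each of these characters as two bytes, so a mail client
# that converts to UTF-8 changes the byte content of the document.
utf8_bytes = accented.encode("utf-8")

print(latin1_bytes.hex())  # e9e8
print(as_cp850)            # ÚÞ
print(utf8_bytes.hex())    # c3a9c3a8
```

If that matches what you see, the index may well hold the right bytes,
with only the display side decoding them using the wrong code page.  I
can't confirm that without seeing a real document and your locale
settings.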

> I'd hoped I could get around it with TranslateCharacters set to either
> TranslateCharacters éè ee
> Or 
> TranslateCharacters :ascii7:
> As I thought they might translate the characters when indexed and then
> display the translated characters when searched.  However I think I've
> misunderstood the effect of TranslateCharacters as this doesn't seem to
> be the result I'm getting (no change to display anomalies).

I presume you've read this:
http://www.swish-e.org/current/docs/SWISH-CONFIG.html#item_TranslateCharacters

I don't think translated characters are stored.  I think it simply
translates those characters during indexing and searching.  But don't
hold me to that.  ;-)  Bill would know for sure.

For example, if you search for 'cinematheque' then 'cinémathèque' would
match and vice versa.
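
To illustrate what I mean, here is a rough Python sketch of that idea.
It is not swish-e's code: the names normalize, build_index and search
are made up, and the table only covers the two characters from your
config line.  The same translation is applied to words at index time
and to the query at search time, while the document text itself is left
alone:

```python
# Hypothetical sketch of "translate while indexing and searching".
# Not swish-e's implementation; all names here are invented.

# Same mapping as "TranslateCharacters éè ee": é -> e, è -> e.
TRANSLATE = str.maketrans("\u00e9\u00e8", "ee")


def normalize(word):
    """Lowercase a word and apply the character translation."""
    return word.lower().translate(TRANSLATE)


def build_index(docs):
    """Map each translated word to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(normalize(word), set()).add(doc_id)
    return index


def search(index, query):
    """Translate the query the same way, then look it up."""
    return index.get(normalize(query), set())


docs = {
    "a.html": "La cin\u00e9math\u00e8que fran\u00e7aise",  # accented spelling
    "b.html": "the cinematheque archive",                  # plain spelling
}
index = build_index(docs)

plain_hits = search(index, "cinematheque")
accented_hits = search(index, "cin\u00e9math\u00e8que")
```

Both searches return the same two documents, and docs["a.html"] still
contains the original accented text.  That would explain why the
directive makes no difference to how results are displayed, assuming
swish-e really does behave this way.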

-- 
 David Norris
  http://www.webaugur.com/dave/
  ICQ - 412039
Received on Tue Feb 25 14:46:44 2003