
alias expansion for German umlauts etc.

From: Stefan Klett <s_teve(at)not-real.atis-stud.uni-karlsruhe.de>
Date: Wed May 25 2005 - 13:30:56 GMT
Hi all,

I'm currently trying to solve the following problem:

If a user searches in German (or in any other language that uses characters 
beyond the 26 basic Latin letters), they may enter either the umlaut itself, 
for example "ö", or its "standard ASCII" alias "oe". To improve search 
quality, I think it would be very useful to normalize the search word to the 
expanded form ("oe"), i.e. to make that form the only one swish-e uses 
internally, so that users don't have to care which form they type.

Therefore I have written a set of functions that read the plain characters 
and their aliases from swish.conf (directive ExpandLocalChar) and build a 
pair of lookup tables. The first has the usual 256 integer entries and holds 
a "1" at every position that corresponds to a single character with an 
alias; the second consists of 256 char pointers and holds the corresponding 
alias string at the same position. A further function does the actual 
replacement: it allocates the extra memory needed and rewrites the input 
string into its "expanded chars" version.

So far, so, erm, good. But my first tests showed that my first educated 
guess, putting the functionality into TranslateCharacters in swstring.c, 
doesn't produce the desired result.

This is why I'm writing to this list: I'd like to ask the more experienced 
swish-e developers for a suggestion for a "hook-in point" for my 
replacement function.

Thank you in advance for your time
Stefan

PS: A further question (mere curiosity): is there a practical reason for 
declaring single-char variables as name[1]? I have seen that many times 
while browsing the code and could not imagine the idea behind it.
Received on Wed May 25 06:31:11 2005