Re: [swish-e] utf8 again

From: Brad Miele <brad(at)not-real.rumblefish.com>
Date: Fri Aug 08 2008 - 21:42:30 GMT
not sure if this helps, but what we do is:

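## encode every non-ASCII character as a numeric entity (&#NNN;).
## note: this assumes $res->{$_} is already decoded to characters;
## on raw UTF-8 bytes, ord() returns byte values, not code points.
          $res->{$_} =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;

## swap out common euro characters to english version search letters
          if ($res->{$_} =~ /&#/) {
              my $to_eng = $res->{$_};
              $to_eng =~ s/&#246;/o/g;   # o with diaeresis
              $to_eng =~ s/&#214;/O/g;   # O with diaeresis
              $to_eng =~ s/&#233;/e/g;   # e with acute
              $to_eng =~ s/&#232;/e/g;   # e with grave
              $to_eng =~ s/&#200;/E/g;   # E with grave
              $to_eng =~ s/&#201;/E/g;   # E with acute
              $to_eng =~ s/&#209;/N/g;   # N with tilde
              $to_eng =~ s/&#241;/n/g;   # n with tilde
              $to_eng =~ s/&#220;/U/g;   # U with diaeresis
              $to_eng =~ s/&#252;/u/g;   # u with diaeresis

              ## append to the keywords (inside the block, where
              ## $to_eng is still in scope)
              $res->{$_} .= " " . $to_eng;
          }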
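A hand-written substitution table like the one above only covers the characters someone remembered to list. As a rough sketch of a more general variant, the example below decodes the input first and then strips accents with the core Unicode::Normalize module. The helper names fold_accents and to_entities and the sample string are hypothetical and not part of the original setup; it also assumes the input arrives as raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from the original post): return an ASCII-folded
# copy of a decoded string by decomposing characters and dropping the
# combining marks, e.g. "n" + COMBINING TILDE becomes "n".
sub fold_accents {
    my ($text) = @_;
    my $folded = NFD($text);    # split base letters from their accents
    $folded =~ s/\p{Mn}//g;     # remove nonspacing (combining) marks
    return $folded;
}

# Hypothetical helper: entity-encode non-ASCII characters, using the same
# substitution as the snippet in the post.
sub to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Assumed input: raw UTF-8 bytes, as read from a file or database.
my $bytes = "se\xC3\xB1or M\xC3\xBCller";
my $chars = decode('UTF-8', $bytes);    # bytes -> characters

# Entity-encoded text plus the folded copy, appended as extra keywords.
my $keywords = to_entities($chars) . ' ' . fold_accents($chars);
print "$keywords\n";
```

This only folds characters that have a canonical decomposition, so letters such as "ø" or "ß" pass through unchanged, and it does nothing for Chinese or Hebrew text.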

sorry if i am missing your needs entirely.

Brad

--------------------------------------------
Brad Miele
Director of Technology
rumblefish
919 SW Taylor Suite 300
Portland, OR, 97205, Earth
url: http://www.rumblefish.com
email/aim: brad@rumblefish.com
vox: 503-248-0706





On Aug 8, 2008, at 2:31 PM, Michael Peters wrote:

> amscopub-pcshop@yahoo.com wrote:
>> If you are using international characters, why don't you remove the
>> accents instead?
>>
>> For example, change the Spanish "señor" to "senor".
>
> That's what I'll do if there is no other option. But that doesn't help
> with things like Chinese or Hebrew characters. I have to deal with all
> of them, not just modified ASCII chars.
>
> -- 
> Michael Peters
> Plus Three, LP
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users


Received on Fri Aug 8 17:42:32 2008