Re: [swish-e] utf8 again

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Mon Aug 11 2008 - 16:18:09 GMT
On 08/08/2008 04:49 PM, Michael Peters wrote:
> Brad Miele wrote:
>> not sure if this helps, but what we do is:
> 
> Mine is simpler and just 1 line:
> 
> $buffer =~ s/([^\p{IsASCII}])/sprintf('&amp;#x%X;', ord($1))/ge;
> 

I wrote:

http://search.cpan.org/~karman/Search-Tools-0.17/lib/Search/Tools/XML.pm#utf8_safe(_string_)

for just such cases as needing to store UTF-8 encoded text as a Swish-e Property.

I think the \p{IsASCII} version needs the double encoding of & -> &amp; because \p matches
on characters, not bytes. The double-encoding approach will work just as well as the
Search::Tools hack does, but for different reasons.
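
To illustrate the characters-vs-bytes point, here is a minimal sketch that wraps Michael's one-liner in a sub. The name escape_non_ascii and the sample strings are made up for the example; the only module used is Encode, which ships with Perl. It assumes the input has already been decoded to a character string, and the second call shows what happens if it has not.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (name is an assumption): the one-liner quoted above,
# wrapped in a sub so it can be applied to different inputs.
sub escape_non_ascii {
    my ($buffer) = @_;
    # Replace every non-ASCII character with a double-escaped hex entity.
    # ord() sees whatever Perl considers one character in $buffer.
    $buffer =~ s/([^\p{IsASCII}])/sprintf('&amp;#x%X;', ord($1))/ge;
    return $buffer;
}

my $bytes = "caf\xC3\xA9";             # raw UTF-8 bytes for "café"
my $chars = decode('UTF-8', $bytes);   # decoded: the "é" is one code point

print escape_non_ascii($chars), "\n";  # prints: caf&amp;#xE9;
print escape_non_ascii($bytes), "\n";  # prints: caf&amp;#xC3;&amp;#xA9;
```

The second line of output is the failure mode: with undecoded bytes, each byte of the UTF-8 sequence gets its own entity, so decode the text first.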

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Mon Aug 11 12:18:10 2008