Re: Detecting multibyte/wide characters?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Oct 03 2004 - 19:37:55 GMT
On Sun, Oct 03, 2004 at 09:41:03AM -0700, J Robinson wrote:
> Suppose I have a word in a perl scalar ($w).
> 
> How can I detect if $w contains multibyte or 'wide'
> characters?

You can test whether the UTF-8 flag is set on the scalar with:

    utf8::is_utf8( $w );

For example:

    moseley@bumby:~$ perl    -lwe '$x = chr(250); print utf8::is_utf8($x)'

    moseley@bumby:~$ perl    -lwe '$x = chr(450); print utf8::is_utf8($x)'
    1

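But that doesn't tell you much about the data in the scalar. The flag
describes how Perl stores the string internally, not which characters
it contains: it can be set when the scalar holds only ASCII characters,
and it can be off when the scalar holds non-ASCII characters such as
chr(250) above.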
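
A minimal sketch of that point, using only the built-in utf8::upgrade
and utf8::is_utf8 functions. The variable names are just for
illustration. An all-ASCII string gets the flag after an upgrade, while
a non-ASCII Latin-1 character made with chr() does not:

```perl
use strict;
use warnings;

# An all-ASCII string whose internal storage is upgraded to UTF-8.
my $ascii = "abc";
utf8::upgrade($ascii);

# A non-ASCII Latin-1 character (e-acute, code point 233) made with chr().
my $latin = chr(233);

print "ascii: ", (utf8::is_utf8($ascii) ? 1 : 0), "\n";   # prints "ascii: 1"
print "latin: ", (utf8::is_utf8($latin) ? 1 : 0), "\n";   # prints "latin: 0"
```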

I think you would need to look at the characters in the string
themselves. If you are asking whether the string contains characters
outside some character set such as Latin-1, you could try converting
the string to Latin-1 and checking for errors.
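
Here is a rough sketch of both approaches. It assumes $w has already
been decoded into Perl characters, not raw bytes. The helper names
has_non_ascii, has_wide_chars and fits_latin1 are hypothetical. The
first two inspect code points directly. The third tries the Latin-1
conversion with the Encode module and treats an error as "does not fit":

```perl
use strict;
use warnings;
use Encode ();

# True if $str has a character above 127. Such characters need two or
# more bytes when encoded as UTF-8 ("multibyte").
sub has_non_ascii {
    my ($str) = @_;
    return $str =~ /[^\x00-\x7F]/ ? 1 : 0;
}

# True if $str has a character above 255 ("wide"), i.e. one that
# Latin-1 (ISO-8859-1) cannot represent.
sub has_wide_chars {
    my ($str) = @_;
    return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

# Same question as has_wide_chars, answered by attempting the
# conversion: FB_CROAK makes encode() die on an unmappable character,
# and LEAVE_SRC keeps it from modifying the source string.
sub fits_latin1 {
    my ($str) = @_;
    my $ok = eval {
        Encode::encode('iso-8859-1', $str,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}
```

The regex checks are cheap and don't depend on the UTF-8 flag. The
Encode version generalizes to other character sets by changing the
encoding name.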

perldoc perluniintro has a FAQ section ("Questions With Answers") that
might be helpful.





-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Sun Oct 3 12:38:15 2004