Re: Detecting multibyte/wide characters?

From: Bill Moseley <moseley(at)>
Date: Sun Oct 03 2004 - 19:37:55 GMT
On Sun, Oct 03, 2004 at 09:41:03AM -0700, J Robinson wrote:
> Suppose I have word in a perl scalar ($w). 
> How can I detect if $w contains multibyte or 'wide'
> characters?

You can test whether the UTF8 flag is set on the scalar with:

    utf8::is_utf8( $w );


    moseley@bumby:~$ perl -lwe '$x = chr(250); print utf8::is_utf8($x)'

    moseley@bumby:~$ perl -lwe '$x = chr(450); print utf8::is_utf8($x)'
    1

But that doesn't really tell you anything about the data in the
scalar.  The UTF8 flag can be set when the scalar contains only ASCII
values, too, and as chr(250) above shows, it can be unset when the
scalar holds a non-ASCII character.
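
To illustrate the point about ASCII data, here is a small sketch (the
variable names are just for illustration).  It uses utf8::upgrade(),
which changes only Perl's internal representation of the string, to
turn the flag on for a plain ASCII scalar:

```perl
use strict;
use warnings;

my $ascii = "hello";

# A plain ASCII literal normally starts out without the UTF8 flag.
my $before = utf8::is_utf8($ascii) ? 1 : 0;

# upgrade() switches the internal storage to UTF-8; the characters
# in the string are unchanged.
utf8::upgrade($ascii);

my $after = utf8::is_utf8($ascii) ? 1 : 0;
my $same  = ($ascii eq "hello") ? 1 : 0;

print "before=$before after=$after same=$same\n";
# prints: before=0 after=1 same=1
```

So the flag says how the string is stored, not what it contains.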

I think you would need to look at the characters in the string itself.
Assuming you are asking whether the string contains characters outside
some charset such as Latin-1, then maybe try converting the string to
Latin-1 and see if there are any errors.
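
A rough sketch of that idea, assuming $w already holds decoded
characters rather than raw UTF-8 bytes.  The helper names
fits_latin1() and has_wide_char() are made up for this example; the
first uses the Encode module with FB_CROAK so that an unmappable
character raises an error, the second just checks code points with a
regex:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the string can be encoded as
# Latin-1 without error, 0 otherwise.
sub fits_latin1 {
    my ($str) = @_;
    my $copy = $str;    # encode() may modify its argument when a CHECK is given
    return eval {
        Encode::encode('iso-8859-1', $copy, Encode::FB_CROAK());
        1;
    } ? 1 : 0;
}

# Hypothetical helper: returns 1 if any character has a code point
# above 255, i.e. cannot be stored in a single byte.
sub has_wide_char {
    my ($str) = @_;
    return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

for my $w (chr(250), chr(450)) {
    printf "U+%04X latin1=%d wide=%d\n",
        ord($w), fits_latin1($w), has_wide_char($w);
}
```

For other target charsets only the encoding name passed to encode()
would need to change; the regex version is tied to the 0..255 range.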

perldoc perluniintro has a FAQ section that might be helpful.

Bill Moseley

Received on Sun Oct 3 12:38:15 2004