DEV Community

Discussion on: A brief guide to perl character encoding

 
Felipe Gasper

I’m still wondering why it would matter whether someone got the data into internal-UTF8 or internal-bytes. Are you unable to use unicode_strings?

David Cantrell

unicode_strings won't help when the problem is "print does weird stuff" because the data is already broken by the time my code gets it.

Felipe Gasper • Edited

But print doesn’t care what the string internals are …

> perl -C -e'my $foo = "\xe9"; print "$foo\n"'
é

> perl -C -e'my $foo = "\xe9"; utf8::upgrade($foo); print "$foo\n"'
é

This article refers folks to Perl internals but doesn’t describe when it is (and isn’t) useful to look at them. In a lot of cases it’s a red herring that can reinforce incorrect mental models about how all of this works.