Yes, use utf8 is IMO a mistake. It’s a premature hack around the more fundamental problem of Perl strings’ not storing their encoded/decoded state. If Perl could adopt a Unicode model that stores that, this all would be dramatically simpler.
Note that most Perl built-ins that talk to the OS leak PV internals, so if your string stores code points [0x80 0x81] in upgraded/wide/UTF8 format internally, you’ll send 4 bytes to the kernel, even though filehandle ops will (correctly) send 2. My (Sys::Binmode)[metacpan.org/pod/Sys::Binmode] CPAN module fixes this; AFAICT all new Perl code should use it.
I suspect a great many Perl applications have no need of decoding/encoding and can safely just regard their inputs & outputs as byte streams. This is our approach at $work, which keeps things simple for us. (Perl’s PV-leak bugs notwithstanding!)
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
Nice job. :) A few points in response:
Yes,
use utf8
is IMO a mistake. It’s a premature hack around the more fundamental problem of Perl strings’ not storing their encoded/decoded state. If Perl could adopt a Unicode model that stores that, this all would be dramatically simpler.Note that most Perl built-ins that talk to the OS leak PV internals, so if your string stores code points [0x80 0x81] in upgraded/wide/UTF8 format internally, you’ll send 4 bytes to the kernel, even though filehandle ops will (correctly) send 2. My (Sys::Binmode)[metacpan.org/pod/Sys::Binmode] CPAN module fixes this; AFAICT all new Perl code should use it.
I suspect a great many Perl applications have no need of decoding/encoding and can safely just regard their inputs & outputs as byte streams. This is our approach at $work, which keeps things simple for us. (Perl’s PV-leak bugs notwithstanding!)