
Discussion on: Why No Modern Programming Language Should Have a 'Character' Data Type

Jean-Christophe Helary

You wrote:
"This 2-byte, fixed-width encoding, 'UTF-16'"
But UTF-16 is "encoded with one or two 16-bit code units" (cf. Wikipedia), so it is a variable-length encoding that uses 2 or 4 bytes per code point.
UTF-32 is the fixed-width encoding.
Also, UTF-8 uses 1 to 4 bytes per code point; the 4-byte sequences represent code points U+10000 to U+10FFFF.
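A quick way to check these widths is to encode single characters and count the bytes. This is a minimal Python sketch, not something from the original post; the sample characters are arbitrary picks, and the big-endian codec names are used only so Python does not prepend a byte-order mark to the output.

```python
# Count how many bytes one code point takes in each Unicode encoding form.
# "utf-16-be" / "utf-32-be" are used so no byte-order mark is added.
samples = {
    "U+0041": "A",
    "U+00E9": "\u00e9",      # e with acute accent
    "U+20AC": "\u20ac",      # euro sign
    "U+1F600": "\U0001F600", # emoji outside the Basic Multilingual Plane
}

def encoded_lengths(ch: str) -> dict[str, int]:
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }

for label, ch in samples.items():
    print(label, encoded_lengths(ch))
```

UTF-8 goes from 1 to 4 bytes across these samples and UTF-16 from 2 to 4, while UTF-32 stays at 4 bytes for every one of them.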

Andrew (he/him) Author

You're right. When UTF-16 was introduced, it was fixed-width. But, to accommodate characters that need 4 bytes (code points above U+FFFF), it's now a variable-width encoding. I'll edit the text to clarify that. Thanks!
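The mechanism that makes UTF-16 variable-width is the surrogate pair: a code point above U+FFFF is split into two 16-bit units. Below is a minimal Python sketch of that split. The function name `to_utf16_units` is hypothetical, chosen for illustration; Python's built-in `utf-16-be` codec produces the same bytes.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for one code point (hypothetical helper)."""
    # Surrogate code points themselves are not valid scalar values.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]  # Basic Multilingual Plane: one 16-bit unit (2 bytes)
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return [high, low]  # surrogate pair: two 16-bit units (4 bytes)

print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

Each surrogate carries 10 bits, so a pair covers exactly the 2^20 code points from U+10000 to U+10FFFF.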