Discussion on: Why No Modern Programming Language Should Have a 'Character' Data Type

Andrew (he/him) Author

This is, I think, the compromise that comes closest to making sense. Check out the examples at grapheme-splitter -- the resulting graphemes align closely with the intuitive definition of a "character". However, think about how you would access and manipulate these graphemes programmatically: one code point at a time (or even one byte at a time). There's a disconnect between the programmer's understanding of a character and the layperson's understanding of a character. What I'm arguing is that eliminating the term "character" would eliminate that ambiguity.
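To make that disconnect concrete, here is a small Python 3 sketch (Python is mentioned later in this thread). What a reader sees as one "character" can be several code points, and iterating a `str` yields code points, not graphemes:

```python
# A user-perceived character (a "grapheme cluster") can span several
# code points. Iterating a Python 3 str yields code points, not graphemes.
s = "e\u0301"  # renders as "é": 'e' followed by COMBINING ACUTE ACCENT

print(len(s))    # 2 -- two code points, though it displays as one "character"
print(list(s))   # ['e', '\u0301'] -- the pieces a program actually sees
```

So code that naively indexes "characters" can split a grapheme in half, which is exactly the ambiguity at issue.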

Eljay-Adobe

The API in Swift allows getting at a string's UTF-8 code units, UTF-16 code units, or UTF-32 code points, treating the string as an indexable collection of those sub-Character units, depending on what the developer is trying to do.
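As a rough analogue of those Swift views in Python 3 (an illustrative sketch, not the Swift API itself), the same string can be inspected as UTF-8 bytes, UTF-16 code units, or code points:

```python
# One string, three "views" -- loosely analogous to Swift's
# String.UTF8View, String.UTF16View, and String.UnicodeScalarView.
s = "héllo"  # uses the precomposed U+00E9 form of é

utf8_units = s.encode("utf-8")              # bytes: é takes 2 bytes in UTF-8
utf16_units = s.encode("utf-16-le")         # 2 bytes per code unit (all BMP here)
code_points = [ord(c) for c in s]           # UTF-32-style code point values

print(len(utf8_units))        # 6
print(len(utf16_units) // 2)  # 5
print(len(code_points))       # 5
```

Three different lengths-or-counts for one short string is a good illustration of why "how many characters?" has no single answer at this level.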

Swift and Python 3 both seem to have a good handle on Unicode strings.

Alas, I have to work with C++, which has somewhat underwhelming support for Unicode.