Discussion on: Why No Modern Programming Language Should Have a 'Character' Data Type

Andrew (he/him) Author

This is, I think, the compromise that comes closest to making sense. Check out the examples at grapheme-splitter -- the resulting graphemes align closely with the intuitive definition of a "character". However, think about how you would access and manipulate these graphemes programmatically: one code point at a time (or even one byte at a time). There's a disconnect between the programmer's understanding of a character and the layperson's understanding of a character. What I'm arguing is that eliminating the term "character" would eliminate that ambiguity.
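To make that disconnect concrete, here is a small Python 3 sketch (Python is mentioned later in this thread). What a reader sees as one "character" can be several code points, and iterating a `str` yields code points, not graphemes:

```python
# A user-perceived character (a "grapheme cluster") can span several
# code points. Iterating a Python 3 str yields code points, not graphemes.
s = "e\u0301"  # renders as "é": 'e' followed by COMBINING ACUTE ACCENT

print(len(s))    # 2 -- two code points, though it displays as one "character"
print(list(s))   # ['e', '\u0301'] -- the pieces a program actually sees
```

So code that naively indexes "characters" can split a grapheme in half, which is exactly the ambiguity at issue.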

Eljay-Adobe

The API in Swift allows getting at a string's UTF-8 code units, UTF-16 code units, or UTF-32 code points, treating the string as an indexable collection of those sub-Character units, depending on what the developer is trying to do.
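As a rough analogue of those Swift views in Python 3 (an illustrative sketch, not the Swift API itself), the same string can be inspected as UTF-8 bytes, UTF-16 code units, or code points:

```python
# One string, three "views" -- loosely analogous to Swift's
# String.UTF8View, String.UTF16View, and String.UnicodeScalarView.
s = "héllo"  # uses the precomposed U+00E9 form of é

utf8_units = s.encode("utf-8")              # bytes: é takes 2 bytes in UTF-8
utf16_units = s.encode("utf-16-le")         # 2 bytes per code unit (all BMP here)
code_points = [ord(c) for c in s]           # UTF-32-style code point values

print(len(utf8_units))        # 6
print(len(utf16_units) // 2)  # 5
print(len(code_points))       # 5
```

Three different lengths-or-counts for one short string is a good illustration of why "how many characters?" has no single answer at this level.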

Swift and Python 3 both seem to have a good handle on Unicode strings.

Alas, I have to work with C++, which has somewhat underwhelming support for Unicode.