loading...

Discussion on: Why No Modern Programming Language Should Have a 'Character' Data Type

Collapse
patarapolw profile image
Pacharapol Withayasakpunt

Hmm? What it have to do with combined characters and Zalgo?

Thread Thread
awwsmm profile image
Andrew (he/him) Author

Zalgo is just an extreme form of combining marks / joiner characters. Most people would consider

Ȧ̛ͭ̔̔͑̅̈́̉͂̅̇͟͏̡͍̖̝͓̲̲͎̲̬̰̜̫̳̱̣͉͉̦

...to be a single character that just happens to have a lot of accent marks on it. If you define "character" as "grapheme", this is true. If you define "character" as "Unicode code point", it is not. That single "character" contains 34 UTF-16 elements. Try running this on CodePen and have a look at the console:

let ii = 0
const zalgo = "Ȧ̛ͭ̔̔͑̅̈́̉͂̅̇͟͏̡͍̖̝͓̲̲͎̲̬̰̜̫̳̱̣͉͉̦"


while (true) {
  let code = zalgo.charCodeAt(ii)
  if (Number.isNaN(code)) break
  console.log(`${ii}: ${code}`)
  ii += 1
}

The problem arises because programmers' intuitive understanding of "character" tends to be closer to "code point", while the average person's understanding of "character" tends to be closer to "grapheme".