DEV Community

Discussion on: Quick and easy way of counting UTF-8 characters in Javascript

Massimo Artizzu

That method unfortunately fails for more complex cases, like grapheme clusters, common in Eastern languages or in Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞ memes:

[...'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞'].length // 75!

Emojis can also be a pain, thanks to the wonderful U+200D ZERO WIDTH JOINER:

[...'👨‍👩‍👧‍👦'].length // 7

If you know Mathias Bynens' blog (and it looks like you do!), you've probably come across his majestic article about JavaScript and Unicode. There's a solution for these cases that uses Punycode (bundled with Node but deprecated; a userland substitute works in the browser too).
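
For what it's worth, here's a rough sketch of counting grapheme clusters with the built-in `Intl.Segmenter`. To be clear, this is a different technique from the Punycode approach, and it assumes a runtime that ships `Intl.Segmenter` (recent Node versions and current browsers). `countGraphemes` is just a name I made up for illustration, and the strings are written with escapes so the invisible joiners are easy to see.

```javascript
// Hypothetical helper (not from the article): counts user-perceived
// characters, i.e. extended grapheme clusters, instead of code points.
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  // segment() returns an iterable with one entry per grapheme cluster.
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

// Family emoji: four people joined by three U+200D ZERO WIDTH JOINERs.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const accented = 'e\u0301';

console.log([...family].length, countGraphemes(family));     // 7 1
console.log([...accented].length, countGraphemes(accented)); // 2 1
```

The spread operator still splits on code points, so it sees the joiners and combining marks as separate items, while the segmenter groups them into one cluster each. Results can shift a bit between runtimes, since they depend on the Unicode version the engine was built with.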