DEV Community

Nick Bull

Posted on • Originally published at blog.nickbulljs.com

Do You Actually Know What A String In JavaScript Is? Here's What I Found.

We tend to think that a string in JavaScript is an array of characters.

```javascript
const name = 'Nick'

console.log(name.length) // 4
```

The variable name has 4 characters, 'N', 'i', 'c', and 'k', and its length is also 4.

Everything seems logical.

Let's go further and add an emoji to my name.

```javascript
const name = 'Nick 🐃'

console.log(name.length) // 7
```

Hmm, strange.

The variable name should have 6 characters: 'N', 'i', 'c', 'k', ' ' (a space), and '🐃'.

But it has 7.

It seems like the bull has 2 characters.

```javascript
const emoji = '🐃'

console.log(emoji.length) // 2
```

Interesting 🤔

Let’s figure out why.

Let's look at the official ECMAScript specification (ECMAScript is the standardized language that JavaScript is based on).

Scroll to section 6.1.4, "The String Type".

There we find this:

“The String type is the set of all ordered sequences of zero or more 16-bit unsigned integer values (“elements”) up to a maximum length of 2⁵³ - 1 elements. The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a UTF-16 code unit value.”

So a string in JavaScript is a sequence of UTF-16 code unit values.
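To see this for yourself, here is a small sketch that reads a string element by element with the standard charCodeAt method, which returns the UTF-16 code unit at a given index. The helper name codeUnits is hypothetical, not a built-in. Each element is a number from 0 to 65535, and there are exactly length of them.

```javascript
// Hypothetical helper (not a built-in): returns every element of a string.
// String.prototype.charCodeAt(i) gives the UTF-16 code unit at index i.
function codeUnits(str) {
  const units = []
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i))
  }
  return units
}

const units = codeUnits('Nick 🐃')
console.log(units.length) // 7, the same as 'Nick 🐃'.length
console.log(units.every(u => u >= 0 && u <= 0xffff)) // true
```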

❓What is UTF-16?

💬 A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point to a unique byte sequence. UTF-16 is the format that encodes each code point as one or two 16-bit code units.

One UTF-16 code unit value is a number from 0x0000 to 0xFFFF.

❓What is 0x0000 and 0xFFFF?

💬 The 0x prefix marks a number written in hexadecimal (often shortened to "hex"), a numeral system made up of 16 symbols (base 16). The standard numeral system, decimal (base 10), uses ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Hexadecimal uses those ten digits plus six extra symbols, A to F, which stand for the values 10 to 15.
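JavaScript understands hex directly: a 0x prefix makes a numeric literal hexadecimal, and the built-in Number.prototype.toString(16) and parseInt(str, 16) convert between the two bases. A quick sketch:

```javascript
// 0x marks a hexadecimal numeric literal: F = 15, so 0xFF = 15 * 16 + 15
console.log(0xff)   // 255
console.log(0xffff) // 65535, the largest UTF-16 code unit value

// Convert decimal to hex and back
console.log((78).toString(16))  // 4e
console.log(parseInt('4e', 16)) // 78
```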

If we convert my name, Nick, to UTF-16 (the way JavaScript sees it), we get 0x004e 0x0069 0x0063 0x006b.

0x004e = N

0x0069 = i

0x0063 = c

0x006b = k
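Here is a minimal sketch of that conversion. The helper toUtf16Hex is hypothetical; it just combines charCodeAt, toString(16), and padStart to print each code unit as four hex digits.

```javascript
// Hypothetical helper (not a built-in): formats each UTF-16 code unit
// of a string as a four-digit hexadecimal value, e.g. 78 -> '0x004e'.
function toUtf16Hex(str) {
  const result = []
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).padStart(4, '0')
    result.push('0x' + hex)
  }
  return result.join(' ')
}

console.log(toUtf16Hex('Nick')) // 0x004e 0x0069 0x0063 0x006b
```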

But how does JavaScript treat emojis?

In UTF-16, characters from the Basic Multilingual Plane (BMP: code points U+0000 to U+FFFF, which include characters for almost all modern languages) are encoded with a single code unit.

Characters outside the BMP (code points U+10000 to U+10FFFF, such as most emojis, musical notation, playing cards, and hieroglyphs) require two code units, known as a surrogate pair.
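To tell the two groups apart in code, you can read a character's full code point with the standard codePointAt method and compare it with 0xFFFF, the last code point in the BMP. The helper name isOutsideBmp is hypothetical:

```javascript
// Hypothetical helper (not a built-in): true if the character at index 0
// lies outside the Basic Multilingual Plane (code point above 0xFFFF).
function isOutsideBmp(char) {
  // codePointAt(0) combines a surrogate pair into one full code point
  return char.codePointAt(0) > 0xffff
}

console.log(isOutsideBmp('N'))  // false, U+004E is in the BMP
console.log(isOutsideBmp('🐃')) // true, U+1F403 needs two code units
```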

So UTF-16 represents the 🐃 emoji (code point U+1F403) with two code units: 0xd83d and 0xdc03.
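Where do those two numbers come from? UTF-16 subtracts 0x10000 from the code point and splits the remaining 20 bits in half: the high 10 bits are added to 0xD800 and the low 10 bits to 0xDC00. A sketch of that calculation, using a hypothetical function name toSurrogatePair:

```javascript
// Hypothetical helper (not a built-in): splits a code point above 0xFFFF
// into the two UTF-16 code units (a surrogate pair) that represent it.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000    // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10)  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1f403) // 🐃
console.log(high.toString(16), low.toString(16)) // d83d dc03

// The built-in String.fromCodePoint performs the same conversion
console.log(String.fromCodePoint(0x1f403) === String.fromCharCode(high, low)) // true
```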

That's why '🐃'.length gives 2.

To consolidate everything we have learned, let’s play a little with Unicode and JavaScript.

```javascript
const name = 'Nick'
const nameInUnicode = '\u004e\u0069\u0063\u006b'

console.log(name === nameInUnicode) // true
console.log(nameInUnicode.length)   // 4

const fullName = 'Nick 🐃'
const fullNameInUnicode = '\u004e\u0069\u0063\u006b\u0020\ud83d\udc03'

console.log(fullName === fullNameInUnicode) // true
console.log(fullNameInUnicode.length)       // 7
```

❓ What is \u?

💬 In JavaScript, a \u escape sequence followed by exactly four hexadecimal digits represents a single UTF-16 code unit. That is why a character outside the BMP, like 🐃, has to be written as two escapes: \ud83d\udc03.
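Since ES2015, JavaScript also accepts a code point escape, \u{...}, which takes the full code point in braces, so you can write the bull without working out its surrogate pair yourself. The resulting string is still two code units long:

```javascript
// \uXXXX takes exactly four hex digits: one UTF-16 code unit.
// \u{...} (ES2015+) takes a whole code point, up to \u{10FFFF}.
const viaCodePoint = '\u{1F403}'
const viaCodeUnits = '\ud83d\udc03'

console.log(viaCodePoint === viaCodeUnits) // true
console.log(viaCodePoint.length)           // 2, still two code units
```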

In the end

Knowing that a string in JavaScript is a sequence of UTF-16 code unit values can save you from unexpected bugs when you work with characters outside the BMP, like emojis.
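One common way around these bugs is to count and iterate by code point instead of by code unit: string iteration (for...of, spread, Array.from) walks a string one code point at a time, so a surrogate pair stays together. This sketch assumes "character" means "code point"; emoji built from several code points (flags, or family emoji joined with zero-width joiners) would still count as more than one.

```javascript
const fullName = 'Nick 🐃'

// .length counts UTF-16 code units
console.log(fullName.length) // 7

// Spreading a string iterates it by code point,
// so the surrogate pair for 🐃 stays together
const chars = [...fullName]
console.log(chars.length) // 6
console.log(chars[5])     // 🐃

// Indexing by code unit can split the pair in half
console.log(fullName[5] === '\ud83d') // true, a lone high surrogate
```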

If you like this article, share it with your friends and follow me on Twitter.

Also, every week I send out a "3–2–1" newsletter with 3 tech news, 2 articles, and 1 piece of advice for you.

📌 Subscribe to my 3–2–1 newsletter here 📌

Top comments (9)

Christopher Wray • Edited

Great post! I almost didn’t read it because the title needs an “a” before string... but I’m glad I did read it! Thanks!

Christopher Wray

Looks like you fixed it!

Nick Bull

Yes, thanks for pointing out!

Abhis

WOW! what a great article.
i really felt like adding this one to complete your post dev.to/prodexia/master-web-designi...

Alex Lohr

You cannot change a string in JS, just instantiate a new one with changed content. Also, string literals cannot store any own properties.

Comment deleted
Nick Bull

Fixed, thanks.