DEV Community

Hiep Bao Le

Posted on • Originally published at komsciguy.com

Hide secret message with zero-width characters

Originally posted on my blog.

Zero-width characters are non-printing characters that most applications do not display, hence the name "zero-width." They are Unicode characters, typically used to mark possible line breaks or to join or separate characters in writing systems that use ligatures.

As they are "invisible," anyone can use them to co‎​‌​‌​​​​​‌‌​‌‌​​​‌‌​​‌​‌​‌‌​​​​‌​‌‌‌​​‌‌​‌‌​​‌​‌​​‌​​​​​​‌‌‌​​‌‌​‌‌‌​‌​‌​‌‌‌​​​​​‌‌‌​​​​​‌‌​‌‌‌‌​‌‌‌​​‌​​‌‌‌​‌​​​​‌​​​​​​‌‌​‌‌​‌​‌‌‌‌​​‌​​‌​​​​​​‌‌​​​‌​​‌‌​‌‌​​​‌‌​‌‌‌‌​‌‌​​‌‌‌​​‌‌‌​‌​​​‌​​​​​​‌‌​‌​‌‌​‌‌​‌‌‌‌​‌‌​‌‌​‌​‌‌‌​​‌‌​‌‌​​​‌‌​‌‌​‌​​‌​‌‌​​‌‌‌​‌‌‌​‌​‌​‌‌‌‌​​‌​​‌​‌‌‌​​‌‌​​​‌‌​‌‌​‌‌‌‌​‌‌​‌‌​‌‏nceal messages or information within plain text. Don’t believe me? I left a secret message in the first sentence. Read this post to know how it's possible.

Available zero-width characters

So far I’ve found 9 zero-width characters in the Unicode characters table.

Character Unicode
Zero-width space U+200B
Zero-width non-joiner U+200C
Zero-width joiner U+200D
Left-To-Right Mark U+200E
Right-To-Left Mark U+200F
Left-To-Right Embedding U+202A
Right-To-Left Embedding U+202B
Word joiner U+2060
Zero-width no-break space U+FEFF

There may be more, but nine is more than enough. In theory, two different zero-width characters are all you need to encode any type of data, one for each bit value. A binary encoding tends to be long, though, so using more of the available zero-width characters (a larger alphabet) shortens the encoded data.
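To make the length trade-off concrete, here is a minimal sketch comparing a two-character alphabet with a four-character one. The alphabets and the `encodeWithAlphabet` helper are my own illustration, not taken from the tool described later, and the alphabet size is assumed to be a power of two that divides a byte evenly (2, 4, or 16).

```javascript
// Hypothetical alphabets: any distinct zero-width characters would do.
const BASE2 = ["\u200B", "\u200C"];
const BASE4 = ["\u200B", "\u200C", "\u200D", "\u2060"];

// Encode bytes with an alphabet of size 2, 4, or 16 (assumption),
// emitting the most significant bits of each byte first.
function encodeWithAlphabet(bytes, alphabet) {
  const bitsPerChar = Math.log2(alphabet.length);
  const mask = alphabet.length - 1;
  let out = "";
  for (const byte of bytes) {
    for (let shift = 8 - bitsPerChar; shift >= 0; shift -= bitsPerChar) {
      out += alphabet[(byte >> shift) & mask];
    }
  }
  return out;
}

const bytes = new TextEncoder().encode("hi"); // 2 bytes
const binaryLength = encodeWithAlphabet(bytes, BASE2).length; // 8 characters per byte
const base4Length = encodeWithAlphabet(bytes, BASE4).length;  // 4 characters per byte
```

With two characters each byte costs eight zero-width characters; with four it costs four, so the hidden payload is half as long.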

Fingerprinting

Zero-width characters can be used to fingerprint text. For example, suppose someone on your team is leaking confidential information, but you don't know who. Just send each member a copy of a classified text with their name encoded in it. Wait for it to be leaked, then extract the name from the leaked copy, and do whatever you like with them.
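A rough sketch of that workflow follows. The team names, the memo text, and the helpers `toZeroWidth` and `fromZeroWidth` are hypothetical, and the sketch assumes the carrier text contains no zero-width characters of its own.

```javascript
const ZERO = "\u200B"; // stands for bit 0
const ONE = "\u200C";  // stands for bit 1

// Encode a name as zero-width characters, eight per UTF-8 byte.
function toZeroWidth(name) {
  return Array.from(new TextEncoder().encode(name), (byte) =>
    byte.toString(2).padStart(8, "0").replace(/0/g, ZERO).replace(/1/g, ONE)
  ).join("");
}

// Recover the name from the zero-width bits found in a leaked text.
function fromZeroWidth(text) {
  const bits = Array.from(text)
    .filter((ch) => ch === ZERO || ch === ONE)
    .map((ch) => (ch === ONE ? "1" : "0"))
    .join("");
  const bytes = (bits.match(/.{8}/g) || []).map((b) => parseInt(b, 2));
  return new TextDecoder().decode(new Uint8Array(bytes));
}

// One tagged copy per (hypothetical) team member.
const memo = "Q3 launch moves to October.";
const copies = Object.fromEntries(
  ["alice", "bob"].map((name) => [
    name,
    memo.slice(0, 2) + toZeroWidth(name) + memo.slice(2),
  ])
);

// If bob's copy shows up somewhere it shouldn't:
const leaker = fromZeroWidth(copies.bob);
```

Every copy looks identical on screen, yet each one decodes to a different name.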

Unlike other steganography techniques (such as hiding data in the noise of images, video, or audio), zero-width characters usually survive when the text is formatted, copied, or pasted. They are hard to detect without special tools, as most text editors don't render them. In addition, there is no fixed limit on the amount of data that can be encoded. However, editors do count zero-width characters toward the text length, so encoding too much data within a short text makes it more suspicious.

Tool

To demonstrate the ability to hide secret messages with zero-width characters, I created a tool here.

ZWC Tool

How does it work?

  • Use TextEncoder to convert the secret message from a String to a Uint8Array, which is an array of 8-bit unsigned integers.
  • Convert each integer to 8 bits, then convert each bit to a zero-width character:
    • Bit value 0 is encoded as Zero-width space (U+200B)
    • Bit value 1 is encoded as Zero-width non-joiner (U+200C)
  • Hide the encoded string in the middle of the carrier message.
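The steps above can be sketched roughly as follows. This is not the tool's actual source: the function name `hide`, the most-significant-bit-first order, and the choice of "middle" as half the carrier's length are assumptions made for illustration.

```javascript
const ZERO = "\u200B"; // zero-width space, bit value 0
const ONE = "\u200C";  // zero-width non-joiner, bit value 1

function hide(carrier, secret) {
  // Step 1: String -> Uint8Array of UTF-8 bytes.
  const bytes = new TextEncoder().encode(secret);

  // Step 2: each byte -> 8 bits -> 8 zero-width characters
  // (most significant bit first, an assumption).
  let encoded = "";
  for (const byte of bytes) {
    for (let bit = 7; bit >= 0; bit--) {
      encoded += (byte >> bit) & 1 ? ONE : ZERO;
    }
  }

  // Step 3: splice the encoded string into the middle of the carrier.
  // Note: this simple split could land inside a surrogate pair.
  const middle = Math.floor(carrier.length / 2);
  return carrier.slice(0, middle) + encoded + carrier.slice(middle);
}

const stego = hide("hello world", "A"); // displays as "hello world"
```

The result renders the same as the carrier, but its length grows by eight characters for every byte of the secret.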

In addition, two other zero-width characters are used to mark the beginning and end of the encoded string:

  • Left-To-Right Mark (U+200E) marks the beginning
  • Right-To-Left Mark (U+200F) marks the end

This makes it easier to detect the position of the encoded string when decoding it.
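As an illustration of why the markers help, here is a hedged sketch of a decoder that first locates the two marks and then reads the bits between them. The name `reveal` and the bit order are assumptions; the tool's real decoder may differ.

```javascript
const ZERO = "\u200B";  // bit value 0
const ONE = "\u200C";   // bit value 1
const START = "\u200E"; // left-to-right mark, begins the payload
const END = "\u200F";   // right-to-left mark, ends the payload

function reveal(text) {
  const start = text.indexOf(START);
  const end = text.indexOf(END, start + 1);
  if (start === -1 || end === -1) return null; // no hidden payload found

  // Everything between the markers is the encoded bit string.
  const payload = text.slice(start + 1, end);
  const bytes = [];
  for (let i = 0; i + 8 <= payload.length; i += 8) {
    let byte = 0;
    for (const ch of payload.slice(i, i + 8)) {
      byte = (byte << 1) | (ch === ONE ? 1 : 0);
    }
    bytes.push(byte);
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

// "A" is 01000001 in binary.
const sample = "he" + START + "\u200B\u200C\u200B\u200B\u200B\u200B\u200B\u200C" + END + "llo";
const secret = reveal(sample);
```

Because the decoder only looks between the markers, zero-width characters that legitimately occur elsewhere in the carrier do not corrupt the message.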

Please refer to the source code for more details.

Detect zero-width characters

Use any text editor that supports rendering of zero-width characters.

For a quick test, you can use the Chrome Developer Tools console:

Chrome Developer Tools console

This Chrome extension will convert any zero-width characters to emojis.
