DEV Community


When the white space became a beast

Alexandru Trandafir on May 10, 2017

This article was originally posted at HeavyDots Blog. The legend of the white space. Every now and then, on the World Wide Web, someone ...
 
Anton Frattaroli

Lol. Excel has gotten a coworker of mine with this too. He emailed the offending code to me and I couldn't find an issue... because the email program converted it to a normal space during copy/paste.

Alexandru Trandafir

Yeah, that's the scary thing about it: it can disappear! :D I found it in Excel, in a date field that would not get converted when importing the Excel file into a custom PHP app. The date's format looked right, but on both the left and right sides it had this strange char. Who knows when it got there!

edA‑qa mort‑ora‑y

What problem is the non-breaking-space creating?

Surely the myriad of other Unicode spacing characters would also create similar issues?

tbodt

There are dozens and dozens of Unicode characters that show up as blank space. You might think that \s in a regex would find all of them, since it matches characters with the "separator, space" Unicode property. But not all blank characters have that property. Some that don't (with the character itself between parentheses):
U+3164 HANGUL FILLER (ㅤ)
U+1D173 MUSICAL SYMBOL BEGIN BEAM (𝅳) (there are 7 other similar musical symbols)
U+200D ZERO WIDTH JOINER (‍)
U+180E MONGOLIAN VOWEL SEPARATOR (᠎) (only shows up as blank in some fonts)
There's even one character,   (U+1680 OGHAM SPACE MARK) that has the "separator, space" property and doesn't display as whitespace. Hilariously enough, you can use this character as whitespace in JavaScript.
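
A quick way to check this yourself is sketched below. It is only an illustration and it uses Python's `re` and `unicodedata` modules, not the regex engine of any particular language mentioned in this thread, so results in other engines may differ. The `describe` helper and the `CANDIDATES` list are made up for this example. U+00A0 (the non-breaking space from the article) is added for comparison, and U+180E is left out because its classification changed between Unicode versions.

```python
import re
import unicodedata

# Code points from the list above, plus U+00A0 NO-BREAK SPACE for comparison.
# U+180E is omitted on purpose: its category depends on the Unicode version.
CANDIDATES = [0x00A0, 0x1680, 0x3164, 0x1D173, 0x200D]


def describe(cp):
    """Return the name, general category, and \\s match result for a code point."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),  # "Zs" means "separator, space"
        "matches_s": re.fullmatch(r"\s", ch) is not None,
    }


report = [describe(cp) for cp in CANDIDATES]
for row in report:
    print(row["code"], row["category"], row["matches_s"], row["name"])
```

In this sketch the two characters with category `Zs` (U+00A0 and U+1680) match `\s`, while the hangul filler, the musical symbol, and the zero width joiner do not, even though they all tend to look blank on screen.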