
When the white space became a beast

Alexandru Trandafir on May 10, 2017

This article was originally posted at HeavyDots Blog. The legend of the white space: Every now and then, on the World Wide Web, someone ...
 

Lol. Excel has gotten a coworker of mine with this too. He emailed the offending code to me and I couldn't find an issue... because the email program converted it to a normal space during copy/paste.

 

Yeah, that's the scary thing about it: it can disappear! :D I found it in Excel, in a date field that would not convert when importing the spreadsheet into a custom PHP app. The date's format looked right, but it had this strange character on both the left and right sides. Who knows when it got there!
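
For anyone hitting the same thing, here is a minimal sketch of the failure in Python (not the PHP app mentioned above). The cell value and the `%Y-%m-%d` format are made up for illustration. It suggests why the date looks fine but refuses to parse, and why a byte-oriented trim that only knows ASCII whitespace leaves the character in place.

```python
from datetime import datetime

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Hypothetical cell value: a date wrapped in non-breaking spaces.
raw = f"{NBSP}2017-05-10{NBSP}"


def parse_date(text: str) -> datetime:
    # Assumed format; a real spreadsheet may use a different one.
    return datetime.strptime(text, "%Y-%m-%d")


# The untouched value looks like a valid date but does not parse.
try:
    parse_date(raw)
    parsed_raw = True
except ValueError:
    parsed_raw = False

# str.strip() with no arguments removes Unicode whitespace, including U+00A0.
cleaned = raw.strip()
parsed = parse_date(cleaned)

# bytes.strip() only removes ASCII whitespace, so the UTF-8 bytes of the
# non-breaking space (0xC2 0xA0) survive on both ends.
bytes_cleaned = raw.encode("utf-8").strip()

print(parsed_raw)
print(parsed.date())
print(bytes_cleaned)
```

Stripping on decoded text rather than raw bytes is the design choice that matters here: the non-breaking space is two bytes in UTF-8, and neither of them is an ASCII whitespace byte.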

 

What problem is the non-breaking-space creating?

Surely the myriad of other Unicode spacing characters would also create similar issues?

 

There are dozens and dozens of Unicode characters that show up as blank space. You might think that \s in a regex would find all of them, since it matches characters with the "separator, space" Unicode property. But not all blank characters have the "separator, space" property, including (with the characters between parentheses):
U+3164 HANGUL FILLER (ㅤ)
U+1D173 MUSICAL SYMBOL BEGIN BEAM (𝅳) (there are 7 other similar musical symbols)
U+200D ZERO WIDTH JOINER (‍)
U+180E MONGOLIAN VOWEL SEPARATOR (᠎) (only shows up as blank in some fonts)
There's even one character,   (U+1680 OGHAM SPACE MARK) that has the "separator, space" property and doesn't display as whitespace. Hilariously enough, you can use this character as whitespace in JavaScript.
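
A quick way to check this yourself is the sketch below, in Python rather than JavaScript. One assumption to keep in mind: for text patterns, Python's `\s` follows its own Unicode whitespace definition, which is close to but not identical to what JavaScript uses, so treat the output as illustrative. The `candidates` dictionary and the `is_regex_space` helper are names made up for this example.

```python
import re
import unicodedata

# A few of the characters discussed above, written as escapes so they
# stay visible in the source.
candidates = {
    "NO-BREAK SPACE": "\u00a0",
    "OGHAM SPACE MARK": "\u1680",
    "HANGUL FILLER": "\u3164",
    "ZERO WIDTH JOINER": "\u200d",
    "MUSICAL SYMBOL BEGIN BEAM": "\U0001d173",
}


def is_regex_space(ch: str) -> bool:
    """Return True if the single character ch is matched by \\s."""
    return re.fullmatch(r"\s", ch) is not None


# Map each name to (general category, matched by \s).
# "Zs" is the "separator, space" category.
report = {
    name: (unicodedata.category(ch), is_regex_space(ch))
    for name, ch in candidates.items()
}

for name, (category, matched) in report.items():
    print(f"{name:28} category={category} matched_by_s={matched}")
```

In this sketch the two "Zs" characters are matched by `\s`, while the filler, the joiner, and the musical symbol are not, even though all of them can render as blank.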
