Shy-phen! The best-named HTML character entity!

#html #frontend #unicode

This is a celebration post for a seldom-seen HTML character entity that deserves more appreciation: the shy-phen. It's not its function that we celebrate, but its perfectly fitting entity name.

I was working on an in-depth Dutch blog post about accessibility, and I wanted to hyphenate words only once the screen became too small.

Before celebrating the shy-phen, let's first talk about what we are looking at here: an HTML character entity reference, which is a wild ride on its own.

These references came to exist due to a simple problem: there are too many characters across different languages and fields of expertise to fit them all nicely on a keyboard.

That's why, over time, we have come up with many standards that describe sets of characters and how to encode them. The leading standard for describing characters right now is Unicode, which describes 149,186 unique characters and has 1177 new characters on its forever-growing waiting list.

Currently HTML supports at least 2231 of these characters by name, although you do best to check whether the character you want to use is actually supported.

See how I said "by name"? This has everything to do with the fact that we have 3.75 ways to write down an HTML entity reference in HTML! We can write one as a decimal numeric character reference, as a hexadecimal numeric character reference, as a hexadecimal numeric character reference without leading zeros (I am counting that as 0.5 of a way), and as a character entity reference. On top of that, some character entities may omit the ; (counting that as 0.25 of a way).

Let's take the yen sign ¥ as an example. We can write it as:

&#165; 
&#x00A5; 
&#xA5; 
&yen;  
&yen

All of them render as ¥.
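To check this outside a browser, here is a minimal sketch in Python (my choice of language, not something the post depends on). It relies on the standard library's html.unescape, which is documented to follow the HTML5 rules for character references, and decodes each of the five notations. The variable names are only illustrative.

```python
import html

# The five notations from the list above, as they would appear in HTML source.
notations = ["&#165;", "&#x00A5;", "&#xA5;", "&yen;", "&yen"]

# html.unescape applies the HTML5 character reference rules, including
# the older names that are still recognised without a trailing semicolon.
decoded = [html.unescape(n) for n in notations]

for source, char in zip(notations, decoded):
    # Print each notation next to the character and code point it decodes to.
    print(f"{source!r} -> {char!r} (U+{ord(char):04X})")
```

Every line of output should show the same character, the yen sign at code point U+00A5.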

So, why so many notations? Both the hexadecimal and the decimal notation work in HTML today, but older versions of HTML only supported the decimal notation (Unicode itself seems to have used hexadecimal notation since version 1.0.0).

The character entity notation is mentioned in HTML 4.0 as a way to remember encodings more easily by name than by number. That seems to be the main introduction of the named entities, and it started out with 96 of them.

The list has since grown to 2231 entities, and it was concluded that adding new ones is no longer desirable: more names would introduce more risk and add little value.

Now, why would we celebrate the shy-phen? It has everything to do with its entity name and its function. The name is &shy; (short for soft hyphen) and its function is to show itself only when really needed; otherwise it hides. So in other words, it's a very shy character!!!

When we consider its function, it's terrible to use. You need to put &shy; inside words at every location where you want to allow a break. So far so good.
However, you don't want to add this hidden hyphenation to every word, because it would make the source text hard to read. Ideally you only add it to words you know are too long to render on a single line, so that they break up nicely on smaller screens. With the advent of folding phones, and with phones and screens having different pixel densities, it is quite up in the air whether your shy hyphen will ever come out to play.
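To make the "put it inside words" step a bit more concrete, here is a small sketch in Python. The helper shy_join and the way I split the Dutch word toegankelijkheid (accessibility) are my own assumptions for illustration; the only fact it leans on is that &shy; decodes to the soft hyphen character U+00AD.

```python
import html

SOFT_HYPHEN = "\u00ad"  # the invisible character that &shy; decodes to


def shy_join(parts):
    """Join the parts of a word with &shy; so a browser may break there.

    Hypothetical helper: the caller decides where the break points go.
    """
    return "&shy;".join(parts)


# Assumed break points for one long word; adjust them to your own text.
word = shy_join(["toe", "gan", "ke", "lijk", "heid"])
print(word)

# Decoding the markup leaves the letters plus four invisible soft hyphens.
decoded = html.unescape(word)
visible = decoded.replace(SOFT_HYPHEN, "")
print(visible, decoded.count(SOFT_HYPHEN))
```

Keeping the break points in a list like this at least keeps the source readable; the markup with all the &shy; references only exists in the generated output.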

Anywho, here is my love post about a very shy hyphen. Let me know if you have any Unicode character or HTML entity you feel strongly about :D.
