Awww shi...here we go again.
Yes I am back, breaking the internet once more. I know, it has been a while.
But this time...I may have come up with something kind of useful?
We will see!
First of all, the title is not clickbait.
I have actually built a (buggy) syntax highlighting system that uses just a single element and can function in pure CSS (for displaying snippets).
No <span> elements for each different part of syntax that is highlighted, no bulky JavaScript libraries to render everything.
Sounds interesting? Let's jump right in with a code example:
An example
Look at this beautiful syntax highlighting!
All done in less than 1kb of code!
Here is the HTML:
<pre>
let thisContent = 'highlighted';
for(x = 0; x < 10; x++){
    console.log('loopy loopy loop' + x);
}
</pre>
Note: Please use <pre><code> for marking up code blocks in production.
And here is the CSS:
body{
  background-color: #111;
  padding: 20px;
  color: white;
  font-size: 125%;
}
pre{
  width: 80ch;
  font-family: monospace;
  font-size: 30px;
  line-height: 30px;
  background:
    linear-gradient(to right, white 0ch, #78dce8 0ch, #78dce8 3ch, white 3ch, white 18ch, #FFD658 18ch, #FFD658 31ch, white 31ch, white 80ch),
    linear-gradient(to right, white 0ch, #a6e22e 0ch, #a6e22e 3ch, white 3ch, white 12ch, #f92672 12ch, #f92672 15ch, #A7C 15ch, #A7C 17ch, white 17ch, white 80ch),
    linear-gradient(to right, white 0ch, white 0ch, white 4ch, #fd971f 4ch, #fd971f 11ch, white 11ch, white 12ch, #a6e22e 12ch, #a6e22e 15ch, white 15ch, white 16ch, #FFD658 16ch, #FFD658 34ch, #f92672 34ch, #f92672 37ch, white 37ch, white 80ch),
    linear-gradient(to right, white 0ch, white 0ch, white 80ch),
    linear-gradient(to right, white 0ch, white 0ch, white 80ch);
  background-repeat: no-repeat;
  background-size: 80ch 30px, 80ch 60px, 80ch 90px, 80ch 120px, 80ch 150px;
  -webkit-background-clip: text;
  background-clip: text;
  color: transparent;
}
That is it!
So what?
So...highlight.js, one of the leading syntax highlighting libraries, is...wait for it...286KB (and that is minified and gzipped! It is over 980KB of raw JS 😱).
That is a huge amount of JS to push down the wire if all you are wanting to do is show a code snippet with some syntax highlighting.
Yet many sites do this, destroying their performance.
So while the demo may not look that impressive, the 99.8% data saving is pretty impressive!
Explanation: how does it work?
So what we are doing is applying a very carefully constructed linear-gradient background to the <pre> element.
Each of the coloured blocks in that gradient corresponds to some text we want to highlight.
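Then all we do is use background-clip: text, a CSS property that allows us to say "hey, can you please use the text above the background as a mask, and only show the background where there is text in front of it, otherwise show a transparent background." Combined with color: transparent (so the text itself doesn't paint over the clipped background), each character shows whichever colour of the gradient sits behind it.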
And we end up with this example:
Working Example
But if it is that easy, then why doesn't everyone do this?
The short answer is: linear-gradients and alignment.
Aligning the linear gradient by hand would take a lot of time.
Also, this relies on lines not "wrapping", as otherwise the gradient will not align anymore (although with some clever maths, we could probably account for this! Not part of this demo though).
You see, in order to generate the gradient we need to know:
- where each highlighted section starts and how wide it is
- how many lines of text there are
- what colour each part needs to be highlighted in.
The second part is straightforward. But working out the width of each item to be highlighted is hard, and working out what to highlight...is really hard!
Luckily...I am silly enough to have built a fully working editor, which finds relevant "tokens" within a JavaScript snippet and then uses these to calculate where each part of the linear-gradient should be.
Wanna try it?
The Snippet generator
Instructions:
- Input: Enter a short JS snippet in the first box (*).
- Preview: Check that the preview looks as expected (this is a janky setup for highlighting, so it may fail on certain snippets).
- Output: Copy and paste the resulting code from the output section into your code! (Don't forget, this is designed for a dark background, so you need to either give your <body> element a dark background or wrap the <pre> element in a container with a dark background.)
(*) Due to limitations of this demo, please make sure that no line is more than 70 characters in length.
That is it, give it a try below:
Understanding how this works
Look, that JavaScript is a hot mess of cobbled together snippets...I would not expect you to try and follow it.
Here is the simplified version of what is happening though:
- We grab each line in the (input) section and loop through it.
- We use a RegEx (yes...I know) to capture key terms in JavaScript, such as let and function.
- We then create a linear gradient for each line, with the position and length of each coloured section corresponding to the matches found.
- Finally, we adjust the background-size CSS property to account for the height of each line of code, so that each linear-gradient declaration lines up with the length and height of its line of code.
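To make the tokenising step more concrete, here is a minimal sketch. It is not the generator's actual code, which the article doesn't show. The `TOKEN_RULES` table and the `tokenise` name are hypothetical, and the colours are borrowed from the CSS example above. Like the original, it is regex-based, so it will mis-handle plenty of real JavaScript (template literals, comments, escaped quotes, and so on).

```javascript
// Hypothetical token rules (not the generator's actual regexes).
// Earlier rules win when two matches overlap, so strings come first:
// that stops keywords *inside* strings from being highlighted.
const TOKEN_RULES = [
  { colour: '#FFD658', pattern: /'[^']*'/g },                        // single-quoted strings
  { colour: '#78dce8', pattern: /\b(?:let|const|var|function)\b/g }, // declarations
  { colour: '#a6e22e', pattern: /\b(?:for|while|if|return)\b/g },    // control flow
];

// Returns one array of { start, end, colour } ranges per line.
// Positions are 0-indexed character offsets and `end` is exclusive.
function tokenise(source) {
  return source.split('\n').map((line) => {
    const ranges = [];
    for (const { colour, pattern } of TOKEN_RULES) {
      for (const match of line.matchAll(pattern)) {
        const start = match.index;
        const end = start + match[0].length;
        // Drop matches that overlap a range already claimed by an earlier rule.
        const overlaps = ranges.some((r) => start < r.end && end > r.start);
        if (!overlaps) ranges.push({ start, end, colour });
      }
    }
    return ranges.sort((a, b) => a.start - b.start);
  });
}
```

For `let a = 'test';` this yields a `let` range from 0 to 3 and a string range from 8 to 14. Character offsets only map onto `ch` units when every character is exactly 1ch wide, which holds for plain ASCII in a monospace font.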
That last part might be the most confusing part.
To explain better, think of the following:
let a = 'test';
let b = a + ' your code';
Let's assume that all we want to do is highlight the strings in these two lines ('test' and ' your code').
So we need 2 gradients.
They need to be 25 characters long (the length of the longest line) and we need to have a coloured block appear at:
- characters 9 to 14 on line 1
- characters 13 to 24 on line 2
These turn into linear gradients as follows:
linear-gradient(to right, white 0ch, white 8ch, red 8ch, red 14ch, white 14ch, white 25ch),
linear-gradient(to right, white 0ch, white 12ch, red 12ch, red 24ch, white 24ch, white 25ch);
Where "red" is our highlight colour and "white" is our non-highlighted colour. (Gradient positions are 0-indexed: character n occupies the space from (n - 1)ch to n ch, which is why each block starts 1ch before its first character's number but ends exactly on its last character's number.)
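Turning a line's ranges into a gradient string like the two above is mechanical. Here is a minimal sketch. It assumes the ranges are 0-indexed, end-exclusive, sorted and non-overlapping, and `buildLineGradient` is a hypothetical name.

```javascript
// Builds one hard-stop linear-gradient for a single line of code.
// `ranges` are { start, end, colour } (0-indexed, end-exclusive, sorted,
// non-overlapping); `width` is the longest line in characters (= ch).
function buildLineGradient(ranges, width, base = 'white') {
  const stops = [`${base} 0ch`];
  for (const { start, end, colour } of ranges) {
    // Plain text up to the token; a zero-width gap (start = 0) is harmless.
    stops.push(`${base} ${start}ch`);
    // Two stops of the same colour make a solid block; placing the next
    // colour at the same position as the previous one gives a hard edge.
    stops.push(`${colour} ${start}ch`, `${colour} ${end}ch`, `${base} ${end}ch`);
  }
  stops.push(`${base} ${width}ch`);
  return `linear-gradient(to right, ${stops.join(', ')})`;
}
```

Calling `buildLineGradient([{ start: 8, end: 14, colour: 'red' }], 25)` reproduces the first gradient above, minus its trailing comma.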
BUT, if we just tried to use those two gradients on their own, it would fail.
This is because the first gradient takes up 100% of the height of the element, so it completely covers the second one.
So to fix this, we need to set the height of each of the linear-gradient declarations, using background-size.
This would look like this:
background-size: 25ch 22px, 25ch 44px;
This assumes a line-height of 22px: the first gradient is 22px high from the top, covering the first line, and the second gradient is 44px high from the top, covering the second line. This works because multiple background layers stack on top of each other, with the one declared first on top, so the second gradient only shows in the 22px band below the first.
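The heights follow a simple pattern: the gradient for line i (counting from 1) is i times the line-height tall. Here is a small helper that generates the value. `backgroundSizes` is a hypothetical name, and it assumes every line is exactly one line-height tall (no wrapping).

```javascript
// Gradient i (0-indexed) must reach the bottom of line i, so it is
// (i + 1) line-heights tall. Earlier layers paint on top, so each gradient
// is only visible in the band below the layers declared before it.
function backgroundSizes(lineCount, width, lineHeightPx) {
  return Array.from(
    { length: lineCount },
    (_, i) => `${width}ch ${(i + 1) * lineHeightPx}px`,
  ).join(', ');
}
```

`backgroundSizes(2, 25, 22)` returns `"25ch 22px, 25ch 44px"`, and `backgroundSizes(5, 80, 30)` returns the background-size value used in the first CSS example.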
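Oh, but it still doesn't work yet.
You see, background images (gradients included) repeat by default, so the 22px-high first gradient would tile all the way down the element. So we also need to set background-repeat: no-repeat;
Now we get a working highlight on the strings.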
And that is essentially it, just add different colours for each type of token and you have a "working" syntax highlighting system in pure CSS with no <span> elements.
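Putting it together, here is a minimal sketch of the last step. It turns one gradient string per line into a complete `pre` rule with the background-size, background-repeat and background-clip declarations described above. `buildPreRule` is a hypothetical name. The font size is left to the caller, and it assumes no line wraps.

```javascript
// Assembles a complete `pre` rule from one pre-built linear-gradient string
// per line of code (first line first, so it ends up as the top layer).
function buildPreRule(gradients, width, lineHeightPx) {
  // Layer i must reach the bottom of line i, so its height is (i + 1) lines.
  const sizes = gradients.map((_, i) => `${width}ch ${(i + 1) * lineHeightPx}px`);
  return [
    'pre {',
    `  width: ${width}ch;`,
    '  font-family: monospace;',
    `  line-height: ${lineHeightPx}px;`,
    // The `background` shorthand comes first because it resets the
    // repeat, size and clip longhands that follow it.
    `  background: ${gradients.join(',\n    ')};`,
    '  background-repeat: no-repeat;',
    `  background-size: ${sizes.join(', ')};`,
    '  -webkit-background-clip: text;',
    '  background-clip: text;',
    '  color: transparent;',
    '}',
  ].join('\n');
}
```

Adding more token colours only changes the gradient strings; the rest of the rule stays the same.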
But...why?
Ok, ok. You have now read the whole article and are still asking why. That is fair!
To be honest, I just had a silly idea.
But also, I like to take something and use it in a way that was not intended.
I find it is a great way to learn as there are no tutorials I can follow for things like this. I just have to keep trying things and work out how to solve the problem.
It also really helps you learn things more deeply as you need to read the docs and experiment. (For example, I learned a couple of things with linear-gradients that I didn't fully understand before, such as how they stack when you declare multiple linear-gradients as a single background.)
So what do you think?
Could this actually become something useful?
Can you imagine generating super-lightweight code snippets for documentation as part of the build step, and just serving CSS and a single element?
Although it is a joke project right now, could the concept actually work in production? 🤔
Let me know what you think in the comments. 💗
Top comments (23)
You're not comparing like-for-like, though. highlight.js's source code implements the syntax highlighting logic and grammars for the languages it supports, not the output of highlighting a given code block. If you want to compare output size, which is only really relevant if you're doing the highlighting on the server side, then the equivalent highlight.js output is this (you can try it in the highlight.js playground):
...which is 737 bytes, plus the CSS required for the various classes (hljs-keyword, hljs-string, etc.). Even with very aggressive dead code removal, that CSS would undoubtedly push it over the ~1.2k bytes of your version, but that wouldn't necessarily hold for significantly longer code snippets, as the length of the linear-gradient increases along with the highlighted code.
I do think the CSS gradient approach is an interesting and unique idea, and I'd be interested to see if it could be expanded into a comprehensive and robust syntax highlighting solution. Here are some currently failing cases based on your codepen logic:
Output:
1 and 2 could definitely be fixed, at the cost of implementing a much more robust JS grammar (for reference, here's highlight.js's).
3 and 4 are significantly more problematic, as many characters are wider or narrower than printable ASCII characters, even in monospace fonts. In TTY environments, text is laid out in a grid format of columns and rows in which characters can have column widths of 0, 1, or 2; but in-browser pre elements have no such restrictions, and wide/narrow characters usually aren't an exact multiple of monospaced characters (as you can see from the screenshot, "呜呜呜" and "💩💩💩" are different widths, whereas in a TTY they'd both be the same width as "XXXXXX").
Actually, I think 💩💩💩 is already being treated as length=6 here, not due to physical width but because JavaScript has a Unicode problem, but even that's not wide enough to cover its physical width.
You could try physically measuring the characters with something like CanvasRenderingContext2D#measureText, but that would be a significant increase in complexity, a performance hit in the browser, and impossible on the server side.
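The "length=6" observation is easy to check. This short sketch compares three ways of counting the same string. The `counts` helper is hypothetical, and `Intl.Segmenter` assumes a reasonably recent runtime (for example Node 16+ or a current browser).

```javascript
// Three different answers to "how long is this string?":
// - UTF-16 code units (String#length): emoji outside the BMP count as 2
// - code points (the string iterator): each such emoji counts as 1
// - grapheme clusters (Intl.Segmenter): user-perceived characters
function counts(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    utf16Units: text.length,
    codePoints: [...text].length,
    graphemes: [...segmenter.segment(text)].length,
  };
}
```

`counts('💩💩💩')` gives 6 code units but 3 code points and 3 graphemes, while `counts('呜呜呜')` gives 3 of each. None of these numbers is the rendered width: both strings draw wider than three ASCII characters, which is why measuring glyphs is the only reliable option.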
5 might be fixable easily enough without line wrap, but adding line wrap would add a host of new problems, as now each "physical" line (accounting for where the line break would happen based on character width) would need its own gradient segment.
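To make the line-wrap problem concrete, assume the browser breaks a line at exactly `cols` characters. That is a big simplification: real wrapping happens at word boundaries and depends on glyph widths. Even then, each token range has to be clipped and shifted onto the physical line it lands on, and a token that straddles the break gets split across two gradients. Here is a sketch, with `wrapRanges` as a hypothetical helper.

```javascript
// Splits one logical line's { start, end, colour } ranges (0-indexed,
// end-exclusive) into per-physical-line ranges, ASSUMING a hard wrap every
// `cols` characters. Each physical line then needs its own gradient.
function wrapRanges(lineLength, ranges, cols) {
  const rowCount = Math.max(1, Math.ceil(lineLength / cols));
  const rows = Array.from({ length: rowCount }, () => []);
  for (const { start, end, colour } of ranges) {
    // Visit every physical row the token touches and clip it to that row.
    for (let row = Math.floor(start / cols); row * cols < end && row < rowCount; row++) {
      const rowStart = row * cols;
      rows[row].push({
        start: Math.max(start, rowStart) - rowStart,
        end: Math.min(end, rowStart + cols) - rowStart,
        colour,
      });
    }
  }
  return rows;
}
```

With a 10-column wrap, the 25-character line `let b = a + ' your code';` becomes three physical lines. Its string token (12 to 24) is split into 2 to 10 on the second row and 0 to 4 on the third.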
Anyway, I'd be interested to see where you take this if you try tackling some of those cases 😄
Thanks for taking my silly post so seriously lol.
It was a really well thought out rebuttal of the idea, love it!
One thing that you wasted too much time on was the highlighting being broken though...I used RegExes for tokenisation...it was never going to be robust, which is why I said it is likely to break and commented (I know...) when I said I used a RegEx.
The gradient part would work if there was a robust highlighter; that wasn't the concept being explored here!
Either way, I enjoyed the write up and I hope you enjoyed this silly experiment! 😂💗
It has been a while, how are you all?
What did you think of this silly idea? 🤔💗
Hi Graham, so glad you're back! (and back to your recommended Monday/Wednesday posting schedule?)
I will tell you how I like your idea when I eventually read your post, but I have seen enough to put it on my reading list and give you the "exploding head" sticker 😲🤯😹
P.S. I still haven't had time to give Mads' post more attention, but I see that you did ...
Syntax-Highlight CSS with Semantic HTML — and get Dark Mode for free
Mads Stoumann ・ Aug 12
... and took his approach some steps further.
So I am very curious...
Haha thanks bud! I never had a schedule, I just released in “chaos mode” whenever I felt like it! 🤣💗
You used to post some reading + reaction stats of your posts. Maybe the Monday/Wednesday thing was only my personal takeaway. So now it's my mid-week social media time slot but I won't release anything new before next week, because I already spent too much time refining my latest post.
P.S. I updated my comment above to give credits to Mads
P.P.S. I found your stats post. Here it is:
My writing stats for 2021, best time to post on DEV and plans for 2022-2023 [over 250 articles planned]
GrahamTheDev ・ Jan 3 '22
Oh 100% I wrote when the best times to post were...I just didn't follow my own advice! 😂💗
Thanks for crediting Mads, but this concept is entirely different, the only thing I thought was "can we syntax highlight in pure CSS". 💗
I like the idea! I think this might improve rendering performance by reducing the number of nested elements. I just hope that rendering CSS gradients won’t slow it down 😅 I’d also like to point out that <pre> is just pre-formatted text, like poetry or some other plain-text formatted content. To mark up something as code, you’re supposed to use <pre><code>. This is recommended in the spec and is a common approach for Markdown engines.
Yeah, but then it isn't a single element if I use <pre><code>, and I always deliver on my "clickbait" titles. 😂
I 100% agree though, it should be pre and code if you are doing it properly, so I added a small note about that under the first HTML snippet. 🙏🏼💗
I know, right? When CSS-only Mona Lisa is coded with two divs it’s not a piece of art anymore 😭
I don't know how you do it, but you do it Graham! Feels like a whole lot of maths, but you're a CSS guru I won't lie 😫💕🤌🤌 This is brilliant
Haha thanks Emy. I am TERRIBLE at CSS...I just fiddle with things until they work lol. 😂💗
Mind-blowing CSS sorcery! 🎩✨ Your ingenious approach to achieving syntax highlighting with gradients within a single element is truly remarkable. Breaking down complex concepts into a visually captivating format like this is both educational and inspiring. Kudos for pushing the boundaries of what's possible in web development!
Thank you, I am glad you liked my silly idea lol! 🙏🏼💗
Saying that it's CSS-only syntax highlighting when there's a clear JS regex selection function hiding in the wings is a bit cheeky, Graham... But this is impressive nonetheless, cutting this down from the nearly 1MB of raw JS needed for a simple syntax highlighting library... Nice work.
I might not have made it clear that you can simply ship the CSS and HTML and it will all work.
Regex is for real time or for generating the CSS in the first place.
I don’t think that is cheeky, I just think that is realistic. You could do it all by hand if you wanted though…one of the few times you would probably then be grateful for Regex though! Hahaha. 💗
This is brilliant! I'd love to see the clever math solution tho
I think this can be useful in web editors. Maintaining an application that uses contenteditable is an experience that still gives me nightmares, but you are fine if you can eliminate its need.
With syntactic highlighting like this you may be able to use a textarea instead, or a contenteditable span where you automatically remove any pasted formatting, which will let you sleep at night.
Just make the token finding and CSS generation be real-time, and you should be good to go.
There is a real-time example further down the article. It sucks and is buggy, but you can at least play with that! 😂💗
This is insane, and at the same time, a perfect example of lateral thinking, that is a powerful thinking method when developing.
Congratulations
THANKS