A few days ago, I launched Icon Horse, which is a simple, free API to quickly get the favicon of any website. The launch itself went really well, and I had a lot of fun building it.
But inside the seemingly simple but glamorous life of favicons, there is a lot of complexity. I thought I'd share some of how it works with you all, and some of the hoops I had to jump through to build Icon Horse.
What is a favicon, really?
The year was 1999. Britney Spears and Eminem were at the top of the charts, the world was introduced to Napster for the first time, and the "browser war" was heating up with Microsoft releasing Internet Explorer 5.
One of IE5's new features was the favicon, or a small icon which was displayed in the "Favourites" menus/bars next to the title of the site someone bookmarked. Things were simple back then, and if the favicon existed, it would be loaded from the site's root like so:
https://mysite.com/favicon.ico
Since then, a number of new circumstances have come up requiring new types of icons – one example is the advent of the smartphone, which allowed people to save a website shortcut on their home screen. But since favicons tended to be low resolution, a number of diverging standards came about to define higher-resolution icons on both Android and iOS.
While these are technically shortcut icons, they are usually lumped in with favicons, both in general internet terminology and in the process used to create them.
Today, there are three main places to look for icons:
- https://mysite.com/favicon.ico
- The HTML of the site in the <head>
- A separate Web App Manifest file which is specified in the <head>
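To make those three places concrete, here is a minimal sketch in Python that collects candidate icon sources from a page's HTML. It is an illustration only, not how Icon Horse is implemented: the names `IconLinkParser` and `find_icon_sources` are made up for this example, and it assumes the HTML has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collects <link> tags whose rel mentions an icon or a manifest."""

    def __init__(self):
        super().__init__()
        self.icons = []       # (href, sizes) pairs from icon-style links
        self.manifest = None  # href of <link rel="manifest">, if any

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        href = attr.get("href")
        if not href:
            return
        if "manifest" in rel:
            self.manifest = href
        elif any("icon" in token for token in rel):
            # Matches "icon", "shortcut icon", "apple-touch-icon", "mask-icon", ...
            self.icons.append((href, attr.get("sizes", "")))


def find_icon_sources(page_url, html):
    """Return the three places to look, with every URL made absolute."""
    parser = IconLinkParser()
    parser.feed(html)
    return {
        "root": urljoin(page_url, "/favicon.ico"),
        "html": [(urljoin(page_url, href), sizes) for href, sizes in parser.icons],
        "manifest": urljoin(page_url, parser.manifest) if parser.manifest else None,
    }
```

The manifest entry is only a URL at this point; the file still has to be fetched and parsed separately to get at its icons.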
So what's the problem?
I'm currently working on a tool called Meeting Canary, from which a note-taking interface is displayed for a given calendar meeting. Many meetings have links to relevant places, such as video conferencing apps (especially since the COVID-19 situation has moved many people to remote work).
I decided it would be nice to render a small icon next to the links, as a way to tease the content to my users:
After hunting for a good way to fetch icons for a given link, I found one that was a JSON API endpoint, but I wasn't very satisfied with this solution – I did not want to complicate my life by using it, since I still needed to write code to figure out the best icon to display from the list I got back.
Also, I did not find a single service that provided fallbacks, or an icon that would be shown if the site is unreachable or if it had no icons at all.
After all, Gravatar does this very well for email addresses, and it was strange that no one had done this with favicons yet! So I got to work.
It was really simple to make, right?
When it comes to standards, the web is pure chaos. When building a product to fetch favicons, you should expect no mercy.
Some sites have no icons at all. Some sites have only a few of the icons from the spec. Some sites use strange sizes. Some sites don't even bother to tell you the size (in pixels) of the icons.
Some sites have completely broken DNS or server issues (like infinite redirect loops). Some send confusing or broken headers. Some sites serve invalid HTML or JSON. Some sites have 404 Not Found errors on their icons. Some sites use weird caching schemes. Some don't specify a MIME type or a file extension so you have to parse the actual image to know what you're dealing with.
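When neither a MIME type nor a file extension is available, one common approach is to look at the first few bytes of the file. The sketch below shows the idea in Python; the function name `sniff_image_type` is made up for this example, the list of formats is deliberately short, and it is not necessarily how Icon Horse does it.

```python
def sniff_image_type(data: bytes):
    """Guess an image's MIME type from its leading bytes, or None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"\x00\x00\x01\x00"):
        return "image/x-icon"  # classic .ico container
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # SVG is text, so look for an <svg> tag near the start instead.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<svg") or (head.startswith(b"<?xml") and b"<svg" in head):
        return "image/svg+xml"
    return None
```

A `None` result is also useful: it catches the case where a site answers an icon request with an HTML error page.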
The list of difficulties goes on – for example, one prominent clothing retailer's icon is simply not loadable because they have a nasty bug in their site's redirects and headers, meaning you cannot simply redirect: 'follow' your way to the HTML page, but must chain one request after the other manually.
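Chaining requests by hand can look roughly like the Python sketch below. It is written under assumptions: `fetch` is a hypothetical caller-supplied function that makes one request without following redirects and returns a status code plus a dict of lowercase header names, and the hop limit of 10 is arbitrary. Whatever header workaround the retailer's site needed is not shown here.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=10):
    """Follow redirects one request at a time.

    Returns the final URL, or None on a redirect loop, too many hops,
    or a redirect response that has no Location header.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            return None  # we have been here before: redirect loop
        seen.add(url)
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return url
        location = headers.get("location")
        if not location:
            return None
        url = urljoin(url, location)  # Location may be relative
    return None  # ran out of hops
```

Doing each hop yourself gives you a place to adjust headers between requests and to bail out early on infinite redirect loops.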
But I persevered through all those.
I knew there were going to be even more edge cases I hadn't considered in the future, so it was very important to build in functionality for fallback images. I never wanted to serve a broken image or a timeout.
I also had to make some decisions. Since my service makes icons available to anyone who wants them, there was no telling what use cases it would end up serving. So I assumed that the best icon to serve would be the highest-fidelity image. Also, for some use cases (such as React Native), SVG icons do not work out of the box and need something like react-native-svg to load, so I stuck to raster formats only (for the first version at least; I plan to open up SVGs as well in the future via a query parameter).
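As a rough illustration of that decision, the Python sketch below keeps only raster icons and picks the one with the largest declared size. The `Icon` type, the set of raster MIME types, and the rule that a missing size counts as zero are all assumptions made for this example, not Icon Horse's actual criteria.

```python
from dataclasses import dataclass

RASTER_TYPES = {"image/png", "image/jpeg", "image/gif", "image/x-icon", "image/webp"}


@dataclass
class Icon:
    url: str
    mime: str
    sizes: str = ""  # e.g. "32x32", "16x16 48x48", "any", or empty


def largest_area(sizes: str) -> int:
    """Largest width * height declared in a sizes string; 0 if none parse."""
    best = 0
    for token in sizes.lower().split():
        width, sep, height = token.partition("x")
        if sep and width.isdigit() and height.isdigit():
            best = max(best, int(width) * int(height))
    return best


def pick_best(icons):
    """Return the raster icon with the largest declared size, or None."""
    raster = [icon for icon in icons if icon.mime in RASTER_TYPES]
    return max(raster, key=lambda icon: largest_area(icon.sizes), default=None)
```

Since some sites never declare sizes at all, a more thorough version would have to download the image and read its real dimensions.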
And finally, pulling all this content is time intensive. Consider that to serve an icon, one needs to:
- Load https://asite.com's HTML site
- Parse the HTML and do a query lookup to get relevant icons and the manifest file
- Load the manifest file and parse it
- Check for the existence of the https://asite.com/favicon.ico icon
- Merge all these icons together in a list and sort them based on a best to worst criteria, while also making sure the icons themselves are reachable (and don't 404 Not Found)
- If all this fails, generate a JPEG file on the fly as a fallback
- Serve the icon to the user
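Two of the steps above, parsing the manifest and walking the sorted list until something loads, can be sketched in Python as follows. The function names are made up for this example, `is_reachable` stands in for a real HTTP check supplied by the caller, and `fallback` stands in for whatever the generated image would be.

```python
import json
from urllib.parse import urljoin


def manifest_icons(manifest_text, manifest_url):
    """Return (absolute_url, sizes) pairs from a Web App Manifest; [] if invalid."""
    try:
        manifest = json.loads(manifest_text)
    except ValueError:
        return []  # some sites serve invalid JSON
    if not isinstance(manifest, dict):
        return []
    icons = manifest.get("icons")
    if not isinstance(icons, list):
        return []
    return [
        # src is resolved relative to the manifest's own URL
        (urljoin(manifest_url, icon["src"]), icon.get("sizes", ""))
        for icon in icons
        if isinstance(icon, dict) and isinstance(icon.get("src"), str)
    ]


def first_reachable(ranked_urls, is_reachable, fallback):
    """Walk a best-to-worst list and return the first URL that loads."""
    for url in ranked_urls:
        try:
            if is_reachable(url):
                return url
        except Exception:
            continue  # treat timeouts and network errors as "not reachable"
    return fallback
```

Every failure path ends in either an empty list or the fallback, so the caller always has something to serve.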
So when dealing with a slow server, it's possible that loading that icon could take quite a long time. To help with that, Icon Horse must cache the resulting list of icons and the resulting chosen icon.
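The caching idea can be illustrated with a small time-to-live cache in Python. This is a sketch under assumptions: the class name, the in-memory dict, and the TTL value are all made up for this example. A serverless function would most likely need an external store instead, since memory is not shared between instances, and the article does not say what Icon Horse actually uses.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

Two such caches, one keyed by hostname for the icon list and one for the chosen icon, would mirror the two things mentioned above.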
I wanted to keep the functionality super simple, so when someone queried my API to get an icon:
https://icon.horse/icon/dev.to
I ended up with just the icon:
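For callers that start from a full link rather than a bare hostname, a small helper along these lines could build the request URL. It is a sketch in Python: the function name `icon_horse_url` is made up for this example, and the only part taken from the service itself is the https://icon.horse/icon/ path shown above.

```python
from urllib.parse import urlsplit


def icon_horse_url(link: str) -> str:
    """Build an Icon Horse request URL from a full link or a bare hostname."""
    # urlsplit only recognises a host when the string has a scheme or starts with //
    parts = urlsplit(link if "://" in link else "//" + link)
    if not parts.hostname:
        raise ValueError(f"no hostname in {link!r}")
    return f"https://icon.horse/icon/{parts.hostname}"
```

For instance, `icon_horse_url("https://dev.to/some/post")` and `icon_horse_url("dev.to")` both produce the URL shown above.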
Putting it all together
From the very beginning, I knew I wanted to use a serverless approach to this service. The landing page is built with Next.js and the service itself runs in a Lambda function hosted on AWS.
After struggling a little bit with getting the environment set up (it had to have image processing capability as well as a few other things), I managed to get it working and running properly.
And there you have it. I launched on ProductHunt (among other places) and was surprised by the overwhelmingly positive reception – I got almost 200 upvotes and almost 1000 unique visitors. What surprised me the most is how a favicon fetcher service I thought would be niche and developer only was actually really well understood by all kinds of people.
I learned a lot about the weird world of favicons and solved my own need, but above all had a lot of fun doing it.
Thanks for reading.
Top comments (13)
Man, this is great!
I have never thought about how much pain can be involved in getting this little image from a given site.
At the same time we all are used to having these images near our links.
I myself am guilty of not taking care of all the formats for icons and probably most of my sites have crappy images for all sizes.
I mean it is taken for granted that your site should have them in all possible sizes, but the amount of work needed for that – one hardly ever thinks of it.
I guess that’s why many sites don’t have them and that’s why icon.horse is solving an important task of figuring out what is often broken on websites.
Thanks for this app and opening my eyes on this side of the internet.
Hey @kostjapalovic ! Thanks for the comment!
Yes indeed, it actually also made me think just how resilient browsers must be. There are so many horrible sins we all commit against web standards all the time, it's kind of amazing things mostly work.
I guess we can only start to understand it when we consume other services, not create our own.
I remember the first time I was dealing with sending AND receiving emails. Man, this is even worse than icons 😂
Not to take away from your achievement, since I think it is actually better than this, but some may be interested to know Google has a little known API that does something similar, e.g. google.com/s2/favicons?domain=dev.to .. You are limited on size, get no idea about usage policy, and so forth - but I've used it in a personal search engine for years without problems. Icon Horse is probably a better option for future! :)
Hey @peterc , great point!
It's true, Google does have this (as does DuckDuckGo as well – external-content.duckduckgo.com/ip...). But like you said, it serves a 16x16 favicon with no options. The API is also meant to be used internally and is there but for the grace of Google – they can change it or remove it without any notice.
I am in the process of adding options to Icon Horse for:
Any suggestions?
Since I've found this (and hammered it for caching VaultWarden icon_cache) I have to mention that you'd want to:
a) filter private addresses RFC1918 (ie. 10/8, 172.16/12, 192.168/16) and the "special use" from rfc5735
b) I see gateway timeouts on addresses of hosts that don't allow "trusted" (or are actually internals resolving to addresses in (a) )
in both cases above, I'd advocate a simple cached ico of... something :)=)
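A check along the lines of (a) could be sketched in Python with the standard `ipaddress` module. The function name `is_public_address` is made up for this example, and the check is assumed to run on the IP a hostname resolves to, before any request is made. It is only an illustration, not the fix that was later deployed.

```python
import ipaddress


def is_public_address(ip: str) -> bool:
    """True only for addresses that are safe to fetch from the public internet."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address at all
    return not (
        addr.is_private        # includes 10/8, 172.16/12, 192.168/16
        or addr.is_loopback    # 127/8, ::1
        or addr.is_link_local  # 169.254/16, fe80::/10
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified  # 0.0.0.0, ::
    )
```

When this returns False, the service can skip the request entirely and answer with the cached fallback icon.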
Hi @hevisko ! That explains the spike in requests I got in the last few days ;)
Thanks for finding these things. I've fixed B., and I'll look at A. next week!
That's amazing! A perfect example of microsaas.
Thanks, @francescobianco !
Great work man. I love this.
I found this after google's google.com/s2/favicons?domain=gith... could not fetch github's favicon. Really weird.
Thanks for this. Works like magic
Weird that Google's service doesn't work in this case!
In any case, let me know if you have any feedback on Icon Horse. :)
i got a 404 :)
https://icon.horse/icon/icon.horse/icon/icon.horse
Hi @thehidden1 ! The 404 is because you want to just put the hostname into the URL, not the whole URL. So like this: