I love writing, and I also love data. When starting my blog, I integrated Google Analytics — it's free, easy to set up (just drop a few tags on the page), and that's what I knew back then. I did not enjoy it being run by a big corporation, but I was too lazy to research the alternatives, and then pay for them. But when Google announced sunsetting the current generation of analytics, and I had to take some action at any rate, I decided to pull the trigger and make a jump to an open-source alternative.
My analytics needs are quite simple, really:
- See how many people read my articles, along with basic stats like session depth and duration, because this makes me feel good.
- Filter by source to see which places work better for sharing my content.
- Custom events, so that I see how design changes help me grow an audience (and this works as a cheap JS error tracker).
- Data export would be nice — sometimes I play around with it in pandas.
In this article, I give compelling reasons to switch to self-hosted analytics. I share my research on seven open-source analytics tools so that you don't have to do it yourself. Then, I provide a quick tutorial to get up and running with plausible. Whether you're still on google analytics (or a comparable corporate tool like Meta pixel or Yandex metrica), or just considering adding analytics to your website, this will get you up and running in no time.
Why self-host your web analytics
I know that paying (either by subscribing to a paid hosted version or maintaining your own infrastructure) for something you can get for free needs justification. And in the case of web analytics, you do get decent bang for your buck:
- Privacy — we all know google eats user data for breakfast, lunch and dinner. You "pay" for google analytics with your users' data. The tools we're discussing do not come from companies that run the internet, and don't have the same data collection motivation, so you do protect your user privacy by switching. You might argue "I like programming" is not the most private information, so let's move on to the less moral and more practical arguments.
- Better data. Google Analytics is blocked by most adblock tools, leaving you with incomplete data (some research suggests figures around 43%). Switching to a less intrusive alternative gives you back the missing data, and more data means better insights with less effort.
- Fuck big tech. My blog was basically excluded from google search in October 2022 for no reason, and I want revenge. Depriving google of my users' data is like a microbe bite, but it does feel good.
While not directly related to self-hosting or open source, most of the tools we discuss have extra desirable treats:
- No cookies, so I don't have to worry about GDPR banners or whatever (even though some disagree). To be fair, I'm not 100% sure you absolutely need a banner whenever you have any cookie, but I'm not a lawyer and no cookies sounds like no problems to me.
- Simpler interface. This does sound like a made-up justification for limited functionality, but I honestly see this as a win for a casual analytics user like myself.
- Smaller JS footprint = better website performance. Google Analytics is famously around 50KB / 21KB gzip of JS, while most tools we discuss have scripts under 2KB. It's not a huge deal, as a tracking script does not block anything and is cached, but a pleasant bonus. In theory, this lets you catch a few events that occur soon after the first page load.
Now, is self-hosting the analytics worth the trouble of setting up and maintaining the infrastructure? I think the answer is yes, and here's why:
- Yet more privacy. When sending your user data to any third party, you're counting on their good will. If your data goes to your own server, it just stays there. If you're extra paranoid, you can ban outgoing traffic from the analytics container, or implement any other security measures.
- Yet better data collection. Some privacy protection tools do restrict connections to well-known analytics hosts. Sending data to your own domain makes blocking your tracker much harder, because a domain blacklist won't cut it any more.
- Price. As we'll see, you can self-host analytics for around $6 a month, which is cheaper than the paid analytics services I've seen, and this price will not auto-escalate if you suddenly have a traffic spike.
- Data ownership. Your historic data lives on your server, and you're free to export it regardless of any decisions the maintainers of the tools make. Not every tool on our list has easy export, but nothing stops you from a DB dump in the worst case.
- No vendor lock-in. With your data available, you can (in theory) migrate to any other analytics service if you change your mind. Again, not all the tools can import data, but some can.
- Future-proof. The company developing your tool of choice goes out of business, switches to closed-source, or makes a big update that requires you to reconfigure everything from scratch? No worries, you can just stay on your current version as long as you please.
This should be enough to convince even the most skeptical readers that a self-hosted analytics instance is worth it. Now, what tools are available for the job, and which one should you pick?
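To illustrate the data collection point above, here is a toy sketch in Python of how a domain-based blocklist decides what to block. The blocklist entries and the `is_blocked` helper are my own simplification, not how any particular adblocker is implemented.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical blocklist entries; real filter lists are far longer.
BLOCKED_DOMAINS = ("google-analytics.com", "googletagmanager.com")


def is_blocked(url: str, blocklist: Iterable[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocklisted domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


# A well-known analytics host matches the list...
print(is_blocked("https://www.google-analytics.com/analytics.js"))  # True
# ...while a tracker on your own subdomain does not.
print(is_blocked("https://an.you.io/js/script.js"))  # False
```

Real blockers can also match on script paths or request patterns, so your own domain makes blocking harder, not impossible.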
Open-source web analytics tools, an overview
When it comes to open-source analytics tools, we have three top-tier choices: matomo, plausible and umami.
Matomo is the most established tool in the game (been doing it since 2011). I have personally seen it used by huge companies, and people were pleased with it (it was called piwik back then). It's the closest alternative to google analytics in terms of features: you get customizable dashboards, integrations (ecommerce, website builders, multiple search providers), alerts, plugins, exports and what not. It even has iOS / android SDKs. But the truth is, I don't need all these extra features, I don't want to pay for them (in terms of a larger tracking script and more complex dashboards) and I'd be perfectly happy with something simpler, which brings us to...
Plausible, in contrast, offers a stripped-down analytics experience with only the essential metrics: page views, bounce rate, duration, source, device, location by default. It's cookie-free and features a tiny script (claims to be "< 1KB", but it's 1.6KB on my page). Any custom events can be sent via JS API. The docs are top notch, the docker set-up is straightforward, and you can import and export CSV data as you please. I'm loving the feature set, and this is what I ultimately went with. One valid criticism of plausible is that it's slightly convoluted, featuring two databases and an elixir server, which makes it pretty hard to set up without docker, but I don't mind it.
Umami is another "simple analytics" tool. I find it extremely similar to plausible — same minimal dashboard, same cookie-free tracking, same tiny script (1.6KB gzip). It launched in 2020 with great hype, but plausible quickly caught up. As of 2023, umami still lacks data import from GA, and only supports export via DB dump — plausible can do both easily. I also prefer plausible docs. Umami has only one DB, and can run as a raw node service. Overall, it's a viable choice, and competition is always good.
For historical reference, fathom was among the first privacy-focused trackers. It has since re-launched as a paid product, freezing development on the open-source version. The fact that you can still install and use it demonstrates my point on "future-proofness" in action. I'd prefer something that's actively developed, and I see no reason to pick it over the tools mentioned above.
Among the lesser-known tools I found these three to stand out from the competition:
- Offen (dope landing page design!) is built around "fairness". Your visitors must give explicit consent to data collection, and can view or delete their data at any time. On the practical side, it comes with auto-renewing SSL out of the box, which is nice. I'm not against consent, but I have personally developed popup blindness, and I'm afraid the users will just scroll with the banner hanging around. Sadly, it can't track custom events (clicks, forms, etc) as of 2023, and has no import / export (presumably by design).
- Goatcounter can run without JS via a tracking pixel (but comes with a full JS version, too). Extras: CSV export, API access, good docs, single-binary deploy, and the name is quite funny. You might say the UI is dated, but I'd call it hacker-themed.
- Ackee comes with extra anonymization and features a GraphQL API that allows you to import / export data as needed and build custom integrations or dashboards.
Then, of course, you can just take a generic log analyzer (grafana, graylog, goaccess) and throw any data you wish there. You get more flexibility, but there are no preconfigured views for common web stats, and you must write the actual tracking logic (both JS and the endpoint) yourself. We track product metrics via a log aggregator at work as it lets us trace problems to back-end issues, but I wouldn't take this path for a simple website.
Ultimately, I picked plausible because of its minimal approach, complete feature set (mainly data import / export) and great docs. However, all the tools on my list have their strengths, and you can pick whichever you prefer, or even try a few before committing. Here's a flowchart with some things that might influence your decision:
Up and running
Before we get started, you'll most certainly need two pieces of infrastructure:
- A server. Fair enough, you need hosting to self-host anything. I use an entry-level linode for $5 a month, which is a typical price. With your average blog traffic this leaves room for other services: OpenVPN server, Gitea or whatever you wish.
- A custom domain name. GitHub can't proxy some traffic from `username.github.io` to your analytics server, and HTTPS won't work on a raw server IP without a domain name. You can get one starting at $5 a year (more for a decent TLD, make it $20). I heard porkbun suggested. You are not obliged to use this domain for your website (even though GH pages support that), so something shady like `fk12sdj1244.cyou` works if you wish.
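To make the budget concrete, here is a back-of-the-envelope sketch in Python. It uses the prices quoted above ($5 a month for the server, $5 to $20 a year for the domain). The `monthly_cost` helper is my own, and your provider's prices will differ.

```python
def monthly_cost(server_per_month: float, domain_per_year: float) -> float:
    """Total monthly cost: the server bill plus the yearly domain fee spread over 12 months."""
    return server_per_month + domain_per_year / 12


# Prices quoted in this article; treat them as rough assumptions.
cheap = monthly_cost(server_per_month=5, domain_per_year=5)
decent = monthly_cost(server_per_month=5, domain_per_year=20)

print(f"cheap TLD:  ${cheap:.2f}/mo")   # cheap TLD:  $5.42/mo
print(f"decent TLD: ${decent:.2f}/mo")  # decent TLD: $6.67/mo
```

Either way the total stays under $7 a month, and the same server can host other services.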
With that in place, we can set up our plausible instance. I'll make it quick, since there already is a tutorial by Mansoor, and the docs do a decent job:
- Install docker.
- Start plausible: clone the repo, edit config, and `docker-compose up -d` the service (see docs). If this worked, you can access the UI at `http://your.ip.1.1:8000`
- You can't send analytics from your HTTPS website to an HTTP endpoint, so there's more work.
- Go to your DNS settings, come up with an analytics subdomain (like `an.you.io`) and add server IP to the relevant A/AAAA record. Plausible can not live on a `/path`, it needs its own subdomain. Success = the UI is live on `http://an.you.io:8000`
- Now we need a reverse proxy for SSL termination. I went for a full-dockerized setup with nginx-proxy container. Set `VIRTUAL_HOST` in `plausible-conf` to enable discovery, append `nginx-proxy` to `docker-compose.yml`, restart, and you should be able to access analytics at `http://an.you.io`. Un-exposing the port 8000 at this point won't hurt.
- Now, the actual SSL certificates. nginx-proxy has an acme-companion container that manages letsencrypt certificates for you. Add it to your `docker-compose.yml` along with all the extra volume declarations, and you're all set: `https://an.you.io`
- It's time to add `https://an.you.io/js/script.js` script to your website and watch the data flow.
- The most painful step is importing historical data from google analytics (thanks google, it is possible). Here are the docs to help you. Took me an hour of Google Cloud and SSO suffering, but it all worked out fine.
Et voila, you're self-hosting your analytics. No need to involve google any more. The same process applies to any dockerized setup with minor adjustments, not just plausible.
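Once the instance is live, custom events don't have to come from the browser script. Here is a minimal sketch in Python, assuming your instance exposes Plausible's events endpoint at `/api/event` (check the events API docs for your version). The helper names, the `an.you.io` host, the `you.io` site domain and the `Signup` event are placeholders of mine.

```python
import json
import urllib.request
from typing import Optional


def build_event(site_domain: str, name: str, url: str, props: Optional[dict] = None) -> dict:
    """Build the JSON body for one event; props are optional custom properties."""
    event = {"domain": site_domain, "name": name, "url": url}
    if props:
        event["props"] = props
    return event


def build_request(analytics_host: str, event: dict, user_agent: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the events endpoint of the self-hosted instance."""
    return urllib.request.Request(
        url=f"https://{analytics_host}/api/event",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )


def send_event(analytics_host: str, event: dict, user_agent: str) -> int:
    """Send the event and return the HTTP status code (needs a reachable instance)."""
    request = build_request(analytics_host, event, user_agent)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Usage would look like `send_event("an.you.io", build_event("you.io", "Signup", "https://you.io/pricing"), user_agent="my-script/1.0")`. As far as I understand, the user agent and client IP feed into visitor counting, so a server-side script should forward the real ones if you care about unique visitor numbers.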
Self-hosting your website analytics comes at a cost — to be precise, around $7 a month for hosting and the domain, but you can use the same infrastructure for other tasks and you get quite a few benefits in doing so:
- User privacy — no data sent to any third parties.
- Less missing data — your own domain is unlikely to be adblocked.
- Data ownership — you can always export historic data and move to another tool (worst case — via SQL dump).
- Future-proofness — even if the developer goes out of business, your instance stays with you.
- Price (when compared to hosted alternatives) — you can easily run analytics for under $7/mo, including a custom domain, while hosted solutions start at $9.
- Most importantly, it's one small step that we, as content creators, can take to make a dent in corporate monopoly.
Most privacy-focused analytics tools also feature no cookies (meaning fewer legal concerns, though this is not legal advice), a simpler interface, and a smaller tracking script (both at the expense of less data).
We examined seven open-source analytics tools that you can self-host. Here they are, in order of my preference:
- Plausible is a minimal analytics solution with enough features for me: custom events, API, GA import, CSV export. Great docs!
- Umami is very similar to plausible. I prefer import / export and docs on plausible, but umami has slightly better mobile UX. Close call.
- Goatcounter is a hacker-style tracker. Runs as a single binary, supports no-JS tracking via pixels or log imports. UI choices are divisive.
- Matomo is a well-established platform with lots and lots of features you probably don't need.
- Ackee comes with a GraphQL API and nice UI.
- Offen is conceptual: it requires explicit consent, and users can view their data. How well this works in practice remains to be seen. Some features are missing — most importantly, custom event tracking.
- Fathom is not bad, but I can't recommend it since development of the free version has been shut down.
Finally, I showed you how to install and integrate your own plausible instance, along with HTTPS.
Now it's your turn to self-host your analytics and say good-bye to google.