Rahul Nagare

Posted on • Originally published at scaledynamix.com

Going Beyond Uptime Monitoring

In 2020, it is common knowledge that slow websites hurt revenue, brand loyalty, and conversions. Do you know what impacts conversions even more?

Outages!

An unreachable website can’t generate new conversions, revenue, or brand loyalty.

A wide variety of services can monitor WordPress sites from different locations around the world. Most of them request your site every minute and raise an alert if it is unreachable. While you can’t go wrong with any of these services, I prefer UptimeRobot: its false positives are minimal, and it has a generous free tier.

Uptime monitoring, in combination with server alerts, is sufficient for most small to medium-sized sites. After all, there aren’t many things that can go wrong on a single-server setup.

But what happens as your site grows and you move to a distributed setup?

Imagine scaling your site across 200 containers on Kubernetes. 198 of these containers are healthy. The remaining two, while active, are returning HTTP 200 responses with blank white pages. In other words, most visitors and uptime monitoring services can access your site just fine, but a small number of visitors see blank pages.
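To make the gap concrete, here is a minimal sketch of why a status-only check misses this failure. The `Response` class, the `</html>` marker, and the 1024-byte minimum are all hypothetical choices for illustration; they do not describe how any particular monitoring service works.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response from one container."""
    status: int
    body: str


def status_only_check(resp: Response) -> bool:
    # What a basic uptime check does: any HTTP 200 counts as "up".
    return resp.status == 200


def content_check(resp: Response, marker: str = "</html>", min_bytes: int = 1024) -> bool:
    # Stricter check (assumed thresholds): require HTTP 200, a minimum
    # body size, and a marker string that every real page should contain.
    body_bytes = len(resp.body.encode("utf-8"))
    return resp.status == 200 and body_bytes >= min_bytes and marker in resp.body


# A healthy page and a blank page, both served with HTTP 200.
healthy = Response(200, "<html><body>" + "x" * 2000 + "</body></html>")
blank = Response(200, "")

print(status_only_check(blank))  # the blank page passes the status-only check
print(content_check(blank))      # but fails the content check
```

Even a content check like this only sees whichever container the load balancer happens to route it to, so with 2 faulty containers out of 200 it would still pass most of the time.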

How do you find out about the two faulty containers? You can’t wait for visitors to complain, only to have the Ops team respond, “It works on my machine!”

This is where you need to go beyond uptime monitoring. In high-availability and distributed hosting environments, transaction monitoring becomes essential.

You can do this by combining real user monitoring (RUM) with application performance monitoring (APM) insights through a service like New Relic.

RUM monitors and logs each visitor’s browser timing. It helps you measure website response time across different browsers, regions, and devices. APM is integrated with PHP to analyze, profile, and log each transaction. It helps you measure the response time for uncached pages on your site and for transactions such as signups and checkouts.

Services like New Relic support setting up custom events for RUM and APM. For example, you can set up an alert if the response time exceeds 3 seconds for any visitor or if the bytes transferred fall below 1 KB for logged-in users.
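The two example conditions can be expressed as a small rule check. This is only a sketch of the logic in plain Python, not New Relic's alerting API: the `Transaction` fields, the reason labels, and the assumption that 1 KB means 1024 bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical record of one page view, as RUM/APM data might describe it."""
    response_time_s: float
    bytes_transferred: int
    logged_in: bool


MAX_RESPONSE_TIME_S = 3.0    # alert when any visitor waits longer than this
MIN_BYTES_LOGGED_IN = 1024   # alert when a logged-in user receives less than this


def alert_reasons(tx: Transaction) -> List[str]:
    # Return the list of rules this transaction violates (empty if none).
    reasons = []
    if tx.response_time_s > MAX_RESPONSE_TIME_S:
        reasons.append("slow_response")
    if tx.logged_in and tx.bytes_transferred < MIN_BYTES_LOGGED_IN:
        reasons.append("small_payload")
    return reasons


# A fast but nearly empty page for a logged-in user trips the payload rule.
print(alert_reasons(Transaction(0.4, 120, True)))
```

The payload rule is the one that would catch the blank-page containers: the response is fast and returns HTTP 200, but it carries almost no content.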

Monitoring based on RUM and APM can help you catch edge cases and significantly reduce troubleshooting time. Throw container monitoring into the mix, and you get excellent visibility into the high-availability environment that powers your site.
