Hi, software engineer. Have you thought about what happens to your app once it goes into the wild? Have you ever faced a situation where requests from your clients differ a lot from your test ones? Do you recall all the complaining clients who were sending requests with the wrong “Content-Type”?
The first time I connected an Application Performance Monitoring (APM) tool to my REST API, my reaction was: “Oooh, what should I do with this volume of new data?” What is bad and what is good? Should I worry about CPU load? Or maybe about the requests count?
To answer these questions, think about your REST API from the client’s side. Your clients don’t care about CPU load while your API runs smoothly, but they will definitely panic if your API responds slowly, and that can happen even when CPU usage is low. So let’s agree that not all metrics are equally valuable. After 10+ years of developing REST APIs, I came up with a rule: every metric should be an answer to a simple question related to your business and your clients’ experience.
Good metrics are related to your business and affect clients. For example: “How long does a client have to wait for a response?” (request duration), “How many clients is my app serving right now?” (count of requests by client), “Has the client received the response?” (connection state).
On the other hand, here are metrics that are pretty interesting for a developer but have no direct impact on your client: “How much memory does my application use?” (memory consumption), “How many I/O operations does my app perform?” (I/O ops per minute). These metrics are not strictly connected to your business: who cares if your app consumes all the memory but still responds in 10 ms?
Now you know the philosophy behind my key metrics. Based on it, I compiled a list of critical REST API metrics you should care about as a developer:
Requests count is a simple but super useful metric. With it you can answer the following important business questions:
- Is anyone using my API? (if requests count is zero then it’s probably nobody)
- Is my API working? (if requests count is zero then it’s probably broken)
- Is my API under a DDoS attack? (if requests count during the last hour is much higher than average then it probably is)
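The three questions above can be sketched as one small check. This is only an illustration under my own assumptions: the function name `classifyTraffic`, the hourly granularity and the default `spikeFactor` of 5 are hypothetical, so tune them to your own traffic.

```typescript
// Hypothetical labels for the three situations described in the list above.
type TrafficState = "silent" | "normal" | "spike";

// Compare the last hour's request count with the average of earlier hours.
// spikeFactor is an assumed tuning knob, not a recommended value.
function classifyTraffic(
  lastHourCount: number,
  previousHourlyCounts: number[],
  spikeFactor: number = 5
): TrafficState {
  // Zero requests: either nobody uses the API or it is broken.
  if (lastHourCount === 0) {
    return "silent";
  }

  let sum = 0;
  for (const count of previousHourlyCounts) {
    sum += count;
  }

  // Without a usable baseline we cannot call anything a spike.
  if (previousHourlyCounts.length === 0 || sum === 0) {
    return "normal";
  }

  const average = sum / previousHourlyCounts.length;
  return lastHourCount > average * spikeFactor ? "spike" : "normal";
}
```

A "silent" result cannot tell you whether nobody is calling or the API is down, so it is a signal to go and check, not a diagnosis.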
When I was a beginner software developer, I wondered why we can’t just send code 200 for every request. The answer: the response status code is the easiest way to understand what happened with a request without reading and decoding the response body. So it’s important to always respond with a proper code in your REST API. With response status codes you can answer the following questions:
- Are clients properly calling my REST API methods? (e.g. no 4xx codes)
- Has my API ever crashed? (e.g. 5xx codes)
- Is my API running properly? (e.g. only 2xx codes)
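To answer these questions you only need to group status codes by class. Here is a minimal sketch, assuming you already collect the status code of every response somewhere; `StatusSummary` and `summarizeStatusCodes` are hypothetical names of mine.

```typescript
// Hypothetical summary shape: one counter per status code class.
interface StatusSummary {
  success: number;     // 2xx
  clientError: number; // 4xx
  serverError: number; // 5xx
  other: number;       // everything else, e.g. 1xx and 3xx
}

// Tally a list of response status codes by class.
function summarizeStatusCodes(codes: number[]): StatusSummary {
  const summary: StatusSummary = { success: 0, clientError: 0, serverError: 0, other: 0 };
  for (const code of codes) {
    if (code >= 200 && code < 300) {
      summary.success++;
    } else if (code >= 400 && code < 500) {
      summary.clientError++;
    } else if (code >= 500 && code < 600) {
      summary.serverError++;
    } else {
      summary.other++;
    }
  }
  return summary;
}
```

A non-zero `clientError` suggests clients are calling your methods incorrectly, and a non-zero `serverError` suggests something on your side has failed.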
An often-quoted gold standard for request duration is 200 ms, but let’s be honest, it doesn’t always hold in the real world. In my experience some method calls could take up to 60 seconds and it was still OK for the client. So you should negotiate the upper threshold for request duration with your client and fix its value in the SLA (Service Level Agreement). Armed with this metric you can answer the following valuable business question:
- Is my API responding as fast as it should? (e.g. all responses are sent faster than the threshold time)
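Checking durations against the SLA threshold can be sketched like this. It assumes you record each request’s duration in milliseconds; the helper names `slaViolations` and `percentile` are hypothetical, and the percentile uses the simple nearest-rank method, which is only one of several ways to compute it.

```typescript
// Count how many requests took longer than the negotiated threshold.
function slaViolations(durationsMs: number[], thresholdMs: number): number {
  let count = 0;
  for (const duration of durationsMs) {
    if (duration > thresholdMs) {
      count++;
    }
  }
  return count;
}

// Nearest-rank percentile: the smallest recorded duration such that
// at least p percent of all recorded durations are less than or equal to it.
function percentile(durationsMs: number[], p: number): number {
  if (durationsMs.length === 0) {
    throw new Error("no durations recorded");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = durationsMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}
```

Looking at a high percentile next to the violation count helps because a single slow request can hide behind a healthy-looking average.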
The unfinished requests metric describes the connection state. It’s closely related to the request duration metric. An unfinished request means that your client closed the connection before the response arrived. It answers the following question:
- Do my clients always receive the response? (If there are a lot of unfinished requests, then it’s definitely a reason to check this out)
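One way to count unfinished requests is to note whether a connection closed before the response was fully written. The sketch below is deliberately framework-free, and `ConnectionTracker` with its methods is a hypothetical helper. In Node.js I would expect to call `markFinished` from the response’s `finish` event and `markClosed` from its `close` event, but treat that wiring as an assumption to verify for your own stack.

```typescript
// Hypothetical tracker: a request is "unfinished" when its connection
// closes without the response having been fully written first.
class ConnectionTracker {
  private finishedIds: { [requestId: string]: boolean } = {};
  finished = 0;
  unfinished = 0;

  // Call when the response for this request was fully written.
  markFinished(requestId: string): void {
    this.finishedIds[requestId] = true;
    this.finished++;
  }

  // Call when the connection for this request closed.
  markClosed(requestId: string): void {
    if (this.finishedIds[requestId]) {
      // Normal case: the response went out before the connection closed.
      delete this.finishedIds[requestId];
      return;
    }
    // The client went away before the response was complete.
    this.unfinished++;
  }

  // Share of unfinished requests among all completed or aborted requests.
  unfinishedRatio(): number {
    const total = this.finished + this.unfinished;
    return total === 0 ? 0 : this.unfinished / total;
  }
}
```

If the ratio climbs together with request duration, clients are probably giving up on slow responses.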
Requests count, request duration, status codes and connection states are the four simple metrics you should keep an eye on. I personally have been using these metrics for the last 3 years and have no plans to stop.
I’m building 📊SLAO: Node.js + Express monitoring. Sign up for a free trial!