Ali Sherief

My Top 10 DevOps Metrics

Hello and welcome back to the DevOps series. This week I'm going to talk about metrics. Whether they measure team performance, system responsiveness, or stability, metrics take center stage in any DevOps environment.

Background

Every DevOps-based organization has areas it needs to improve on, even if only by a little bit, and those gaps show up in the corresponding metrics. Your team may already ship quickly, but shipping time is not the only metric you should optimize.

There are a handful of metrics that are useful to measure in your project. You do not have to use all of them, since some are easier for individual teams to adopt than others, but every team should track at least a core set across all of its projects. Here I will present my ten most helpful metrics for a DevOps project.

1. Uptime

This is the most vital metric to measure. Without it, you cannot infer how reliable the overall system is. The longer a production system can stay up without being rebooted, the more stable it is said to be.

Availability is a closely related metric that expresses uptime as a percentage: availability = actual uptime / maximum possible uptime * 100%. It also matters to the legal department, because SLAs specify the minimum availability that an organization is contractually obliged to provide.
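
As a rough illustration of that formula, here is a minimal Python sketch. The function name and the sample figures (a 30-day month with 43 minutes of downtime) are made up for the example and are not taken from any particular monitoring tool.

```python
def availability_percent(uptime_seconds: float, total_seconds: float) -> float:
    """Availability = actual uptime / maximum possible uptime * 100."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return uptime_seconds / total_seconds * 100


# Hypothetical figures: a 30-day month with 43 minutes of downtime.
month_seconds = 30 * 24 * 60 * 60
downtime_seconds = 43 * 60
monthly = availability_percent(month_seconds - downtime_seconds, month_seconds)
print(f"{monthly:.3f}%")
```

Comparing the result against the minimum written into an SLA is then a single comparison.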

2. Deployment frequency

This one is a no-brainer. More frequent deployments mean the DevOps process is working as it should. In addition to the time between deployments, the reverse figure of how many deployments a team makes on average per 1/3/6/12 months is also useful to know, as well as how long a deployment takes on average.
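
One way to get the per-month figure is to count deployments inside a window and normalize by its length. The sketch below assumes you can export deployment dates from your CI/CD tool. The function name and the 30-day "month" are simplifying assumptions of mine.

```python
from datetime import date


def deployments_per_month(deploy_dates: list[date], start: date, end: date) -> float:
    """Average deployments per 30-day month in the window [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    in_window = [d for d in deploy_dates if start <= d < end]
    return len(in_window) / (days / 30)


# Hypothetical data: six deployments in a 90-day window.
deploys = [
    date(2024, 1, 5), date(2024, 1, 19), date(2024, 2, 2),
    date(2024, 2, 16), date(2024, 3, 1), date(2024, 3, 15),
]
rate = deployments_per_month(deploys, date(2024, 1, 1), date(2024, 3, 31))
print(rate)
```

Changing the window to 3, 6, or 12 months only means passing different start and end dates.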

3. Number of errors per timeframe

A spike in this metric signals a failure in the system, so you can begin looking for the fault that caused it.

There should also be separate metrics for errors in each subsystem (database, HTTP, etc.).
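
A minimal sketch of per-subsystem counting, assuming error events are available as (timestamp, subsystem) pairs. That event shape and the one-hour bucket are assumptions for illustration, not the format of any specific logging system.

```python
from collections import Counter
from datetime import datetime


def errors_per_hour(events):
    """Count errors per (hour, subsystem) from (timestamp, subsystem) pairs."""
    counts = Counter()
    for timestamp, subsystem in events:
        # Truncate the timestamp to the start of its hour to form the bucket.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, subsystem)] += 1
    return counts


# Hypothetical error events.
events = [
    (datetime(2024, 1, 1, 10, 5), "database"),
    (datetime(2024, 1, 1, 10, 40), "database"),
    (datetime(2024, 1, 1, 10, 59), "http"),
    (datetime(2024, 1, 1, 11, 1), "database"),
]
counts = errors_per_hour(events)
for (hour, subsystem), n in sorted(counts.items()):
    print(hour.isoformat(), subsystem, n)
```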

4. Defect Escape Rate

This is the percentage of bugs, faults, vulnerabilities, and performance issues that are not caught before release and make it to a production system. It indicates how vigilant the development teams are in hunting for flaws: the lower the rate, the better.
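
The arithmetic can be sketched as follows, assuming you track how many defects were found before release and how many were first found in production. Both counts and both function names are hypothetical. The sketch reports the share that reached production and its complement, the share caught beforehand.

```python
def defect_escape_rate(found_before_release: int, found_in_production: int) -> float:
    """Percentage of all known defects that were first found in production."""
    total = found_before_release + found_in_production
    if total == 0:
        return 0.0  # No defects recorded, so nothing escaped.
    return 100 * found_in_production / total


def caught_before_release_rate(found_before_release: int, found_in_production: int) -> float:
    """Complement of the escape rate: percentage caught before production."""
    if found_before_release + found_in_production == 0:
        return 0.0  # No defects recorded, so nothing was caught either.
    return 100 - defect_escape_rate(found_before_release, found_in_production)


# Hypothetical quarter: 45 defects caught in testing, 5 reported from production.
print(defect_escape_rate(45, 5), caught_before_release_rate(45, 5))
```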

5. Work in Progress (WiP)

In some methodologies used in DevOps projects, such as Kanban, work is divided into batches, and each batch has a certain number of lanes it can progress through, depending on the number of teams assigned to the project.

Work in Progress measures how many batches are being worked on at once. You can use this metric to find congestion points in the development pipeline.
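
To make the congestion check concrete, here is a small sketch that counts unfinished items per board column and flags columns above a limit. The column names, the limits, and the item IDs are all invented for the example.

```python
from collections import Counter


def wip_by_stage(items: dict[str, str], done_stage: str = "done") -> Counter:
    """Count unfinished items per stage; `items` maps item ID to current stage."""
    return Counter(stage for stage in items.values() if stage != done_stage)


def congested_stages(wip: Counter, limits: dict[str, int]) -> dict[str, int]:
    """Return the stages whose item count exceeds their configured WiP limit."""
    return {stage: n for stage, n in wip.items() if n > limits.get(stage, float("inf"))}


# Hypothetical board state.
board = {
    "T-1": "development", "T-2": "development", "T-3": "review",
    "T-4": "review", "T-5": "review", "T-6": "done",
}
wip = wip_by_stage(board)
print(congested_stages(wip, {"development": 3, "review": 2}))
```

A stage that stays over its limit for several days in a row is a likely congestion point.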

6. Build Success Rate

This is the percentage of builds that pass the automated unit tests and Continuous Integration (CI) tests on the first try. A higher rate means developers spend less time fixing "leaky pipes." Note that this is not a measurement of runtime stability; for that, use the Uptime or Availability metrics.

7. Average Vulnerability Patching Time

I am skeptical of using the number of vulnerabilities found or patched in a given timeframe as a metric, because all software, even the most carefully written, has security vulnerabilities, including many undiscovered ones. A related metric, the average time to patch a vulnerability, is however a significant factor in determining the safety and stability of a system. If this value is too high, consider assigning more teams specifically to fixing these types of flaws.

8. Lead time

This is the time it takes between starting work on a TODO item and deploying it. Use it together with the Work in Progress metric to identify where the pipeline is stalling.
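
A minimal sketch of averaging lead time, assuming each work item records when work started and when it was deployed. The tuple layout and the sample dates are assumptions for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_lead_time(items: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time from starting an item to deploying it; items are (started, deployed)."""
    if not items:
        raise ValueError("at least one work item is required")
    seconds = mean([(deployed - started).total_seconds() for started, deployed in items])
    return timedelta(seconds=seconds)


# Hypothetical work items taking two and four days respectively.
work_items = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 3, 9)),
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 6, 9)),
]
lead_time = average_lead_time(work_items)
print(lead_time)
```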

9. Mean Time To Detection (MTTD) and Mean Time to Recovery (MTTR)

MTTD is the average time it takes for staff to detect a failure once it has occurred, while MTTR is the average time between the failure happening and Operations resolving it in the production system.
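
Both values are averages over incidents, which can be sketched like this. The `Incident` record and its three timestamps are a hypothetical shape for incident data. Recovery time is measured from the failure itself, following the definition above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    occurred: datetime  # when the failure actually happened
    detected: datetime  # when staff noticed it
    resolved: datetime  # when Operations fixed it in production


def _mean(deltas: list[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time from failure to detection."""
    return _mean([i.detected - i.occurred for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from failure to resolution."""
    return _mean([i.resolved - i.occurred for i in incidents])


# Hypothetical incident log.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10), datetime(2024, 1, 1, 11, 0)),
    Incident(datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 8, 14, 30), datetime(2024, 1, 8, 16, 0)),
]
print(mttd(incidents), mttr(incidents))
```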

10. Network traffic/CPU/memory/disk usage and number of requests per timeframe

These are standard metrics that measure application efficiency. They can reveal overloads in the system so that you can apply load balancing accordingly.

These metrics are a must-have even though they are last on the list.
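
For the requests-per-timeframe part, a sliding-window counter is one simple way to spot an overload. The class below is a sketch under my own assumptions (the name, the threshold, and passing timestamps in explicitly as seconds); real deployments usually rely on their monitoring stack for this.

```python
from collections import deque


class RequestRateMonitor:
    """Counts requests seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._timestamps: deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()

    def record(self, now: float) -> None:
        self._timestamps.append(now)
        self._evict(now)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def overloaded(self, now: float) -> bool:
        return self.count(now) > self.max_requests


# Hypothetical traffic: four requests within 30 seconds, limit of three per minute.
monitor = RequestRateMonitor(window_seconds=60, max_requests=3)
for t in (0, 10, 20, 30):
    monitor.record(t)
print(monitor.overloaded(30))
```

An overloaded reading is the kind of signal that could trigger load balancing or scaling.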


Thanks for reading. Let me know if your organization uses other metrics not mentioned here!

Top comments (1)

Arvind Padmanabhan

Nicely written. Check out devopedia.org/devops-metrics