DEV Community

Cover image for A Brief Guide to DevOps Metrics
Geri Máté for dyrector.io

Posted on • Originally published at blog.dyrector.io

A Brief Guide to DevOps Metrics

It’s impossible to improve any system without measuring its performance. Gaining insight through relevant, reliable and measurable DevOps metrics is the only way organizations can take their value centred software development game to the next level.


Measuring DevOps performance is easy. Figuring out what metrics to look at is the difficult part. There are so many details teams can pay attention to but it’s easy to get lost between them or rely on data that can be cheated. An honest approach to DevOps metrics is the key towards improving the already configured systems and processes.

Why are DevOps metrics important?

DevOps by nature is constant improvement towards simplifying processes by increasing collaboration between development and operations. Automations are powerful tools in the hands of DevOps teams which enables logging and tracking selected metrics. Executing these improvements isn’t possible without consciously keeping track of performance. Picking the right DevOps metrics can help you identify steps that don’t bring value to you. Measuring performance holds benefits in a couple of areas:

- Governance – For your team, DevOps is what you make it be. But that doesn’t mean you won’t need standardized internal processes to keep your processes secure. DevOps metrics can help you gain insight to tweak processes to your needs.

- Collaboration – DevOps brings together departments with conflicting goals. While development’s definition of success is bringing change to the product, operation is deemed efficient when the product is stable. They need common KPIs to work together as a team, instead of striving for goals that negate each other.

- Finances – DevOps done right is a cost-effective substitute for traditional IT practices. There’s still a possibility of overspending on tools, staff and providers that enable provisions on the cloud. Measuring performance can give your organization an idea on optimal spending.

Traits of Right DevOps Metrics

The most significant expectation from tracking DevOps performance is that it shouldn’t be an engine of dysfunctionality and tension between teammates. Remember, both development and operation work towards the same goal: bringing value to the customer. Metrics should help them finding out how to do that efficiently. There are basic requirements of metrics to find out how to work efficiently.

- Relevant – Not all DevOps metrics are equally important to different organizations. Relevance differs based on whether the software is a B2B or a B2C application, for example.

- Reliable – DevOps metrics are supposed to be reflecting reality, independent of the human factor. If your teammates can impact the metrics, they shouldn’t be considered.

- Measurable – DevOps already utilises automations. It’s appropriate to include measurements into automations to get constant and immediate feedback on performance.

3 Critical DevOps Metrics

Based on these traits, it makes sense to define co-dependent metrics. After all, our goal is to keep track of the performance of teams that have naturally different definitions of done and success.

- Lead time for changes – This metric reflects on how fast the code moves from a side-branch to a deployable state. Normally, before reaching a deployable quality, the code runs through a bunch of tests to make sure it won’t break the application. Lead time for changes measures how much time it takes to run these tests. This rate should rather be counted by hours instead of days or weeks.

- Change fail rate – Although the new piece of code is observed by automated tests, there’s a chance the change in the application causes downtime in production for various reasons. Shift right testing approach intends to avoid this from occurring but there’s no silver bullet solution to completely avoid change failures. It makes sense to track how often deployments to production cause downtime, not counting pushing the software to testing and staging environments. Ideally, the number of change fail rate should be below 15%.

- Mean time to recovery – Outages present the greatest threat to customer satisfaction. Therefore, measuring how much time it takes to restore service is one of the critical metrics teams should pay attention to. This can be significantly reduced by using tools to roll back the failed version from production and instead give access to the last used, properly functioning version while the problem isn’t solved. Downtime should never be longer than an hour.

How to measure?

As of now, there’s no industry standard to measure DevOps performance. Developers can create the logs required to measure DevOps performance but there are open-source tools they can simply integrate to your organization’s development lifecycle and keep track of any metric you think is important. From our experience combining Prometheus and Grafana is a viable way to get it done.

While Prometheus monitors application status from any aspect, Grafana can visualise the gathered data from Prometheus, making the log more transparent for non-technical stakeholders and decision makers.

Summary

Your team is already automating as many steps as possible to exclude human factor. It makes a lot of sense to measure the automations’ performance to see what else needs to be improved. Before picking the right metrics, your team should define what they plan to do with the insight they’ll gain from them. And then extend the automations by creating detailed logs to understand your system’s efficiency and how to improve it. These DevOps metrics will certainly help you with it.


This blogpost was written by the team of dyrector.io. dyrector.io is an open-source container management platform.

Learn more about the project on GitHub.

Top comments (0)