The DevOps Research and Assessment (DORA) team at Google spent six years researching the key metrics that would indicate the performance of a software development team.
The working group analysed different DevOps implementations. They wanted to establish and understand the most efficient ways of building and delivering software.
The outcome of that research was five key metrics that offer a standardised view that the DevOps team can use to measure success:
- Deployment Frequency
- Lead Time for Changes
- Change Failure Rate
- Time to Restore Service
- Reliability
This article will teach you the five DORA metrics, their benefits, and their challenges. You’ll also learn how to apply best practices when working with the metrics.
The DORA metrics focus on measuring DevOps throughput and reliability. Together, they help you and your team understand how reliable your deployments are and how quickly you are shipping code to production. The best-performing teams release code frequently while maintaining a low rate of failure.
Deployment frequency looks at how often you deliver new code into production. It’s a simple measurement of the number of deployments that have reached your end users during a certain period.
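To make the measurement concrete, here is a minimal sketch that turns a list of production deployment timestamps into an average number of deployments per day over a chosen window. The function name, the per-day unit and the sample timestamps are illustrative assumptions; your deployment log or CI tool may expose this data differently.

```python
from datetime import datetime, timedelta


def deployment_frequency(deploy_times: list[datetime], start: datetime, end: datetime) -> float:
    """Average number of production deployments per day in the window [start, end)."""
    days = (end - start) / timedelta(days=1)  # window length as a float number of days
    if days <= 0:
        raise ValueError("end must be after start")
    in_window = sum(1 for t in deploy_times if start <= t < end)
    return in_window / days


# Hypothetical deployment log: three deployments fall inside the week, one after it.
deploys = [
    datetime(2024, 3, 4, 10, 0),
    datetime(2024, 3, 4, 15, 0),
    datetime(2024, 3, 6, 9, 0),
    datetime(2024, 3, 11, 9, 0),
]
freq = deployment_frequency(deploys, datetime(2024, 3, 4), datetime(2024, 3, 11))  # 3 / 7 per day
```

Multiplying the result by 7 gives deployments per week, which may be a more readable unit for teams that do not deploy daily.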
The objective of DevOps is to increase engineering throughput whilst maintaining high quality. A high deployment frequency means you can respond to problems more efficiently: when a bug is reported or discovered, you can ship the code changes that resolve it quickly.
Your deployment frequency will depend on the application you are building. A large monolithic application may need a different approach to deployments than APIs or microservices, where deployment is often continuous.
The lead time for changes measures the average time required to deliver a revision to your end users. This is the time between the code being committed and deployed to production.
You can use your version control and Continuous Integration (CI) software to measure the lead time. To do this, take note of the time the code was committed. Then note the deployment time; the difference between the two times gives you the change’s lead time.
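The calculation described above can be sketched as follows: subtract the commit time from the deployment time for each change, then average the results. The function names and sample timestamps are hypothetical; in practice the commit times would come from your version control system and the deployment times from your CI/CD tool, ideally as timezone-aware timestamps.

```python
from datetime import datetime, timedelta
from statistics import mean


def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Lead time for one change: deployment time minus commit time."""
    if deployed_at < committed_at:
        raise ValueError("a change cannot be deployed before it is committed")
    return deployed_at - committed_at


def average_lead_time_hours(changes: list[tuple[datetime, datetime]]) -> float:
    """Average lead time in hours over (commit time, deploy time) pairs."""
    return mean(lead_time(c, d) / timedelta(hours=1) for c, d in changes)


# Hypothetical (commit time, deploy time) pairs.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 13, 0)),   # 4 hours
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 6, 10, 0)),  # 24 hours
]
avg_hours = average_lead_time_hours(changes)  # (4 + 24) / 2 = 14.0
```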
Not all changes are the same, and the scope of the work behind a specific change can affect its lead time. This is why teams are encouraged to break work into smaller changes, following the DevOps practice of working in small batches.
Keeping each change to a similar size makes this metric more reliable. An occasional change that is much larger than the rest can skew the metric and make it unreliable.
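To illustrate how a single large change can skew an average, the sketch below compares the mean and the median of some hypothetical lead times. Looking at the median alongside the mean is our own suggestion for spotting this kind of skew, not part of the metric's definition.

```python
from statistics import mean, median

# Hypothetical lead times in hours: four small changes and one large one.
lead_times_hours = [2, 3, 2, 3, 50]

mean_hours = mean(lead_times_hours)      # 12: pulled up by the single large change
median_hours = median(lead_times_hours)  # 3: closer to the typical change
```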
The change failure rate metric measures the percentage of production deployments that have caused an incident. This metric can be a good indicator of the quality of your software.
If a team has a high change failure rate, it is frequently shipping broken or unreliable code. Customers will only tolerate this for a short time before it drives them away from the product.
To calculate the change failure rate for your team, you need to collect two values:
- The number of production deployments started within a certain period
- The number of deployments that caused a bug, outage, or performance issue
Any failures must be accurately marked against the deployments that caused them.
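With those two values, the change failure rate is the number of failed deployments divided by the total, expressed as a percentage. The sketch below assumes each deployment record carries a flag that is set when a failure is traced back to it; the `Deployment` class and its `caused_failure` field are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    id: str
    caused_failure: bool  # hypothetical flag, set when an incident is traced back to this deployment


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Percentage of production deployments in the period that caused a failure."""
    if not deployments:
        raise ValueError("need at least one deployment in the period")
    failed = sum(1 for d in deployments if d.caused_failure)
    return 100 * failed / len(deployments)


history = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
rate = change_failure_rate(history)  # 1 failure out of 4 deployments = 25.0
```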
Time to restore service reflects how effectively your team responds to service outages. It's generally accepted that the change failure rate will never be 0%; all services will encounter an issue at some point.
This metric, often reported as mean time to recover (MTTR), measures the time that elapses between an incident starting and normal service resuming.
Your incident management system can help you collect the data required for this metric. It should record both when an incident is reported and when service is restored.
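Given those two timestamps per incident, the mean time to restore is the average gap between them. This is a minimal sketch assuming the incidents have been exported as (reported, restored) pairs; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time between an incident being reported and service being restored."""
    if not incidents:
        raise ValueError("no incidents recorded in this period")
    # sum() needs a timedelta start value because timedelta cannot be added to 0.
    total = sum((restored - reported for reported, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical (reported, restored) pairs from an incident management system.
incidents = [
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 14, 30)),  # 30 minutes
    (datetime(2024, 3, 9, 2, 0), datetime(2024, 3, 9, 3, 30)),    # 90 minutes
]
mttr = mean_time_to_restore(incidents)  # (30 + 90) / 2 minutes = 1 hour
```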
There is no denying that downtime is expensive. A 2016 study by the Ponemon Institute put the average cost of downtime at around $9,000 per minute, so being able to recover quickly from an incident is valuable.
The fifth metric, reliability, was added in 2021 to assess operational performance. It helps you measure how successful your operational processes are. Reliability is a broad measure that includes things such as availability, latency, performance and scalability.
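Because reliability covers several dimensions, there is no single formula for it. As one narrow example, the sketch below measures availability over a 30-day period and checks it against a target. The 99.9% target, the downtime figure and the function name are hypothetical, and availability alone is only a partial proxy for reliability.

```python
def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Share of the period during which the service was available, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    return 100 * (period_minutes - downtime_minutes) / period_minutes


THIRTY_DAYS_MIN = 30 * 24 * 60  # 43,200 minutes
TARGET = 99.9                   # hypothetical availability target

observed = availability_percent(THIRTY_DAYS_MIN, downtime_minutes=30)  # about 99.93
meets_target = observed >= TARGET
```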
If a DevOps team is not collecting data, it can be hard to measure performance or to understand where improvements can be made. DORA metrics break down the abstract layers of DevOps and show teams and their engineering leaders which areas can be improved.
A streamlined and efficient software development process benefits the organisation as a whole.
When people or teams are measured, they tend to take ownership of the metric. In some scenarios this can skew the results, but ultimately it can help to eliminate inefficient processes.
While DORA metrics are a great way to measure DevOps teams, using the metrics can bring its own challenges.
The five DORA metrics can be a great starting point for organisations, but there is no one-size-fits-all way to measure performance. Teams and applications differ, so applying the metrics without some customisation is not always practical.
Collecting the data to measure the DORA metrics correctly can take significant effort. Data may be labelled incorrectly, especially for the mean time to recover metric: outage times may not be recorded accurately, or the cause of an outage may be misidentified. Leadership needs to understand these limitations so that they do not draw conclusions from skewed metrics.
Your team's maturity and DevOps practices also need to be considered. If your team is a single developer with no CI/CD tool in place, implementing DORA metrics might not be the best use of your time.
Whenever you measure something, it is easy to become obsessed with the numbers. If your team is regularly getting code into production with no significant issues on deployment, obsessing over the metrics isn't helpful and will probably be detrimental. Stay on top of the metrics, but don't obsess about them.
Once you build DORA metrics into your team's processes, you should continue to review them. They aren't something you measure once and forget. Be willing to learn and improve from the insights the data gives you. Over time, the hope is that you'll see gradual improvements in all five areas the metrics cover.
Several of the metrics rely on data being captured. Ensure that your systems, or other systems within the IT department, can capture the data you need. For example, to measure the lead time for changes, your source control system must record the time that code is committed.
Look into tooling that can help you measure the metrics. Tools like Sleuth have been engineered to help teams track and improve their processes using the DORA metrics. It is also worth investigating whether your existing tooling supports surfacing DORA metrics; for example, GitLab has an API that can be used to extract DORA metric information.
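As a rough illustration of pulling metrics from existing tooling, the sketch below builds (but does not send) a request to GitLab's project-level DORA metrics endpoint, `GET /api/v4/projects/:id/dora/metrics`, with the `metric` query parameter. The host, project path and token are placeholders, and availability of this API depends on your GitLab version and tier, so check the GitLab documentation before relying on it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def dora_metrics_request(base_url: str, project: str, metric: str, token: str) -> Request:
    """Build (but do not send) a GitLab DORA metrics API request for one project."""
    # The project may be a numeric ID or a "namespace/project" path, which must be URL-encoded.
    project_id = quote(project, safe="")
    query = urlencode({"metric": metric})
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/dora/metrics?{query}"
    return Request(url, headers={"PRIVATE-TOKEN": token})


# Placeholder host, project and token: replace with your own values.
req = dora_metrics_request(
    "https://gitlab.example.com", "my-group/my-app", "deployment_frequency", "YOUR_TOKEN"
)
# To send it: urllib.request.urlopen(req), then parse the JSON response body.
```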
The five DORA metrics are standardised measurements that can reveal how quickly you are iterating, the quality of your deployments, and how well you are responding to outages.
Teams and engineering leads who regularly review their DORA metrics can make informed decisions on improving performance. And after changes have been made, they can use the metrics again to see if things have improved or if further adjustments are required.