In 2021, I started wondering how I could measure overall performance improvements across the engineering department. We were in the early stages of a programme of work called North Star, which was all about making our engineering capability more efficient and more flexible in responding to business needs.
After researching various options, we decided on the DORA metrics. They gave us the insights we needed to track our success and to benchmark ourselves against a definition of good.
What is DORA?
DORA is the acronym for the DevOps Research and Assessment group: they have surveyed more than 50,000 technical professionals worldwide to better understand how technical practices, cultural norms, and management approaches affect organisational performance.
(Take a dive into the latest DORA report and the book that summarises its findings, Accelerate.)
What are the metrics we are using?
Cycle Time - The time from the first commit on a merge request targeting master to that change being deployed to production.
Deployment Frequency - The rate at which you deliver new business value to your customers. Smaller deployments carry less risk of going wrong and let you deliver value in shorter iterations, so you learn faster.
Change Failure Rate - The percentage of changes to production (or released to users) that result in degraded service (e.g. a service impairment or outage) and subsequently require remediation (e.g. a hotfix, rollback, fix forward, or patch).
Throughput (Detecting Burnout) - A sense of the team's bandwidth: a picture of how much work we can typically accomplish. Teams should aim for consistent throughput. (A worked sketch of all four calculations follows this list.)
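To make these definitions concrete, here is a minimal sketch of how the four metrics could be computed from raw delivery events. Everything in it (the `Deployment` record, the sample data) is a hypothetical illustration, not the tooling we actually use:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Deployment:
    first_commit_at: datetime  # first commit on the merge request
    deployed_at: datetime      # when the change reached production
    caused_incident: bool      # needed a hotfix, rollback, or patch

# Hypothetical sample data, for illustration only.
deployments = [
    Deployment(datetime(2022, 3, 1), datetime(2022, 3, 9), False),
    Deployment(datetime(2022, 3, 4), datetime(2022, 3, 10), True),
    Deployment(datetime(2022, 3, 8), datetime(2022, 3, 15), False),
]

# Cycle Time: mean days from first commit to production deployment.
cycle_time = mean((d.deployed_at - d.first_commit_at).days for d in deployments)

# Deployment Frequency: deployments per day over the observed window.
window_days = (
    max(d.deployed_at for d in deployments)
    - min(d.deployed_at for d in deployments)
).days
frequency = len(deployments) / max(window_days, 1)

# Change Failure Rate: share of changes that degraded service.
failure_rate = sum(d.caused_incident for d in deployments) / len(deployments)

# Throughput: completed changes per week over the same window.
throughput = len(deployments) / max(window_days / 7, 1)

print(f"Cycle time:           {cycle_time:.1f} days")
print(f"Deployment frequency: {frequency:.2f} per day")
print(f"Change failure rate:  {failure_rate:.0%}")
print(f"Throughput:           {throughput:.1f} changes/week")
```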
How do we understand the metrics?
Cycle Time - Reducing the number of blockers for developers
Deployment Frequency - Limiting the amount of code going to production at once (small batch sizes)
Change Failure Rate - Improving our quality focus, as part of our Continuous Testing Strategy
Throughput (Detecting Burnout) - Paying back technical debt and introducing automation to reduce toil
Following the rules of lean:
Optimize Work In Progress (Cycle Time)
Value is only realised once it leaves the factory floor, i.e. once it is released to production (Deployment Frequency)
Practice Kaizen (Change Failure Rate)
Invest in SRE/DevOps automation (Throughput (Detecting Burnout))
How are the metrics used internally?
The dashboard is regularly reviewed by senior engineering management, and the metrics are discussed in our monthly town hall meeting and our fortnightly Ops Review. Each team is encouraged to reflect on the metrics as they plan their work and to consider improvements they could introduce.
The metrics also influence decisions and prioritisation. Just as importantly, they help us transform our company culture.
In terms of changes measured:
- Cycle Time: as we had not measured this before, the main benefit for us is understanding what we need to improve. In 2021 it actually increased to 18.5 days (due to reasons), but we are currently averaging around 8 days for 2022.
- Deployment Frequency improved from once a week to once every 1.4 days (7 ÷ 1.4 ≈ a 5x increase).
- Change Failure Rate was about 8% before we started; it now oscillates between ~3-4% (at least a 50% decrease).
- Throughput (Detecting Burnout): as with Cycle Time, we had not measured this before, so the main benefit is understanding what we need to improve. Throughput per developer per week increased by 93%, but we know why (new ways of working, additional code bases, etc.). We are keeping a close eye on this to ensure it returns to healthy levels, and so far in 2022 it has!
The main cultural changes were:
We have automated the majority of our deployment pipelines (using GitHub Actions).
We have moved the bulk of our infrastructure management to standardised Infrastructure as Code (mainly Terraform).
We have improved our Quality Assurance & Testing process.
We hold the ambition to join the elite performing group of organisations as defined by the State of DevOps report. Each day brings us closer to that goal.
What are our future plans?
On the technical side, we are working to improve the automation of our CI/CD pipelines, our testing process, and our observability.
On the DevOps/DORA culture side, we are providing regular talks and training to wider audiences (not only engineering) to establish DORA as a reference point in future product development. We are also making it a key part of our new consolidated engineering strategy.
I've found that the DORA metrics have helped us improve our software development and delivery processes. With findings like these, organisations can make informed adjustments to their workflows, automation, team composition, tooling, and more. I recommend you try this in your organisation too.
Further reading:
The Phoenix Project by Gene Kim, Kevin Behr and George Spafford
The Goal: A Process of Ongoing Improvement by Eliyahu Goldratt and Jeff Cox
The Unicorn Project by Gene Kim et al
The DevOps Handbook by Patrick Debois et al
Important 😸
I regularly post useful content related to Azure, DevOps, and engineering on Twitter. You should consider following me there.
Top comments (4)
How do you actually / practically record Cycle Time Andrew? And everything else for that matter? Are you using a spreadsheet or do you have some fancypants (😂) solution in place?
fancypants solution of course 😂
I use usehaystack.io/
Ahhh, very nice. Looks like it's for GitHub only though, or at least it doesn't look as if they have an Azure DevOps integration unless I missed it. Fab name though.
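If you would rather roll your own on GitHub than use a hosted product, here is a minimal sketch against the GitHub REST API. The owner, repo, and token are placeholders, and measuring to merge rather than to production deployment is a simplification of the cycle-time definition above:

```python
# A rough sketch: approximate Cycle Time per merged pull request
# (first commit on the PR -> merge) via the GitHub REST API.
# Real cycle time runs to production deployment, so this is a
# lower-bound approximation. Owner, repo, and token are placeholders.
from datetime import datetime

import requests

OWNER, REPO = "my-org", "my-repo"           # placeholders
HEADERS = {"Authorization": "token <PAT>"}  # placeholder token
API = f"https://api.github.com/repos/{OWNER}/{REPO}"

def ts(s: str) -> datetime:
    # GitHub timestamps look like 2022-03-01T12:00:00Z
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")

pulls = requests.get(
    f"{API}/pulls", params={"state": "closed", "per_page": 20},
    headers=HEADERS, timeout=30,
).json()

for pr in pulls:
    if not pr["merged_at"]:
        continue  # closed without merging
    commits = requests.get(pr["commits_url"], headers=HEADERS, timeout=30).json()
    first = min(ts(c["commit"]["author"]["date"]) for c in commits)
    days = (ts(pr["merged_at"]) - first).days
    print(f"PR #{pr['number']}: {days} days from first commit to merge")
```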
This is a great post. If a project uses Azure DevOps Server for managing source code repos and Jenkins for running build pipelines, how can we measure these 4 DORA metrics? I have not seen an out-of-the-box configuration to measure this in such a case.
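For setups like this without an off-the-shelf integration, one rough option is to pull build data straight from the Jenkins JSON API and derive approximations yourself. A minimal sketch, assuming a dedicated production-deploy job (the Jenkins URL and job name below are placeholders, and treating failed deploy builds as change failures is a simplification of the real definition):

```python
# A rough sketch: approximate Deployment Frequency and Change Failure Rate
# from the Jenkins JSON API. URL and job name are placeholders; counting a
# failed build of the deploy job as a "change failure" is a simplification.
import requests

JENKINS = "https://jenkins.example.com"  # placeholder
JOB = "deploy-to-production"             # placeholder deploy job

resp = requests.get(
    f"{JENKINS}/job/{JOB}/api/json",
    params={"tree": "builds[timestamp,result]"},
    timeout=30,
)
resp.raise_for_status()

# Keep only finished builds; timestamp is epoch milliseconds.
builds = [b for b in resp.json()["builds"] if b["result"] in ("SUCCESS", "FAILURE")]
successes = [b for b in builds if b["result"] == "SUCCESS"]

# Deployment Frequency: successful deploys per day over the observed window.
stamps = [b["timestamp"] for b in builds]
window_days = max((max(stamps) - min(stamps)) / 86_400_000, 1)
print(f"Deployment frequency: {len(successes) / window_days:.2f} per day")

# Change Failure Rate proxy: failed deploy builds / all deploy builds.
print(f"Change failure rate:  {1 - len(successes) / len(builds):.0%}")
```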