Accelerate's "Software Delivery Performance" model & DORA metrics


Welcome back! In the previous article we talked about why Accelerate's research matters and why we should care about it. Now it's time to dig into its Software Delivery Performance model.

Measuring software delivery performance has been tried before, but earlier attempts failed because they lacked statistical validity and measured simple outputs instead of globally positive outcomes. Some examples of bad ways of measuring are:

Lines of code
This is a simple productivity-focused metric, but there's nothing inherently good about few or many lines. Definitely don't do this. 
Velocity (e.g. story points) 
Velocity is relative and team-dependent so is not a good candidate for globally measuring performance. It's also easily gamed by developers who want to balloon their points. This is not a good software delivery performance metric. 
Utilization (e.g. how busy are people) 
Utilization is a poor measurement not just because being busy doesn't equal good outcomes, but also because high utilization actually slows down work. When everyone is very busy there's no spare capacity left to deal with unplanned work. Maybe this topic deserves its own dedicated article 🤔, but certainly utilization should not be used as a performance metric.

With those techniques discarded let's dive into what does work, shall we?

Through careful analysis and hypothesizing the research managed to evolve and validate four key performance metrics that provably inform the software delivery performance of a team. That is to say, these metrics are linked to positive business outcomes and are robust across companies and contexts. These are also known as the DORA metrics, and they are:

1. Delivery Lead Time
2. Deployment Frequency
3. Mean Time to Restore (MTTR)
4. Change Fail Percentage

Great. What does that even mean?

Each of those metrics has a specific definition, and a team is categorized as a Low, Medium, High, or Elite performer depending on how it rates across all four. It's fascinating that the research can be so hugely complex and involved, and yet amidst all that data it manages to find a signal from just four simple metrics.

Delivery Lead Time

This is how long it takes from code being committed until it runs in production. Note the "code committed" part: this metric isn't about tracking full design lead times (e.g. from a ticket being created until it's done), because it is very fuzzy when to start and stop that clock.

The possible answers are:

less than one hour
less than one day
between one day and one week
between one week and one month
between one month and six months
more than six months
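
If you'd rather derive this from your own pipeline data than from a survey answer, here's a minimal sketch. The sample timestamps are made up, picking the median as the "typical" lead time is my own assumption (the research itself relies on survey answers), and I approximate one month and six months as 30 and 180 days.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (commit time, production deploy time) per change.
changes = [
    (datetime(2021, 9, 1, 9, 0), datetime(2021, 9, 1, 9, 40)),
    (datetime(2021, 9, 1, 13, 0), datetime(2021, 9, 2, 10, 0)),
    (datetime(2021, 9, 3, 8, 0), datetime(2021, 9, 3, 16, 30)),
]

# Upper bounds for the answer options listed above. "One month" and
# "six months" are approximated as 30 and 180 days (an assumption).
LEAD_TIME_BUCKETS = [
    (timedelta(hours=1), "less than one hour"),
    (timedelta(days=1), "less than one day"),
    (timedelta(weeks=1), "between one day and one week"),
    (timedelta(days=30), "between one week and one month"),
    (timedelta(days=180), "between one month and six months"),
]


def lead_time_bucket(lead_time: timedelta) -> str:
    """Map a lead time onto one of the answer options."""
    for upper_bound, label in LEAD_TIME_BUCKETS:
        if lead_time < upper_bound:
            return label
    return "more than six months"


# Lead time per change: from commit until it runs in production.
lead_times = [deployed - committed for committed, deployed in changes]
typical = median(lead_times)

print(typical, "->", lead_time_bucket(typical))
```

The clock starts at the commit timestamp, not at ticket creation, which matches the definition above.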

Deployment Frequency

This is how often a team deploys its primary service to production. It is a proxy for delivering small batches of work, because teams with high deployment frequency push many more, but individually smaller, changes to production.

The possible answers are:

on demand (multiple deploys per day)
between once per hour and once per day
between once per day and once per week
between once per week and once per month
between once per month and once every six months
fewer than once every six months
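
To get a feel for where you land, you could count production deploys over a time window. This is only a rough sketch with made-up timestamps; the helper name `deploys_per_day` and the choice to average over calendar days (weekends included) are my assumptions, not something the research prescribes.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical production deploy timestamps for one service.
deploys = [
    datetime(2021, 9, 6, 10, 15),
    datetime(2021, 9, 6, 15, 40),
    datetime(2021, 9, 7, 11, 5),
    datetime(2021, 9, 9, 9, 30),
    datetime(2021, 9, 10, 14, 0),
    datetime(2021, 9, 10, 16, 45),
]


def deploys_per_day(timestamps: list[datetime], start: date, end: date) -> float:
    """Average number of deploys per calendar day in [start, end], inclusive."""
    days = (end - start).days + 1
    in_window = [t for t in timestamps if start <= t.date() <= end]
    return len(in_window) / days


rate = deploys_per_day(deploys, date(2021, 9, 6), date(2021, 9, 10))

# Per-day breakdown, handy for spotting days without any deploys.
per_day = Counter(t.date() for t in deploys)

print(f"{rate:.1f} deploys per day on average")
```

An average above one deploy per day points towards the "on demand" end of the list, but the average hides gaps, so the per-day breakdown is worth a look too.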

Mean Time to Restore (MTTR)

When a service experiences outages or impairments, how long does it generally take to fix it? This is a measure of reliability.

The possible answers are:

less than one hour
less than one day
between one day and one week
between one week and one month
between one month and six months
more than six months
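
As a small illustration, the mean can be computed from incident records like this. The incident data and the function name are hypothetical, and I'm assuming you have a start and a restore timestamp for each incident.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (impairment started, service restored).
incidents = [
    (datetime(2021, 9, 2, 14, 0), datetime(2021, 9, 2, 14, 30)),
    (datetime(2021, 9, 8, 3, 0), datetime(2021, 9, 8, 4, 30)),
    (datetime(2021, 9, 20, 11, 0), datetime(2021, 9, 20, 11, 30)),
]


def mean_time_to_restore(records) -> timedelta:
    """Average duration from impairment start until service is restored."""
    if not records:
        raise ValueError("need at least one incident")
    durations = [restored - started for started, restored in records]
    # sum() needs a timedelta start value, since the default start is the int 0.
    return sum(durations, timedelta()) / len(durations)


mttr = mean_time_to_restore(incidents)
print(mttr)
```

A mean is sensitive to a single very long outage, so it can be worth looking at the individual durations as well.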

Change Fail Percentage

How often does it go wrong when a change is applied to the system? That is, for all the code changes, releases, infrastructure changes, configuration changes, etc., how many result in outages or impairments that require a hot fix, rollback, or any other form of fixing?

The answer is that ratio expressed as a percentage.
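
In code, that's a one-liner once you have a record of which changes went wrong. A minimal sketch, assuming a hypothetical change log where each entry says whether the change needed remediation:

```python
# Hypothetical change log: True means the change caused an outage or
# impairment that needed a hotfix, rollback, or other remediation.
changes_failed = [False, False, True, False, False, False, False, True, False, False]


def change_fail_percentage(outcomes) -> float:
    """Share of changes that failed, as a percentage of all changes."""
    if not outcomes:
        raise ValueError("need at least one change")
    # True counts as 1 and False as 0, so sum() gives the number of failures.
    return 100 * sum(outcomes) / len(outcomes)


pct = change_fail_percentage(changes_failed)
print(f"{pct:.0f}% of changes failed")
```

The hard part in practice is not the arithmetic but agreeing on what counts as a failed change, so write that definition down before you start counting.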

And that's it. Delivery Lead Time and Deployment Frequency measure speed, MTTR and Change Fail Percentage measure quality, and together they form the Software Delivery Performance model.

How do you compare?

As of this writing the latest report is from 2021, and it specifies these four performance categories:

Metric	Elite	High	Medium	Low
Delivery Lead Time	Less than one hour	Between one day and one week	Between one month and six months	More than six months
Deployment Frequency	On-demand (multiple deploys per day)	Between once per week and once per month	Between once per month and once every 6 months	Fewer than once per six months
Mean Time to Restore	Less than one hour	Less than one day	Between one day and one week	More than six months
Change Fail Percentage	0%-15%	16%-30%	16%-30%	16%-30%

With diligence and effort I'd say all of us can push towards at least High.

There's also a great insight hidden in these numbers, one that can be a difficult pill to swallow for some: this table shows, undeniably and unequivocally, that it's possible to reach high speed and high quality at the same time. There is no tradeoff between speed and quality. Time and time again I hear the argument to move fast even if it costs in quality, but the data shows this is a false dichotomy. It's a "pissing our pants to keep warm"-argument (or in more polite company: "go running without tying your shoes"-argument) that might feel good in the moment but is ultimately not a strategy for success.

Speed and quality go together. Do not sacrifice one for the other.

At this point we've pretty well covered the basic introduction to the wonderful research from Accelerate, and how we can use its Software Delivery Performance model to compare ourselves to the industry.

I'm not entirely sure it's worth diving much deeper, because if you're keen to hear more at this point it's probably easier for both of us if you just read the book 😅 But I'm happy to hear your thoughts in the comments.

Photo by Fleur on Unsplash