Warren Parad for Standup & Prosper

Measuring team success

It should be an inevitable conclusion that you need to measure the success of your teams. After all, we get what we measure. And having successful teams sounds like a good thing.

Although, what do we mean by a successful team? Do we mean high-performing? Do we mean happy? We can’t measure a team by a metric to find out whether they are successful; that’s a catch-22. We need to start with what it means for a team to be successful, and then we can create metrics to match.

You may be able to fill in the blanks a bit, but it’s worth diving into: what is a successful team?

  • New features can be delivered quickly
  • Low number of bugs reported from users in production
  • Team members are motivated and potentially even happy
  • When there is a problem the team steps up to deliver a fix
  • The team is a leader in their domain, they innovate
  • New team members onboard quickly and effectively
  • Low turnover rate for team members
  • [Bonus] They work on the right things at the right time

Have something else? Add it to the list. What’s important is to clearly state what you think a successful team means by looking at outcomes. Don’t say something dumb like “prioritizes backlog items”, “has retros”, “estimates correctly”, or “does sprints”. Those are activities; instead focus on outcomes => “delivers meaningful business impact”, “team morale improves over time”, “delivers frequently”, or “applies the right structure at the right time”.

The worst metrics

Now that we have a shared understanding of what a successful team looks like, what metrics should we use to capture this? The right metrics should focus only on the things we said, and explicitly exclude anything that isn’t related. The critical fault would be a metric whose outcome the team can’t control. Here I’ll start with the metrics that should never, ever be used, and why.

  • Objectives & Key Results (also known as OKRs) tracking and key result completion
  • Sprint goal completion
  • Velocity — number of tickets or story points completed
  • Lines of code committed

Why?

I tend not to use sprint goals or OKRs. While I think OKRs are fantastic for driving alignment, transparency, and correct prioritization among teams and the business, their “delivery/status/value” is coupled to two things that are completely out of the team’s control:

  • The correctness of the OKRs
  • Dependencies outside the team

We actually don’t want OKRs to be at 100%, nor do we want them to be at 10%. OKR success is our ability to drive motivation by trying to get to 100%. If we never get to 50%, then we know we critically overestimated our abilities. If we get to 100% before the quarter is over, we’ve underestimated ourselves. The next OKRs adjust for this, which means we will never be successful by an OKR metric.

Additionally, OKRs exist outside of one team; they involve deliveries and collaboration with other teams and users/customers, and they are evaluated based on changes in product vision and business strategy. You cannot under any circumstance measure the success of a team by something they can’t control. That’s just unfair; even the perfect team can’t withstand problems caused by faulty expectations baked into metrics. One extreme example that proves the point: a critical pivot that invalidates an OKR would mean that team failed, without room for discussion.

Further, successfully matching an Objective’s Key Results tells us more about our ability to predict business success than about whether the team is responsible for it. If we set a goal and we hit it, that only means we are great at setting goals, not at hitting them. If you want to run an unhappy team, keep setting bad goals. The OKRs won’t be hit, and your outcome will be blaming the team. The team had nothing to do with it.

Sprint goals and velocity go the same way. They are great for alignment, but they are absolutely terrible for determining success. Again, because both of them look at estimating what you will do beforehand, we couple our ability to estimate to our definition of success. A team being able to effectively estimate their work is useless for determining success, and a truly successful team doesn’t need to estimate. Even if estimates had value, we are going to get them wrong, by a lot. And most importantly, meeting these goals just means we estimated well, not that the team is performing or successful. If you want to call a team that estimates well successful, go ahead, but that’s the only thing they’ll be good at, and we know that in software this has very little value.

And I hope I don’t need to get into lines of code or anything else that is raw technical output; these tell us nothing about the value of what a team delivers, just the volume of it. Don’t use output, use outcomes.

One more problem!

Not only does estimation contaminate the ability to use OKRs, there’s a second issue. Since you’ll be evaluating team success and performance based on them, your teams will by design interfere with the process. Your teams are already deciding what the OKRs will be, and if they are to be judged on them as well, then you’ve set yourself up for a critical failure mode: the reinforcing system design problem of building OKRs to make the team look good or bad. If a team can intentionally be made to look good or bad by a process, then something is wrong; it should be as objective as possible.

As a result, a common organizational antipattern is to take the team being judged out of the process and assign them sprint goals and OKRs. Doing so renders the OKRs worthless, since the whole point is alignment; assigning them prevents teams from being on board. Further, it prevents them from growing to the point where they know what makes a good OKR in the first place.

Usually this happens in places where there is a lack of understanding of what effective teams look like. OKRs that are not sourced from the team, created by the team, and measured and recorded by the team are indicative of toxic leadership.

So what are the right metrics?

We’ve uncovered an important criterion in the wrong metrics listed above: anything related to estimation and planning. We should never measure a team on estimation or on following a plan, but instead on outcomes. Since it is really hard to measure “successful customers” or “value delivered”, we need proxies for these. Some companies like to use “revenue”, but that’s conflated with the business and product-market fit, which may be non-existent; or the company may be accidentally successful, like most out there.

When I started engineering a long time ago, there was no consensus on what was important. We could talk about it and get pretty far, but we were only touching on the boundaries of what it is.

We knew it had something to do with speed of delivery, not wasting time, and “quality”. We also knew concepts from the Toyota Way would help introduce Lean to software in a good way. But talking about removing wastes is different from measuring success. Sure, we want the wastes in the team to be zero, and we can see how things like hand-offs are wrong because they create transportation waste, or how doing work that isn’t delivered directly to users is wrong because we stock up on feature inventory waste.

Now that we have some understanding of what makes for a good measure of team success and what’s atrocious, we can land on the following, also known as the DORA metrics and heavily discussed in the book Accelerate (a small sketch of how they might be computed follows the list):

  • Lead time for changes — How long it takes the team to start working on something after they know they want to work on it.
  • Deployment frequency — How often a team deploys visible changes for users to production; getting those changes out, not just merging.
  • Change failure rate — How often a deployment introduces a new bug.
  • Mean time to recovery — When there is a bug, how long it takes to recover.
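
To make these concrete, here’s a minimal sketch of how the four metrics could be computed from a log of team events. The record shapes, field names, and numbers are hypothetical, invented for illustration rather than taken from any particular tool:

```python
# Minimal sketch: computing the four metrics from hypothetical event
# records. All record shapes and field names are illustrative only.
from datetime import datetime
from statistics import mean

work_items = [   # when work was identified vs. when the team started it
    {"identified": datetime(2023, 4, 28), "started": datetime(2023, 5, 1)},
    {"identified": datetime(2023, 5, 1), "started": datetime(2023, 5, 2)},
]
deployments = [  # production deployments, flagged if one caused a bug
    {"at": datetime(2023, 5, 1, 10), "caused_bug": False},
    {"at": datetime(2023, 5, 2, 16), "caused_bug": True},
    {"at": datetime(2023, 5, 4, 9), "caused_bug": False},
]
incidents = [    # production bugs: when reported vs. when recovered
    {"reported": datetime(2023, 5, 2, 17), "recovered": datetime(2023, 5, 2, 19)},
]

window_days = (max(d["at"] for d in deployments)
               - min(d["at"] for d in deployments)).days or 1

lead_time_days = mean(
    (w["started"] - w["identified"]).total_seconds() for w in work_items) / 86400
deploys_per_day = len(deployments) / window_days
change_failure_rate = sum(d["caused_bug"] for d in deployments) / len(deployments)
mttr_hours = mean(
    (i["recovered"] - i["reported"]).total_seconds() for i in incidents) / 3600

print(f"Lead time for changes: {lead_time_days:.1f} days")
print(f"Deployment frequency:  {deploys_per_day:.2f}/day")
print(f"Change failure rate:   {change_failure_rate:.0%}")
print(f"Mean time to recovery: {mttr_hours:.1f} hours")
```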

And that’s basically it. We know these are also the right metrics because they are measurable, and they don’t introduce another process which could itself be flawed. There is zero estimation here, other than in the cases where we don’t have precise measurements.

And that isn’t to say that you have to perfectly measure these, just that you should be thinking about these when measuring the success of a team.

[Bonus] The advanced metrics

Those are a great start, but I actually don’t stop there; there are two other metrics I like to add to this list for my teams, which prevent other standard team dysfunctions. One common dysfunction pattern is a team that is highly praised for always working on the right thing, and all their customers are happy, but every team that works with them hates the way they work (i.e. the output of their work, not the team members themselves). Other teams in the same org could even think that team is doing a good job, and that’s because the team is spending all their time fighting fires.

Long term, any team that spends an increasing amount of time fighting fires is creating a run-away problem that will eventually cripple a company. If more and more time has to be spent resolving issues with already-shipped solutions, then you will run out of time. And usually teams only realize this when they have no time left. That’s why one of the most important metrics I focus on is:

  1. Percent of time spent fighting fires (sketched below)
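
Measuring this doesn’t need tooling; a simple per-sprint tally is enough, because the value is in the trend. A minimal sketch, with categories and numbers invented purely for illustration:

```python
# Hypothetical per-sprint hours by work category; all numbers invented.
sprints = [
    {"features": 140, "firefighting": 20, "maintenance": 40},
    {"features": 120, "firefighting": 45, "maintenance": 35},
    {"features": 95, "firefighting": 70, "maintenance": 35},
]

for number, hours in enumerate(sprints, start=1):
    share = hours["firefighting"] / sum(hours.values())
    print(f"Sprint {number}: {share:.0%} of time spent fighting fires")

# A rising share, sprint over sprint, is the run-away problem
# described above.
```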

This also solves the problem where the above metrics look great, but your organization thinks the team never works on anything valuable. Teams that are quick to release but have a poor design process tend to have high positive optics in an org, but usually end up being a critical bottleneck. This is a frequent problem pattern.

The other angle is value delivery. So far most of the metrics are internal facing, with good reason: external facing metrics are contaminated with problems like is the product the right one, do we have the right users, does the business strategy make sense? If we have problems in one of these areas, we can’t let those affect our judgment of the team’s success. Nonetheless, I feel the need to have a metric that tracks whether the team’s work has value. Because if a team isn’t completing valuable work, we’ll know they won’t be happy. So, my bonus second metric is:

  2. The number of support/user/customer requests

Along with this come the accuracy of answers and the time to answer, as well as the number of people required to answer. We want a level distribution across the whole team, quick responses, and low involvement. Any team member can and should be able to answer, and the answers should be quick and obvious.
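
As a sketch of what to look at here, assuming hypothetical request records (the names, fields, and numbers are invented):

```python
# Hypothetical support-request records: who answered, and how long it took.
from collections import Counter
from statistics import mean

team = ["alice", "bob", "carol", "dave"]
requests = [
    {"answered_by": "alice", "hours_to_answer": 1.5},
    {"answered_by": "bob", "hours_to_answer": 0.5},
    {"answered_by": "alice", "hours_to_answer": 2.0},
    {"answered_by": "carol", "hours_to_answer": 0.8},
]

answered = Counter(r["answered_by"] for r in requests)
print("Requests handled:", {member: answered.get(member, 0) for member in team})
print(f"Mean time to answer: {mean(r['hours_to_answer'] for r in requests):.1f}h")

# A lopsided distribution (one member answering everything) signals a
# knowledge silo; slow answers signal a confusing product or docs.
```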

We can use this as a driver to make sure our documentation (from external facing docs, to internal ones, to API specifications) makes sense and is relevant. This metric tells us when what we are building, or how we are thinking about it, isn’t right. User confusion is a real problem, and usually the source is that instead of the team owning the product, the product is owned by someone outside the team. That may look like a Product Manager incorrectly focused on features rather than product vision, or a Business Analyst focused on customer insights rather than capturing the potential value for the business. Product ownership belongs at the team level, nowhere else.

Tracking these over time is how I know the team is spending their effort in the right areas. There are some longer-running aspects, such as engineering retention, which can be hard to link to a specific team, and also team member growth within the team. For hard metrics I usually stop here, but these other aspects are great indicators, so let’s talk about them as well.

Leading indicators

That wraps up the quantitative, objective metrics. However, it can be a challenge to jump in and introduce these, and an obvious follow-up is: do we need to track them? The answer is almost always a resounding NO. It is almost always enough to talk about them, without actually putting up numbers. There is no small number of SaaS products attempting to track them for you, and some of the graphs look nice, and you may be tempted to set them up and share them externally to prove that your team is successful. However, tracking these metrics successfully is actually a metric in itself, and not one listed above. Being a successful team is very different from proving it, and unless there is a concrete problem you want to fix, unnecessarily tracking metrics to prove the team is successful is another indicator of toxic leadership.

So, if we don’t actually track these, what are some good indicators that the right things are happening? You can start by looking with your eyes; it isn’t hard to see the direction things should move to get better, but there are also obvious signals that can be utilized:

Are engineers growing/being promoted (obviously with the caveat of being promoted for the right reasons)? Here I use a “gut check” based on who is attending/meeting/sharing/talking/moderating/leading. If it is always the team lead, then I know there is a problem. If other people are taking initiative, that’s a good sign.

Are the teams innovating, either by pulling in/testing new technology or by leading/experimenting with new product ideas? These are more subjective, but having a feel for them is a requirement for being able to judge the success of a team. If you don’t know what an innovating team looks like, then you probably aren’t the right person to be judging whether a team is successful.

Free capacity is the ability to burn down tech debt at the right time, as well as to work on “important things”. If teams are moving slower than new features are needed, then their process isn’t scalable. Since I expect software teams to automate their tasks, not just deliver, their responsibility is to put themselves out of a job, with the goal of working on more important things over time.

If they don’t have the ability to turn around new features or prioritize them, we are either making changes that are too big and not being agile, or we have problems with our previous work. If previous work is getting in our way, it tells us the teams aren’t solving problems in a way that matches the expectations of the business, how the product is conceived, or how our users think.

For additional indicators, the Spotify Health Check is a great place to start, with the obvious caveat that you need to adjust for the specific culture you want to cultivate.

Wrapping up

Absolutely don’t use metrics that are themselves flawed in their creation. If we depend on measuring success via a flawed mechanism (estimation, for instance), then we couple the success of a team to a meaningless and often wrong metric. You won’t get a successful team, you won’t actually know if your team is successful, and, worst of all worlds, no one will be happy.

