Warren Parad

Posted on Jul 10, 2023 • Edited on Oct 30, 2023

The Devastating Failure of Technical Leadership

#architecture #leadership #career #discuss

I'm going to tell you a story, one that may even have happened to you. The sad truth is this is a very common story, and it starts with...

🎉 Celebration time 🎉

It's time to celebrate, or at least that is what everyone is telling you. See, an engineer on another team (Team Visible) that you haven't spent too much time interacting with was just promoted to Lead Architect or Principal Engineer, or another title that wasn't ever completely explained to you, but that's another matter.

The promotion had apparently been long earned, and Team Visible had just finished a years-long project to completely rewrite a bunch of legacy technology they had. It isn't clear where the legacy technology came from, but in the past, the few times you had to collaborate with the team or integrate with the services they offered, it was a huge pain. At the time, it seemed like that team wasn't very good--how come it was always so difficult? Now it is apparent it was the result of their legacy technology.

Luckily, your organization and the larger company use microservices, so their changes were able to be deployed without you or your team (Team Hard-Working) even learning about it. And because Team Visible worked so hard to make changes for so long and finally accomplished them, it was time to celebrate those accomplishments.

🤩 Promotions 🤩

Often promotions are treated as a reward for "hard work". In the world of software engineering, that is usually attached to "finishing something". You completed a project or two successfully, and now you get to be promoted.

However, that is always wrong. It's wrong because you've skipped a step. You jumped from committing code to declaring success. Frequently, poor leadership is associated with this pattern.

See, when you promote someone on the release of a project, you've created a feedback loop that reinforces releasing projects. This may seem like a good thing, and many companies work like this. The result, however, is lots of releases and very little success. That's because before the success of a project is evaluated, you skip right back to the beginning to start over again.

Another failure mode is tracking the wrong metric. A common example is Service Usage. Let's call this project Service X. If Service X has lots of usage, it's a success, and if it has low usage, it's a failure. Well, that's also a problem, because you've coupled usage to value. Using a service doesn't mean it is providing value. And more usage doesn't equate to it doing a good job. Worse, rewarding usage of the service means encouraging and sometimes forcing migration to the service even when it actively hinders future success. (We'll see more about why later.)

The Relevance

Now we know that when the Principal Engineer was promoted, it may have happened without ever evaluating the impact of their work. And since this becomes a reinforcement loop, we can easily see that people who release faster without any regard for evidence of success will be promoted sooner. This not only encourages, but rewards projects that actually have no or negative value. That's because you don't get promoted for delivering value, and worse, you don't get promoted for not doing something.

That means the only things that get done are random and have no association with value. It's actually worse: in reality, they have negative ROI, and it's simple to explain. The projects that are worked on are the ones projected to have higher ROI. However, the projects with the highest projected ROI are usually the ones with incorrectly estimated returned value or incorrectly estimated effort. People tend to overestimate the value and underestimate the effort, and that means we know the projects that get worked on are always the wrong ones. We can predict that organizations that measure ROI and promote people based on it will frequently be encouraging the creation of negative value.
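The selection effect in the paragraph above can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the story: true ROI and estimation error are both drawn from normal distributions, the estimates are unbiased (which is kinder than the overestimation described above), and every name and number is hypothetical.

```python
import random
from statistics import mean

def simulate_selection(num_projects=1000, num_selected=50, seed=0):
    """Fund the projects with the highest *estimated* ROI, then compare
    those estimates with the (hypothetical) true ROI of the same projects."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    projects = []
    for _ in range(num_projects):
        true_roi = rng.gauss(0.0, 1.0)             # assumed: the average project breaks even
        estimate = true_roi + rng.gauss(0.0, 1.0)  # assumed: unbiased but noisy estimate
        projects.append((estimate, true_roi))

    # Leadership funds whatever looks best on paper.
    selected = sorted(projects, reverse=True)[:num_selected]
    return {
        "estimated": mean(e for e, _ in selected),
        "actual": mean(t for _, t in selected),
    }

result = simulate_selection()
print(f"estimated ROI of funded projects: {result['estimated']:.2f}")
print(f"actual ROI of funded projects:    {result['actual']:.2f}")
```

Even with honest, unbiased estimates, the funded projects look better on paper than they turn out to be, because picking the top estimates also picks the largest estimation errors. A systematic tendency to overestimate value or underestimate effort would only widen that gap.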

And you know what happens next...

That Lead Architect needs a bigger scope and a larger more visible project. Team Visible isn't visible enough or doing important enough work for them. Because they've been promoted for finally completing what they worked on, and they of course need something that matches their new level.

We can immediately see that more and more people doing this has a devastating impact. And worse, even if an organization knows there is a problem with the success of the project, that evidence won't be available until long after that engineer has left for a higher paying job somewhere else. Who doesn't want to hire a Principal Engineer that just completed a massive project with tons of moving parts!

That's right, by design people will:

Work on a project
Complete the project
Be promoted
Leave
Deliver negative value
Cause pain to everyone else and the business

And this will happen so long as you promote before collecting multi-year evidence of project success.

💵 Show me the money! 💵

Finally, the impact of the project has reached your team, Team Hard-Working. All that hard work that another team did without ever involving your team now results in your "opportunity" to make changes.

I put "opportunity" in quotes, because everything you had was already working correctly for your users, clients, and customers. You've worked around the hacky existing solution that Team Visible managed. But now, you've been tasked with converting from what you have today to that new thing. See, the project Team Visible was working on was to replace the terrible legacy solution they currently had. In your company of 5+ teams, everyone else depended on it. Everyone knew it was terrible, but at this point everyone had successfully worked around the problems.

In this world we can see there was near zero value in replacing it. The old adage is "don't fix what ain't broke". But the reality is more complicated. We need to include the future "present value" of making a change. This is something no one calculated. We can easily see that having a better version of X (X-Prime) is better. But is the value of having X-Prime worth the cost? This is known as ROI. If everyone is already dealing with X and X works, and in the next 10 years no one really cares about X changing, then X-Prime is a waste of time and resources.
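To make the "present value" question concrete, here is a minimal sketch of a net present value calculation. The function, the 8% discount rate, and all the engineer-month figures are hypothetical and chosen only for illustration; none of them come from a real project.

```python
def net_present_value(upfront_cost, yearly_benefit, years, discount_rate):
    """Discount each future year's benefit back to today, then subtract the cost."""
    present_value = sum(
        yearly_benefit / (1 + discount_rate) ** year
        for year in range(1, years + 1)
    )
    return present_value - upfront_cost

# Hypothetical numbers, in engineer-months: building X-Prime and migrating
# every team costs 60 up front, while the workarounds for X cost all teams
# combined about 3 per year.
npv = net_present_value(upfront_cost=60, yearly_benefit=3, years=10, discount_rate=0.08)
print(f"NPV of replacing X: {npv:.1f} engineer-months")
```

With these made-up inputs the result is negative: the workarounds are cheap enough that ten years of avoiding them never pays back the rewrite and the migrations. Different inputs can flip the sign, which is the point of running the numbers before starting the project.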

And worse, because Team Visible changed X, now it's on your team to depend on X-Prime as well.

🎁 The Gift that keeps on giving 🎁

You start to work with the new version, X-Prime, and see that everything is different, in ways that don't even really make sense to your team. Interfaces are all different, endpoints work differently, and even old properties have been renamed.

There's no conversion manual because "The new version is better," says the Lead Architect, and "It will be easy to migrate to it."

But how long does it take?

When you were planning work for your team for the week, month, quarter, or even year, did it include the extra work that you would need to do to migrate to X-Prime?

How could you? You didn't even know that Team Visible was going to whip it out of their ****. Team Visible enjoyed all the glory of delivering something new, having wasted months or even years developing it, and now that they are done, they've resorted to pushing all the actually hard work of migration to the other teams.

But the migration is easy!

Okay, maybe...but let's talk about prioritization. Your Team Hard-Working didn't need this, actually it is the last thing you needed, because you've spent the last 6 months dealing with tons of problems caused by a recent feature that Team Visible released and your customers actually needed. So Team Hard-Working has been hard at work making that value a reality.

You've got a huge backlog of your own technical excellence work that needs to get done to ensure that you don't have tons of on-call production alerts. Because you lead such a great team, you usually deliver the value first, and then optimize afterwards. That tends to mean a concerted effort after delivery to improve the troublesome feature you just released quickly.

But instead of working on improving your services or getting to your critical backlog to solve your committed OKRs, you now have to prioritize Team Visible's Service X-Prime. So what do you push?

...

Wrong answer. See, no matter what you decide to not do--to make room for the migration, it's going to be your fault. You have to drop something that you want to do, even have to do, to do something that you don't need to nor want to.

You'll be blamed for it, but in reality that's not your fault.

It's Team Visible's fault...Well actually it's their director's fault. Because they:

Promoted and rewarded the team and team members that created more work for others
Told your director, in a backroom somewhere, that Team Visible was complaining X-Prime wasn't being used yet, so now your director wants to know why.

Even if you don't do the work, now you are having meetings to discuss why you don't want to do the work. And that's because there's a graph somewhere with a list of all the teams using X-Prime, and your team isn't on it. It's right next to the graph of X-Prime usage, which is just going up!

Quick Aside: Measuring usage is worse than the wrong metric, it's a bad one too. Let's take for instance "Length of time a user spends looking at their email". Someone might say "Increased time is great, that means they are doing more with our email client...Success." But the chilling reality is quite different. The truth is, the longer someone spends in your UI, the worse your product is. Using an email client longer doesn't mean that they are getting more value, it means they are struggling more to get done what they needed to do before. So don't measure usage, measure value. A much better metric than "length of time using UI" is number of emails sent.

The same goes for using Service X-Prime: Usage is bad, Value is good.

Honestly, F&#@! Team Visible.

Bringing this back to reality.

In today's world, this looks like "We just migrated all our teams to Kubernetes, it was a huge project, and I made it possible." Before K8s, it was "Now we are using Kafka". Frequently, we hear of these giant undertakings to completely rework some core piece of architecture and infrastructure utilized by the engineering teams (and usually done by one or two engineers given decision rights without accountability to get that done).

When you are hiring, did you ask how long ago that was? As in, how long did you wait after the project was over to collect enough evidence of the business impact and the impact on other engineering teams? Because answering this question is difficult even when you have taken the time to do so. The answer is almost always: "Everyone has been really happy for the last 2 months since they switched". That includes neither a long enough time horizon nor concrete metrics: things like bugs tracked, support tickets, time to delivery, the list goes on (you are probably thinking DORA or SPACE here). There are actually metrics for the success of the team, and someone saying "everyone is happy...right now" is not really one of those metrics. Happiness of the team is critical, but if you want to know what good metrics actually look like, I've written an extended article on Measuring Team Success.

Well that sucks

It would be nice if you were plugged into everything. The moment Team Visible said "Let's replace X"--having heard that, your Idiot-dar would have gone off and you could rush to save the situation and prevent pain for the next months or years.

But that just isn't how it works. Everyone* does work, all the time. You can't be part of every conversation and everything being built. And you really don't want to be.

There is something that we can do here though: make sure that we have the conversations that couple Promotions to Value and, conversely, couple Failure to Accountability. We never took away the promotion given to the Principal Engineer when we learned teams were struggling; instead we assumed the problem was with Team Hard-Working, they just must not have been working hard enough. And honestly, who is going to call up that extremely large company where the Principal Engineer is now working to complain to their manager about it?

Unfortunately, there's no silver bullet here, and I know that at the end of this article, if you are still reading, you were hoping that I could give you some magic advice that helps you avoid this situation. The truth is, there is no fix for the Technical Leadership Problem you have. That's fundamentally it: You have a leadership problem. Your leadership promotes people that destroy productivity and create adversarial working environments. Instead of focusing on the value to the business, they focus on visibility of themselves through their teams. I mean, come on, who names their team Team Visible? Such a terrible name.

One last thing

I will say migrations can absolutely be easy. Not all new things go the wrong way, so take some time to actually evaluate if that new thing is actually worse. Maybe it makes it easier for you to do your new work in the future, and it's not just busy work preventing you from what you need to do.

That's actually why, after working for many companies with terrible Authentication and Authorization solutions, my company went out and built Authress. Having been in the market for over 4 years as one of the most mature solutions in the space, we've spent a ton of time understanding where teams' pain points are when it comes to managing application security. Most importantly, we know how terrible it is to build your own authorization. If that's interesting for you, I'd love to have you join our Community to discuss application security.

DEV Community