Nick Cinger

Posted on Oct 28, 2019

Surviving the Legacy

#refactoring #programming #management

Legacy code. Just reading the words out loud will make developers wince. Partially 'cause of their own horrid experiences, but also because of an ongoing meme that Legacy code is literally the worst.

This meme has origins in reality. Hard-to-debug, difficult-to-rewrite code is nobody's dream. The dream™ is building a brand new, fresh out of the oven, no-prior-baggage project in which the code editor is your oyster.

The dream is definitely not "going down a decade old rabbit-hole of debugging". The dream might be "working on a new feature without having to think about how it's going to break another part of the system".

Alas, you don't always get to pick. You work with the cards you're dealt, and folding your hand is a tempting option. I'm going to make the case for staying in the game even if it feels hopeless at times.

But first, let's paint a picture of a Legacy code-base.

The background

Just in case you're not currently working with a monstrosity, let's give you a background of an "imaginary" project. The project is 10 years of tech debt. It had started as a mish-mash of code from several developers, working with no clear guide or vision.

The higher-ups had decided that 5 average developers offered a better cost-to-code ratio than 2 great ones. None of the developers had the necessary experience to see where the project was going. They were busy working away on their own little corner of it, not paying attention to the bigger picture.

So you have several contributors to the code and no style guide. You also have no oversight, no code reviews and obviously no unit tests nor documentation. But hey, they were churning away, knocking down requirements one by one.

Since there wasn't a proper release or testing process involved, there were bugs. God there were bugs. And then the merge conflicts came. And so the developers started dropping off, replaced by other new ones, adding more to the confusion.

A few years into the project money was growing tight as the product was sub-par in an overcrowded market. Features were coming out too slow and they were too buggy to keep users interested. As a result, deadlines became tighter and code kept getting worse.

But somehow, through it all, the project turned a profit a few years in. It had users and money coming in. So now stability of the project is a lot more important than it was while it was still just under development.

There is still no documentation. The best you could hope for are random code comments and the outdated knowledge base. Unit tests have been poorly implemented. They run too slow and developers have to run them manually. The result: nobody runs the tests.

A few more years in, the project is actually making bank. The code is officially "Legacy" and is a proper pain to work with. But the business can't sit still: it needs features and it needs more developers. Gradually the consequences of the aged code-base become evident.

The consequences

You're Johnny Newdev. You've been hired to supplement the developer team and help them fix bugs and add features. The hope is that you'll be doing more of the latter, but as time goes by you see it's more of the former...

It took the team lead a week to onboard you. The team is already swamped and now they have to spend time setting you up. Unfortunately they haven't done this in a while and the process is not documented. You're given a disk image of what seems to be everything and the kitchen sink of the project, but even that doesn't work. You have the code, a virtual machine and a hastily prepared development database.

The first few days are spent debugging network and database issues. The rest are you adding the undocumented dependencies to your system. On day 5 you can finally load the website locally without any (major) errors.

Parts of the system still don't work, but you're told not to worry about it. "We'll set you up once you need to work on those bits," they say. Of course those are the bits that actually need to be worked on, so you spend a few more days setting those up.

Finally you're fixing your first bug. The lead developer is confident that there's no way you can muck it up, seeing as it's an isolated part of the system.

Your code is online. Half an hour later the team finds out the signup form is broken. Whatever you did, it worked locally, but the code behaved differently on the server. You go into firefighting mode and fix it on the server to not waste time with the slow deployment. Yes, for some reason you have root access to the server.

So now that it's fixed, you're terrified of changing anything else. You triple check your code and keep a very close eye on Production once the code is online. This also means you're behind on your deadlines which in turn means you're rushing stuff out, which means you're doing even more firefighting. It becomes an endless, but expected, cycle.

You think back to this game you played a while back, called Dwarf Fortress. In the game you manage an expedition of dwarfs, mining through the mountains to build out a fortress and accumulate fortune. The game has its own definition of FUN. One of the most common ways to "have FUN" is the so-called "Tantrum spiral".

It starts with one of your dwarfs getting into a foul mood for whatever reason. Maybe they saw a dead animal, or a piece of art rubs them the wrong way. They now proceed to punch the dwarf nearest to them out of frustration. So now you have two pissed off dwarfs and they spread the mood further. Two become four, four become eight and eventually your entire fort grinds to a halt. Everyone is too exhausted or angry to work and the lack of beer is not helping matters. And then the goblins come - your adventure ends.

You're seeing the Tantrum Spiral play out before your eyes. Releases are slow and buggy, which means the users are unhappy. Money starts to dry up so management gets pissed. Developers are put under pressure to produce more and produce it faster. The result is that the team produces buggier and less stable features, which in turn angers users, which angers management again, which fires a developer in an attempt to "fix" the new budget problems. This just makes things worse and the downward spiral continues.

Damn, just writing that out is giving me anxiety.

The problems are obvious by now and the team is brainstorming solutions.

The "not actually solutions"

Let's start from scratch!

Yeah, that would feel great! Just throw away the baggage and use the things you learned to build a better product. So, what does a timeline for this look like?

The project has years of features behind it. And although you could build it faster this time, it would still take you a year at least. That's a whole year of:

No new features
Fewer new users
Lost users
Lost opportunities

And there's no guarantee that the end result is going to be all that great or even on schedule! This is the development equivalent of nuking the planet because you're fed up with wars. The thing is, that's not a sure-fire way to get long-term world peace. Odds are that the newly formed nations will just have history repeat itself.

Eventually the same problems that got you into this mess are going to start creeping up again - unless you deal with them first. And that's assuming that the business survives a year of code-freeze, even if you could produce that pristine piece of perfection you're imagining.

Let's migrate to this new tech

No. Just no. Even entertaining this idea as a feasible suggestion is a sign of either a very inexperienced developer, or a developer that loves chasing the new hotness. Neither of those will help you claw your way out of Legacy - they're liable to just plop you into a soon-to-be Legacy code-base.

You should stick with battle-tested frameworks and processes for a few reasons. For starters, your team probably has more experience with them, and even if they don't, the internet has years of solutions to problems with the given tech. Imagine going to Google and not finding a solution to your problem.

Even worse, imagine going to Google and the only result is a link to the same question you have. But it has no answer yet and it was posted only 8 hours ago.

Best case scenario, the new tech is supported long enough for you to finish your migration project. Worst case, it's abandoned for the next trendy piece of tech while you're mid-way through the rewrite. Good luck rewriting the rewrite.

Granted, if the tech you're using is already deprecated, you should at least entertain the idea of a proper upgrade.

In that case, let's work on cleaning up the current code without working on new features.

This idea is not going to make it past management, and for good reason. Again you're putting the business at risk, and with it your chances of making it to the end release. But this is the closest one to an actual solution, and with a few more pointers it can be part of your battle-plan.

I think we've painted a bleak enough picture. Let's talk solutions!

How does one fix "debt"? You start paying it off.

First of all, the Legacy code-base is not the cause of your problems. It's just the most glaring of all the side-effects. And if googling medical conditions online has taught me anything, it's that you should treat the cause and not the symptoms!

Start fixing what got you into this mess in the first place.

Standardize processes around development. It's more than just style guides and folder structures for your code. It means actually writing documentation. It means automating deployments. Automated tests have to be the cornerstone of your project, and not just an afterthought.
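To make "automating deployments" a little more concrete, here's a minimal sketch in Python of a release gate: run the test suite, and only deploy if it passes. The function name `release` and the example commands in the comment (`pytest`, `./deploy.sh`) are my own placeholders, not anything from a real project, so swap in whatever your team actually uses.

```python
import subprocess


def release(test_cmd, deploy_cmd):
    """Run the tests and deploy only if they pass.

    Both arguments are command lists, as accepted by subprocess.run.
    Returns True if the deploy step ran, False if the tests blocked it.
    """
    tests = subprocess.run(test_cmd)
    if tests.returncode != 0:
        print("Tests failed, not deploying.")
        return False

    # check=True raises CalledProcessError if the deploy command itself fails.
    subprocess.run(deploy_cmd, check=True)
    return True


# Hypothetical usage, assuming a pytest suite and a deploy script exist:
# release(["pytest"], ["./deploy.sh"])
```

The script itself is nothing special. The point is that nobody gets to skip the tests on the way to Production, not even when firefighting.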

Obviously you're picking up where you left off with your code. You can't throw it all out (we went over this) but you can follow some guidelines that will clean things up in the long run.

Anything new that you write has to be good, unit-tested, documented code. You don't have time to go over the entire code-base, but as you touch parts of the bad code, you make a point of cleaning it up. Again, this means documenting it, adding test coverage and making sure it goes through a code review. There are several approaches to this, but those are topics of their own.
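One of those approaches, sketched here under some assumptions, is the characterization test: before you clean up a piece of bad code, you write tests that record what it does today, right or wrong. `legacy_shipping_cost` below is a made-up stand-in for the kind of untested function you'd find, and the tests are written in plain-assert style so that a runner such as pytest can pick them up.

```python
def legacy_shipping_cost(weight_kg, country):
    """Hypothetical stand-in for an untested legacy function."""
    cost = 5
    if country != "US":
        cost = cost + 10
    if weight_kg > 20:
        cost = cost + weight_kg * 0.5
    return cost


# Characterization tests: they pin down the current behaviour,
# whether or not that behaviour is what the business actually wants.
def test_domestic_light_parcel():
    assert legacy_shipping_cost(1, "US") == 5


def test_international_light_parcel():
    assert legacy_shipping_cost(1, "DE") == 15


def test_heavy_parcel_adds_weight_surcharge():
    assert legacy_shipping_cost(30, "US") == 20.0


def test_exactly_twenty_kg_has_no_surcharge():
    assert legacy_shipping_cost(20, "US") == 5
```

With the current behaviour pinned down, you can refactor the function and rerun the tests to see if anything changed by accident.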

Allow for new features, because the business needs to keep running. But don't allow anything half-assed, under-researched or rushed out. You're already spending hours every week dealing with the fallout of the current code. You can't afford to add more tech-debt to your project.

So now you have an idea of what your schedule looks like. Part of it is working on new features. Part of it is doing cleanup of old code and part of it is "firefighting" due to the aforementioned code. A third of your development time each week should be spent on cleanup and optimization. This is going to hurt in the short run, but pay off in the long run.
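If you want to see what that split looks like in hours, here's a rough sketch. The helper `weekly_plan` and the numbers fed into it are invented for illustration; the only figure taken from the paragraph above is the one-third share for cleanup.

```python
def weekly_plan(team_hours, firefighting_hours, cleanup_share=1 / 3):
    """Split a week of team hours into cleanup, firefighting and features.

    Hypothetical helper for illustration; all figures are in hours.
    """
    cleanup = round(team_hours * cleanup_share, 1)
    features = round(team_hours - cleanup - firefighting_hours, 1)
    if features < 0:
        raise ValueError("cleanup and firefighting exceed the available hours")
    return {
        "cleanup": cleanup,
        "firefighting": firefighting_hours,
        "features": features,
    }


# Assumed example: 3 developers at 40 hours each, 30 hours lost to firefighting.
plan = weekly_plan(team_hours=120, firefighting_hours=30)
print(plan)
```

In this made-up week, cleanup gets 40 hours, firefighting eats 30 and features get the remaining 50. As the cleanup pays off, the firefighting number should shrink and the difference goes straight to features.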

Maintenance is you spending time on a launchpad today, so that you can propel your project forward tomorrow.

The goal is to pay down as much of the tech-debt now so that you can increase productivity down the road. This kind of maintenance is a constant thing. Make sure it's part of your development schedule moving forward.

You could throw more money at the problem by increasing the team size, but that's not going to be efficient. It depends on the business side: are you now producing features fast enough? If the answer is no (as if it's ever yes, when asking management), then you need to make sure you have a decent developer on-boarding process.

If budget does not allow for growing the team then the scope of the project should be dialed down. You need to make sure you have a good balance of productivity and code quality. You can't always have perfect code, but make sure that it's at least maintainable.

And, in case it's not obvious, the developers need to convince the rest of the company that whatever you decided on is the way to go. This means convincing them that you need to make this investment, and that it will pay off. The tech-debt metaphor is familiar and easily digestible - use it in your arguments.

The Pay-off

Obviously, the above is a lot to commit to. But the results are tangible. The project can be salvaged. Legacy becomes maintainable and keeps making money. Once there's more money you can get crazier with development. You can break the monolith down to micro-services. You can migrate everything to AWS. You can change your mind and string the micro-services back into a single code-base.

All that and more you can do after you've solved the underlying problems of your project, and kept the business running in the process.

You now have predictable releases. You have a stable product. You have well-tested, documented code that you can work on with more confidence. You do less firefighting, you anger fewer users, the business makes more money and the development team can keep growing efficiently. Happier developers, happier users, happier management = successful product.

The exact architecture and approach of your "survival" can take many forms. Share your horror stories and successes in the comments and let's show the developer world that there is a light at the end of the Legacy tunnel.

attribution: cover image is from the game Dwarf Fortress, messy cables, neat cables, brainstorming, Comic from MonkeyUser.com

Oldest comments (2)

Sasha Blagojevic • Oct 28 '19

You perfectly described my current situation lolz. Thankfully we’re taking these steps to mitigate our current situation in the long run. What strikes me the most and with which I wholeheartedly agree is the core issue of shitty codebases are always the development processes, everything else is just a symptom.

Nick Cinger • Oct 28 '19

I feel like this happens a lot! In talking with other developers (in job interviews) I see a lot of them leaving their current project because the code is a mess and there isn't a process in place to fix it.

With a proper process you at least have that light at the end of the tunnel!