Weekly Dev Tips
Maintain Legacy Code with New Code
Many developers work in legacy codebases, which are often notoriously difficult to test and maintain. One way you can address these issues is by maximizing the use of new, better-designed constructs in the code you add to the system.
Sponsor - DevIQ
Thanks to DevIQ for sponsoring this episode! Check out their list of available courses and how-to videos.
Show Notes / Transcript
Legacy code can be difficult to work with. Michael Feathers defines legacy code in his book, Working Effectively with Legacy Code, as "code without tests", and frequently it's true that legacy codebases are difficult to test. They're often tightly coupled, overly complex, and weren't written with a modern understanding of good design principles in mind. Whether you're working with a legacy codebase you've inherited, or one you wrote yourself over some period of time, you have probably experienced the pain involved in trying to change a large, complex system that suffers from a fair bit of technical debt and lacks the safety net of tests.
There are several common approaches to working with such codebases. One simple approach, which can be appropriate in many scenarios, is to do as little as possible to the code. The business is running on it, none of the original authors are still with the company, nobody understands it, so just keep your distance and hope it doesn't break on your watch. Maybe in the meantime someone is working on a replacement, but you have no idea if or when that might ever ship, and anyway you have other things you need to work on that are less likely to keep you at work late or bring you in on the weekends. I don't have any solid numbers on how much software falls into this category, but I suspect it's a lot.
The second approach is also common, and usually takes place when the first one isn't an option because business requirements won't wait for a rewrite of the current system. In this case, developers must spend time working with the legacy system in order to add or change functionality. Because it's big, complex, and probably untestable, changes and deployments are stressful and error-prone, and a lot of manual testing effort is required. Regression bugs are common, as tight coupling within the system means changes in one area affect other areas in often inexplicable and unpredictable ways. This is where I think the largest amount of maintenance software development takes place, since, let's face it, most software running today was written without tests but still needs to be updated to meet changing business needs.
A third approach some forward-thinking companies take, understanding the risks and costs involved in full application rewrites, is to invest in refactoring the legacy system to improve its quality. This can take the form of dedicated effort focused on refactoring, as opposed to adding features or fixing bugs. Or it can be a commitment to follow the Boy Scout Rule such that every new change to the system also improves the system's quality by improving its design (and, ideally, adding tests). Some initial steps teams often take when adopting this approach are to ensure source control is being used effectively and to set up a continuous integration server if none is in place. An initial assessment using static analysis tools can establish the baseline quality metrics for the application, and the build server can track these metrics to help the team measure progress over time. This approach works well for systems that are mission-critical and aren't yet so far gone into technical debt that it's better to just declare "technical bankruptcy" and rewrite them. I've had success working with several companies using this approach - let me know if you have questions about how to do it with your application.
Now let's stop for a moment and think about why working with legacy code is so expensive and stressful. Yes, there's the lack of tests, which limits our confidence that changes to the code don't break things unintentionally, but that's based on a root assumption. The assumption is that we're changing existing code and therefore other code that depends on it might break unexpectedly. What if we challenge that assumption and instead minimize the amount of existing code we touch, in favor of writing new code? Yes, there's still some risk that the changes we make to incorporate our new code might cause problems, but outside of that, we're able to operate in the liberating zone of green field development, at least on a small scale.
When I say write new code, I don't mean go into a method, add a new if statement or else clause, and start writing new statements in that method. That's the traditional approach that tends to increase complexity and technical debt. What I'm proposing instead is that you write new classes. You put new functionality into types and methods that didn't exist before. Since you're writing brand new classes, you know that no other code in the system currently has any dependencies on the code you're writing. You're also free to unit test your new classes and methods, since you're able to write them in a way that ensures they're loosely coupled and follow SOLID principles.
So, what does this look like in practice? Frequently, the first step will be some kind of refactoring in order to accommodate the use of a new class. Let's say you've identified a big, complex method that currently does the work that you need to change, and in a certain case you need it to do something different. Your default approach would be to dive into the nested conditional statements, find the right place to add an else clause, and add the new behavior there. The alternative approach would be to put the new behavior into a new method, ideally in a new type so that it's completely separate from any existing structures. A very basic first step could be to do exactly what you were going to do, but instead of putting the actual code into the else clause, instantiate your new type and call your new method there instead, passing any parameters it might require. This works well if what you're adding is fairly complex, since now you have a much easier way to test that complex code rather than going through an already big and complex method to get to it.
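As a rough illustration of that basic first step, here is a minimal Python sketch. The show doesn't name a language or a domain, so everything in it is an assumption made up for the example: the Order type, the legacy_calculate_discount function standing in for the big legacy method, the LoyaltyDiscountCalculator class, and the discount rules themselves. Amounts are whole cents to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical input type; the episode does not define one."""
    customer_type: str
    total_cents: int


class LoyaltyDiscountCalculator:
    """New behavior lives in a brand-new class that nothing else depends on yet."""

    def calculate(self, total_cents: int) -> int:
        # Hypothetical rule: 10% off, capped at 5000 cents.
        return min(total_cents * 10 // 100, 5000)


def legacy_calculate_discount(order: Order) -> int:
    """Stand-in for the big, complex legacy method (heavily shortened here)."""
    if order.customer_type == "employee":
        return order.total_cents * 30 // 100
    elif order.customer_type == "wholesale":
        return order.total_cents * 15 // 100
    elif order.customer_type == "loyalty":
        # The only edit to the legacy method: instantiate the new type
        # and call its method, passing the parameters it needs.
        return LoyaltyDiscountCalculator().calculate(order.total_cents)
    else:
        return 0
```

The point of the sketch is that LoyaltyDiscountCalculator can be unit tested on its own, for example by checking that calculate(20000) returns 2000, without having to set up whatever state the legacy method needs.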
Depending on the conditions that dictate when your new behavior should run, you might be able to get out of using the existing big complex method at all. Let's say the existing method is called BigMethod. Move BigMethod into a new class called Original and wherever you had code calling BigMethod change it to call new Original().BigMethod(). This is one of those cases where you're forced to change the existing code in order to prepare it for your new code, so you'll want to be very careful and do a lot of testing. If there are a lot of global or static dependencies running through BigMethod, this approach might not work well, so keep that in mind.
However, assuming you're able to pull BigMethod into its own class that you then call as needed, the next step is to create another new class for your new implementation. We'll call the new class BetterDesign and we'll keep the method named BigMethod for now so that if we want we can use polymorphism via inheritance or an interface. Copy BigMethod from the Original class to your BetterDesign class and modify it so it only does what your new requirements need. It should be much smaller and simpler than what's in Original. Now, find all the places where you're instantiating Original and put in a conditional statement there so you'll instantiate BetterDesign instead, in the appropriate circumstances. At this point you should be able to add the behavior you need, in a new and testable class, without breaking anything that previously depended on BigMethod. If you have more than a few places where you need to decide whether to create Original or BetterDesign, look at using the Factory design pattern.
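One way to picture the Original and BetterDesign arrangement is the Python sketch below. It is only an illustration under stated assumptions: the episode gives no signature for BigMethod, so the amount parameter, the string results, the BigMethodRunner interface, and the create_runner factory function are all invented for the example. The method keeps the BigMethod spelling from the discussion, although Python style would normally prefer big_method, and Python has no new keyword, so new Original().BigMethod() becomes Original().BigMethod().

```python
from typing import Protocol


class BigMethodRunner(Protocol):
    """Hypothetical shared interface so callers can treat both classes alike."""

    def BigMethod(self, amount: int) -> str: ...


class Original:
    """Holds the legacy method, moved here without changing its behavior."""

    def BigMethod(self, amount: int) -> str:
        # Stand-in for the long, tangled legacy logic.
        if amount < 0:
            return "rejected"
        if amount > 1000:
            return "manual review"
        return "processed"


class BetterDesign:
    """New implementation: does only what the new requirement needs."""

    def BigMethod(self, amount: int) -> str:
        # Hypothetical new requirement: no manual review step.
        return "processed (express)" if amount >= 0 else "rejected"


def create_runner(use_new_behavior: bool) -> BigMethodRunner:
    """Simple factory: the one place that decides which class to create."""
    return BetterDesign() if use_new_behavior else Original()
```

A caller that used to invoke the legacy method directly would instead write something like create_runner(flag).BigMethod(amount). Existing callers that pass False keep getting the old behavior, and the conditional lives in one function rather than being repeated at every call site.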
By adjusting the way we maintain legacy systems to maximize how much new behavior we add through new classes and methods, we can minimize the likelihood of introducing regressions. This improves the code quality over time, increases team productivity, and makes the code more enjoyable to work with. If you have experience working with legacy code, please share it in this show's comments at www.weeklydevtips.com/015.