Ben Halpern

for CodeNewbie

Posted on Mar 30, 2023

How Do You Handle Legacy Code When Starting a New Project?

#discuss #codenewbie #beginners

When starting a new project, dealing with legacy code can be a daunting task. Patience is a virtue; if you can, take your time to understand the code and document it well. Don't rush the process, as this can lead to mistakes and introduce new problems into the codebase.

But what if you're on a tight schedule? How do you handle it? What strategies have you found to be most effective?

Share your experiences and insights on legacy code management with the Newbie community. Let's learn together!

Top comments (30)

Joe Mainwaring • Mar 30 '23 • Edited

What I'd like to do with legacy code: [image]

What usually happens: [image]

Tracy Gilmore • Mar 31 '23

I would love to take a flame-thrower to the legacy code on my project. However, that would only leave ashes of a project and it has only been running for 2-3 years with a team of 6 developers! I think I might be the next thing fired.

Taylor R Price • Mar 31 '23

100% accurate. There is some legacy code that isn't garbage I'd like to burn down, but that doesn't happen too often.

Keith • Apr 1 '23 • Edited

Start by convincing yourself not to add any features until the architecture is ready. Aim to refactor only, without changing functionality.

Find or write the test cases that establish whether the deployment target has the necessary functionality and connectivity (platform tests). [sidenote: use a different test framework from the one baked into the application, so as to avoid that dependency, e.g. github.com/keithy/okay-php] These will allow you to try deploying to different places, upgraded languages, libraries and servers, or local development environments. Attempting to set up a fresh local, upgraded or alternative development environment will produce failures that you can turn into more detailed platform tests. These test cases give you some agility, the ability to move with confidence, and will pay you back handsomely when it comes to redeployment or when "heading for the clouds".
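To make the platform-test idea concrete, here is a minimal, framework-free sketch in Python (the comment's own example tool is for PHP; Python is used here only for illustration). The required modules, the version threshold and the database address are hypothetical placeholders. Each check is independent of the application's own test suite, so it still runs on a half-configured machine.

```python
import importlib.util
import socket
import sys


def missing_modules(required):
    """Return the names of required top-level modules that cannot be found here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def python_at_least(major, minor):
    """True if the running interpreter is at least major.minor."""
    return sys.version_info[:2] >= (major, minor)


def can_connect(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical requirements: replace with what your deployment target really needs.
    problems = []
    if not python_at_least(3, 8):
        problems.append("Python >= 3.8 required")
    problems += [f"missing module: {m}" for m in missing_modules(["json", "sqlite3"])]
    if not can_connect("127.0.0.1", 5432):  # assumed local database port
        problems.append("cannot reach the database on 127.0.0.1:5432")
    print("platform OK" if not problems else "\n".join(problems))
```

Every failure you hit while setting up a new environment can become one more check like these.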

Find or write test cases that establish whether the overall functionality is as expected. These need to be at a level that envelops the whole project and covers most significant features. The goal here is to have a sanity check: you should be able to refactor without a successful refactoring causing a test to fail. A failing test needs to communicate whether or not a refactoring step has succeeded in changing structure without breaking functionality, so these tests need to be at a level that does not break when the internal structure changes. It is safe to assume that this 'whole-application' testing will not be the fastest, so if possible, spin off a separate project to parallelise these tests by spinning up test instances in the cloud.
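One common way to get tests that survive internal restructuring is "golden master" (characterization) testing: record what the application currently returns for a set of inputs, then check every refactoring against that recording. The technique name is not from the comment, and `legacy_quote` is a hypothetical stand-in for a real entry point. A minimal Python sketch:

```python
import json


def legacy_quote(quantity, unit_price, customer_type):
    """Hypothetical legacy entry point, treated as a black box."""
    total = quantity * unit_price
    if customer_type == "wholesale" and quantity >= 100:
        total = total * 0.9
    return round(total, 2)


def refactored_quote(quantity, unit_price, customer_type):
    """Same behaviour, different internal structure."""
    discount = 0.9 if customer_type == "wholesale" and quantity >= 100 else 1.0
    return round(quantity * unit_price * discount, 2)


def record_golden(fn, cases):
    """Run fn over every case and capture today's outputs as the baseline."""
    return [{"args": list(args), "result": fn(*args)} for args in cases]


def compare_to_golden(fn, golden):
    """Return every case whose current output differs from the recorded baseline."""
    failures = []
    for entry in golden:
        actual = fn(*entry["args"])
        if actual != entry["result"]:
            failures.append({"args": entry["args"], "expected": entry["result"], "actual": actual})
    return failures


if __name__ == "__main__":
    cases = [(1, 9.99, "retail"), (100, 2.5, "wholesale"), (150, 1.0, "wholesale")]
    # In practice the baseline would be written to a JSON file and kept in version control.
    golden = json.loads(json.dumps(record_golden(legacy_quote, cases)))
    print(compare_to_golden(refactored_quote, golden))
```

Because the tests only look at inputs and outputs, a refactoring that keeps behaviour intact passes, and a failure points at a real behaviour change.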

The first refactoring efforts need to be to enhance testability. It is usual to begin by looking at how the application is configured, and at the patterns used to introduce configurable values into the code. It goes without saying that any hard-coded config needs to be moved into configuration files. Look at adding feature-flag functionality to the configuration. Testability improves as it becomes possible to run the test scenarios with different configurations.
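As a rough illustration of moving hard-coded values into configuration and adding feature flags, here is a Python sketch. The keys (`tax_rate`, `new_checkout`) and the `APP_TAX_RATE` environment variable are hypothetical; the idea is that defaults, a config file and the environment are merged in one place, so a test run can simply pass a different configuration.

```python
import json
import os

# Hypothetical defaults that used to be literals scattered through the code.
DEFAULTS = {
    "tax_rate": 0.2,
    "features": {"new_checkout": False},
}


def load_config(text=None, env=os.environ):
    """Merge defaults, an optional JSON config document and an environment override."""
    config = json.loads(json.dumps(DEFAULTS))  # deep copy so DEFAULTS is never mutated
    if text:
        overrides = json.loads(text)
        config["features"].update(overrides.pop("features", {}))
        config.update(overrides)
    if "APP_TAX_RATE" in env:
        config["tax_rate"] = float(env["APP_TAX_RATE"])
    return config


def feature_enabled(config, name):
    """Unknown flags are treated as off."""
    return bool(config["features"].get(name, False))


def price_with_tax(net, config):
    # Previously something like `return net * 1.2`; the rate now comes from config.
    return round(net * (1 + config["tax_rate"]), 2)
```

A test scenario can then be repeated with, say, `load_config('{"features": {"new_checkout": true}}', env={})` to exercise both sides of a flag.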

If databases are involved, setting up test data and/or faked data, and being able to run against an in-memory database, will allow for faster testing without resorting to mocks. Create the ability to snapshot and trim a database so that it can provide test fixtures. Use or introduce a database migrations framework so that the schema is managed under source control. Have the test configuration specify the data source, so that a test run can target an appropriate test fixture.
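A small sketch of these ideas using Python's built-in `sqlite3` module: an in-memory database, an ordered list of migrations tracked with SQLite's `PRAGMA user_version`, and a fixture loader. This is a hand-rolled stand-in for a real migrations framework, and the `customers` schema is hypothetical.

```python
import sqlite3

# Ordered schema changes kept under source control (hypothetical schema).
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]

# A trimmed-down fixture, e.g. taken from a snapshot of production data.
FIXTURE = [("Ada", "ada@example.com"), ("Grace", None)]


def migrate(conn):
    """Apply every migration newer than the version recorded in the database."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {number}")  # int only, safe to format
    conn.commit()


def open_database(data_source=":memory:"):
    """Tests use ':memory:'; the test configuration could point at a fixture file instead."""
    conn = sqlite3.connect(data_source)
    migrate(conn)
    return conn


def load_fixture(conn, rows):
    conn.executemany("INSERT INTO customers (name, email) VALUES (?, ?)", rows)
    conn.commit()
```

Running `migrate` twice is harmless because only migrations above the stored version are applied.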

Now that the application is testable as a whole, it is possible to start looking at its internals. Start by looking at class instantiation and introduce the factory pattern, so that class names are no longer hard-coded but go via a factory. Factories are chosen or configured from the configuration file, either directly or via feature flags. This allows classes to be replaced with improved implementations without removing the existing class, which can remain for reference.
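A minimal Python sketch of a configuration-driven factory. `LegacyMailer`, `SmtpMailer` and the `mailer` config key are hypothetical names. Callers ask the factory instead of naming a class, and the old implementation stays registered, so switching back is a config change.

```python
class LegacyMailer:
    """The existing implementation, kept for reference and rollback."""
    def send(self, to, body):
        return f"legacy:{to}"


class SmtpMailer:
    """A hypothetical improved implementation."""
    def send(self, to, body):
        return f"smtp:{to}"


# Registry of available implementations, keyed by the name used in configuration.
MAILERS = {"legacy": LegacyMailer, "smtp": SmtpMailer}


def make_mailer(config):
    """Factory: choose the class named in config instead of hard-coding it at call sites."""
    name = config.get("mailer", "legacy")
    cls = MAILERS.get(name)
    if cls is None:
        raise ValueError(f"unknown mailer implementation: {name!r}")
    return cls()
```

For example, `make_mailer({"mailer": "smtp"}).send("ops@example.com", "hi")` uses the new class, while an empty config keeps the legacy behaviour.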

Then we look at some services that are used across the whole application. Since we are interested in testing and instrumentation prior to refactoring, the most likely first candidate will be logging. Introducing a modern logging framework can be your first real chunk of new functionality. Using the factory pattern, existing logging classes can be switched out for the new ones, and a few strategic classes can be copied sideways and have instrumentation added; keep the existing implementation in case the logging itself adversely affects the functionality. There is usually a fair bit of tidying that can be done to organise decent logging levels. Try to add logging that has zero impact when it is not enabled. You can also freely add logged assertions anywhere in the codebase. There may be language/runtime support for assertions (e.g. in PHP) that allows them to have zero impact when disabled. Also add verbose logging levels to all external interfaces, especially the database.
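Here is how "near-zero impact when disabled" logging and logged assertions might look with Python's standard `logging` module. Lazy `%s` arguments are only formatted if a record is actually emitted, and `isEnabledFor` guards work that is expensive even to prepare. `log_assert` is a hypothetical helper that records a violated expectation without raising. (Python's own `assert` statements are removed entirely under `python -O`, which is roughly comparable to disabling assertions in PHP.)

```python
import logging

logger = logging.getLogger("app.orders")  # hypothetical logger name


def expensive_summary(order):
    # Stand-in for costly formatting we only want to pay for when DEBUG is on.
    return ", ".join(f"{k}={v}" for k, v in sorted(order.items()))


def log_assert(condition, message, *args):
    """Logged assertion: record the violation and carry on instead of raising."""
    if not condition:
        logger.error("assertion failed: " + message, *args)
    return condition


def process(order):
    # %-style arguments are not formatted unless the record is actually emitted.
    logger.info("processing order %s", order.get("id"))
    # Guard work that is expensive just to compute the log argument.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("order detail: %s", expensive_summary(order))
    log_assert(order.get("qty", 0) > 0, "non-positive qty in order %s", order.get("id"))
    return order.get("qty", 0) * order.get("price", 0)
```

With the logger set to WARNING, the info and debug calls cost almost nothing, while a bad order still leaves an error record behind.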

Once the application is instrumented, a separate project can look at log analysis to provide some overall performance metrics. A detailed log-level can be added to the factories to tell you which classes are being instantiated, and then even the most obtuse app will reveal intimate relationships between classes.
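A sketch of the "detailed log level in the factories" idea plus a tiny log-analysis pass, in Python. `create` is a hypothetical generic factory wrapper and the log line format is an assumption; a real analysis project would parse whatever format your logging framework writes.

```python
import logging
from collections import Counter

factory_log = logging.getLogger("app.factory")  # hypothetical logger name


def create(cls, *args, **kwargs):
    """Factory wrapper that logs every instantiation at DEBUG level."""
    factory_log.debug("instantiate %s", cls.__name__)
    return cls(*args, **kwargs)


def count_instantiations(log_lines):
    """Count how often each class was instantiated, given formatted log lines."""
    marker = "instantiate "
    counts = Counter()
    for line in log_lines:
        if marker in line:
            counts[line.split(marker, 1)[1].strip()] += 1
    return counts
```

Logging the caller as well (for example via the record's `module` or `funcName` attributes) would start to show which classes create which.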

At this point you are equipped to start thinking about modularity and re-architecting the application. We are looking to improve the structure of the application, introducing or enforcing a layered architecture and improving modularity. The specific goal here is to clarify and tidy up the dependencies so that they point in a single, correct direction. In a layered architecture, your building depends upon the foundation, and your roof upon the building. If the roof depends upon the foundation, or your foundation is tied to the roof, you are unable to treat any component as pluggable.

For web applications and REST APIs, note that the web is an I/O device: the application's business logic should not contain any code specific to a UI framework. The web or UI, the database(s) and the logger can all be plugins to the core functionality.
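One way to keep dependencies pointing inward is a "ports and adapters" style (a name not used in the comment): the core defines an interface it needs, and the database and web layers plug into it. All names below are hypothetical, and `typing.Protocol` (Python 3.8+) stands in for whatever interface mechanism your language offers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Account:
    id: str
    balance: int  # in cents


class AccountRepository(Protocol):
    """Port owned by the core; storage adapters implement it."""
    def get(self, account_id: str) -> Optional[Account]: ...
    def save(self, account: Account) -> None: ...


def deposit(repo: AccountRepository, account_id: str, amount: int) -> Account:
    """Core business rule: knows nothing about HTTP frameworks or SQL."""
    if amount <= 0:
        raise ValueError("deposit must be positive")
    account = repo.get(account_id)
    if account is None:
        raise KeyError(account_id)
    account.balance += amount
    repo.save(account)
    return account


class InMemoryAccounts:
    """Adapter for tests; a SQL adapter would satisfy the same Protocol."""
    def __init__(self) -> None:
        self._rows: Dict[str, Account] = {}

    def get(self, account_id: str) -> Optional[Account]:
        return self._rows.get(account_id)

    def save(self, account: Account) -> None:
        self._rows[account.id] = account


def handle_http_deposit(repo: AccountRepository, form: Dict[str, str]) -> Dict[str, object]:
    """Outer web layer: translates request data, then calls inward to the core."""
    account = deposit(repo, form["account"], int(form["amount"]))
    return {"status": 200, "balance": account.balance}
```

The core (`deposit`) imports nothing from the outer layers, so the web handler or the storage adapter can be swapped without touching it.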

If you identify a module of functionality that can be extracted into a separate package, then that package can be published independently and effectively listed as an external dependency. Easy targets include data importers and exporters. The platform tests we began with can now include this "external" package as a prerequisite. We should already be thinking in terms of managing our project and its releases through a modern packaging tool that understands dependencies.

Having all the pieces in place for configurability, instrumentation and testing, and having pared down to the core functionality by extracting any optional modular components, we have set the stage for bigger changes.

Before doing so, however, it ought to be possible at this point to release the product once more, even though no actual new features will have been added. The new release and the operations environments can already take advantage of the better logging and instrumentation, not to mention that existing bugs will be illuminated with more information.

While your 2.0 release is out in the wild, you should anticipate supporting it for a few point releases, incrementally improving the parts you have worked on thus far, while you branch off and work on version 3.0.

Now you should have earned yourself some breathing space. You have the technologies in place and some time in which to branch off in a more radical direction. You can now refactor radically in case a major architectural shake-up is needed, and refactor mercilessly in case the system is not tidy or sufficiently readable. You can even experiment with several different projects trying different architectural ideas.

Now that we are ready to start introducing new features, we look at how to add them as additional plugins. If possible, we will refactor the application to have a pluggable architecture.

I would find it difficult to resist attempting to break free of class-oriented programming and adopting an actual object-oriented architecture, namely Data-Context-Interaction (DCI). [sidenote: If it is Java, you will likely be disappointed] Whether your language of choice can support DCI efficiently is a question that your new performance metrics will be able to shed light upon.

Good luck.

Jean-Michel 🕵🏻‍♂️ Fayard • Mar 30 '23

In that case you might as well accept that you are going to ship more bugs, so focus instead on reducing the time it takes to detect a bug that has reached production and to revert it.

better monitoring
alerting that pings everyone immediately in Slack (not emails)
one-click revert
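As a rough sketch of the "detect it fast" part, here is an error-rate monitor over a sliding window of recent requests, in Python. The window size, threshold and `notify` callback are hypothetical; in practice `notify` might post to a chat webhook, and reverting is left to your deployment tooling.

```python
from collections import deque


class ErrorRateMonitor:
    """Alert once when the error rate over the last `window` requests reaches `threshold`."""

    def __init__(self, window, threshold, notify):
        self.results = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold
        self.notify = notify  # hypothetical hook, e.g. a function posting to a chat webhook
        self.alerted = False

    def record(self, ok):
        self.results.append(ok)
        if len(self.results) < self.results.maxlen:
            return  # not enough data yet
        rate = self.results.count(False) / len(self.results)
        if rate >= self.threshold and not self.alerted:
            self.alerted = True
            self.notify(f"error rate {rate:.0%} over last {len(self.results)} requests")
        elif rate < self.threshold:
            self.alerted = False  # re-arm once the error rate recovers
```

The `alerted` flag keeps one incident from producing a flood of identical pings.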

Adam • Mar 31 '23

you might as well accept that you are going to ship more bugs

100%. Brilliant comment.

The worst part is dealing with managers saying, "Why are you shipping bugs? You're not doing enough testing, you're not careful".

Has nothing to do with the 11% test coverage spaghetti mess I've been handed, along with new feature requests that must be delivered in a reasonably timely manner.

Joost Helberg • Mar 31 '23

The common approach is to not invest in trying to understand the legacy code and to dismiss its qualities. Management is normally happy to accept that and then allows it to be ignored until it is replaced. Most of the time, that is an invalid, costly, sometimes fatal approach. Only after understanding the legacy code can one tell whether it is any good or not. Write tests to cover it and to maintain its proven functionality while cutting away parts that need to be replaced. The argument 'it is so bad, I can never understand it' is silly; it says more about the new programmer than about the old one.

Adam • Mar 31 '23 • Edited

Another brilliantly enlightened comment.

It distinguishes intermediate engineers (people who are totally capable of designing systems, but aren't capable of delivering things inside of huge messes, so they avoid it, go off into a corner and say "I've got a replacement for that messy thing, I just can't replace the ugly thing, so now we have to support my new smart beautiful module plus the old code that I couldn't understand") from senior engineers, who know that the grand rewrite will never happen and that, if it does, it'll happen by mostly fully understanding where things are coupled and by shipping tiny changes inside the existing code until it has been strangled and can be removed.
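The "strangling" described here is often called the Strangler Fig pattern. A minimal Python sketch, with hypothetical shipping functions: callers keep using one facade, traffic moves to the new code one small slice at a time, and slices not yet migrated still serve the legacy answer while the new code is shadow-checked against it (the shadow check is an optional addition, not something the comment prescribes).

```python
import logging

log = logging.getLogger("app.strangler")  # hypothetical logger name

# Slices of traffic already moved to the new code; grows one small release at a time.
MIGRATED_COUNTRIES = {"US"}


def legacy_shipping_cost(weight_kg, country):
    # Stand-in for the old, poorly understood code path (hypothetical).
    if country == "US":
        return 5.0
    return 5.0 + weight_kg * 2


def new_shipping_cost(weight_kg, country):
    # Rewritten path (hypothetical): same rules, clearer structure.
    base = 5.0
    return base if country == "US" else base + 2 * weight_kg


def shipping_cost(weight_kg, country):
    """Facade: callers never change while the legacy path is gradually strangled."""
    if country in MIGRATED_COUNTRIES:
        return new_shipping_cost(weight_kg, country)
    legacy = legacy_shipping_cost(weight_kg, country)
    # Not migrated yet: serve the legacy answer, but compare the new code against it.
    candidate = new_shipping_cost(weight_kg, country)
    if candidate != legacy:
        log.warning("shipping mismatch for %s/%s: legacy=%s new=%s",
                    weight_kg, country, legacy, candidate)
    return legacy
```

Once every slice is migrated and no mismatches are logged, `legacy_shipping_cost` can be deleted.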

viyashdoss • Apr 4 '23

yes

Ben Calder • Mar 30 '23 • Edited

Define "legacy code". It isn't by definition bad; but of course it can be very bad indeed.

My first step would be to try and understand what it's doing so that, if changes are going to be required, I'm ready to make those changes. This might involve adding documentation where necessary.

If it's clearly bad code, or dependent on outdated technology (e.g. defunct libraries, difficult-to-manage build processes, etc.), I would figure out a path to refactor it, removing that code and modernising where appropriate.

But sometimes "legacy code" is just fine. I know a site I worked on that is (at least the last time I checked) still running the script I wrote over ten years ago. I deliberately chose to use vanilla JS with no dependencies (at the time, the most likely dependency would have been jQuery). I'm sure it could be modernized/improved, but apparently it still does the job well enough 😁

Davide de Paolis • Mar 31 '23

Working on legacy code is rarely nice, but it can be a fundamental experience for devs at every stage of their career (I wrote about it some time ago in this post).

Under a tight schedule it can be very difficult, but the best strategy really is to apply the Boy Scout rule and add unit tests whenever you touch some lines of code.

Fighting complexity and entropy in a project is a constant effort, and if we lower our attention or give up because time is tight and the code is already crap, we only make the project and our lives worse, at an even higher speed.

Ryan Kahn • Mar 30 '23 • Edited

I recently saw a talk given by Jason Blanchard at LeadDev NYC called "Everything is a Migration", and I think it helps put the concept of legacy code in perspective. The idea is to view software and product development as an evolutionary process. Through that lens, legacy code is just code, not a blight to be eliminated, as some tend to think of it.

Maybe it reflects an earlier set of priorities; if so, evolve it like you would any code. Maybe the refactor is more intense or more disruptive, but the process of migrating old code to new priorities is the same in spirit as all the work we do. If, on the other hand, it works and still fits the need, maybe leave it alone like you would any code that is good enough for the task at hand.

So I would say the way I'd handle it in any case shouldn't be fundamentally different from the way I'd handle any other code you might find in a codebase.

Ingo Steinke, web developer • Mar 30 '23

I'm not sure what your point is. Starting a new project from scratch, how do you prevent generating legacy code? That would be following best practices, striving for clean code and minimalism, conceiving or using a good software architecture, writing tests right from the start (TDD if possible), naming variables and functions well, adding useful comments (like JSDoc), preferring typed languages, etc., and last but not least, as you said, documentation!

But maybe you mean how to deal with legacy code when joining an existing project? That's more common and worse in a way, but on the other hand, there is always someone else to blame, at least until you have touched most lines of code yourself, which happens sooner or later. Again, testing and documentation might help: is there documentation at all? Is it outdated? Does it still contain helpful information? Maybe it helps to understand the original intentions and requirements. Are there tests yet? If there aren't, we can start adding some simple tests while or before proceeding with our work.

Some other considerations: for the sake of consistency, we should probably align our new work with the existing code style and tools, even though they might be outdated from the current point of view. When everything has been written in ES3 JavaScript with spaced brackets and aligned equals signs, that's obviously the project's code style at the moment (and that's still the official WordPress recommendation right now, in 2023, as I found out recently).

And like Jean-Michel said, accepting that we are going to ship more bugs can help us to achieve anything at all.

Frank Font • Mar 30 '23

Sometimes I start by creating unit tests of existing functionality and then creating unit tests for the feature I'm going to add.
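A small Python `unittest` sketch of that order of work: first a test pinning down what the existing code already does, then a test describing the feature about to be added. `format_name` and its new `middle` parameter are hypothetical examples; run the tests with `python -m unittest`.

```python
import unittest


def format_name(first, last, middle=None):
    """Existing behaviour plus a new optional middle initial (hypothetical example)."""
    parts = [last.upper() + ",", first]
    if middle:  # new feature
        parts.append(middle[0] + ".")
    return " ".join(parts)


class ExistingBehaviour(unittest.TestCase):
    # Written first, against the code as found, to pin down what it already does.
    def test_last_name_first_and_uppercased(self):
        self.assertEqual(format_name("Ada", "Lovelace"), "LOVELACE, Ada")


class NewFeature(unittest.TestCase):
    # Written next, describing the behaviour about to be added.
    def test_middle_initial(self):
        self.assertEqual(format_name("John", "Smith", "Quincy"), "SMITH, John Q.")
```

If the first test ever fails while you work on the second, the new feature has changed existing behaviour.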

webbureaucrat • Mar 30 '23 • Edited

It's a process.

Nail down a deployment process. It doesn't have to be the sleekest CD pipeline you've ever seen, but it has to involve few enough manual steps that I can do it consistently without breaking things.
Deploy to QA and do user acceptance testing of the current codebase, by which I mean I want someone who is in a position to know to verify that nobody has been sneaking changes into prod without putting them into version control. (I have been burned by this every single time.)
Start making tiny changes.

a. The first round, put in some comments. Deploy.

b. The second round, gently fix some indentation or formatting. Deploy.

c. The third round, rename some things to make them easier to read. Deploy.

d. The fourth round, start breaking up giant methods into smaller methods. Deploy.

e. Continue a-d until you have some methods small enough and pure enough to unit test. Deploy.
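To illustrate steps d and e, here is a hypothetical Python "giant method" that mixes file I/O, parsing, arithmetic and printing, followed by the same behaviour split into small pure functions plus a thin I/O shell. The example is invented for illustration; the pure pieces can be unit tested without touching the filesystem.

```python
# Before (hypothetical): reading, parsing, computing and printing all in one place.
def report_legacy(path):
    with open(path) as f:
        lines = f.read().splitlines()
    total = 0
    for line in lines:
        name, amount = line.split(",")
        if float(amount) > 0:
            total += float(amount)
    print(f"Total: {total:.2f}")


# After: the same behaviour, with the logic extracted into pure functions.
def parse_amounts(lines):
    amounts = []
    for line in lines:
        _name, amount = line.split(",")  # still rejects malformed lines, like the original
        amounts.append(float(amount))
    return amounts


def positive_total(amounts):
    return sum(a for a in amounts if a > 0)


def format_total(total):
    return f"Total: {total:.2f}"


def report(path):
    # The thin I/O shell that remains: read, delegate, print.
    with open(path) as f:
        lines = f.read().splitlines()
    print(format_total(positive_total(parse_amounts(lines))))
```

Each extraction is small enough to ship on its own, which fits the "change a little, deploy" rhythm above.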

Now you have living code, and, hopefully, with each deployment, it goes a little smoother and quicker and becomes a little more automatic and a little less scary. Eventually you can build a pipeline with what you've learned.
