DEV Community

Josh Holbrook

The Legacy Software Dialectic

Earlier this fall I started at Eaze, a cannabis delivery startup, where we have a Legacy Monolith. This monolith is written in C# and .NET, in a style that was dated even when it was written a decade ago. It's "highly re-entrant", in the words of our (most) distinguished engineer - not a good thing. Developers have been trying to slay this monolith ever since it was built, and have made their own mistakes along the way. Today, even as the stack has shifted away from .NET and towards Node.js, TypeScript and Rust, a version of that service is still here with us - as are many of those past mistakes.

I've worked on many a legacy codebase over my career, in many contexts: in data engineering, backends, financial reporting, devops/CI tooling, and more. Over the last few years, I've started to notice a software pattern, and the situation at Eaze brings it to the forefront of my mind. I call this pattern the LEGACY SOFTWARE DIALECTIC.

For our purposes, I don't equate a legacy project with one that isn't being actively worked on, nor one that's strictly being sunsetted. Instead, I see legacy software through the lens of Peter Naur's "Programming as Theory Building". In this paper, Naur couches the act of programming as developing an internalized theory for how The Software works, which enables a team to make changes efficiently. He goes further to describe legacy software as software for which the current team doesn't know how it works. In Working Effectively with Legacy Code, Michael Feathers alternately describes legacy projects as simply projects which are tough to change. This insight is aligned with Naur, though it's Feathers that links this idea to missing (or ineffective) testing. While I don't think tests are sufficient - you have to understand the tests as well as the software - it's certainly true that a "legacy" project by similar definition can be refactored incrementally into a "current" project. It's these projects which I think fit this pattern.

Dialectics are a concept initially credited to an early 19th century German idealist philosopher named Georg Hegel, and taken in a slightly different direction by Karl Marx and other thinkers in the materialist and leftist traditions. All dialectics follow the form of a particular type of narrative, much like how a story might follow the Hero's Journey. The acts in this story take a three-part form. Hegel called these stages the abstract, negative and concrete, though many people today talk of thesis, antithesis and synthesis. We can see these in the story of a legacy project.

Thesis: The project you're working on is a legacy project. Doing anything with it is excruciatingly slow-going and it's way too easy to accidentally break something. The abstractions are clunky and indirect. Components are tuned in ways that don't make any sense. Every day you work on this project is a day where you curse your forebears' names - if you even know what they are. You know what Good Software is, and this isn't it.

Antithesis: The project you're working on was written by rational, well-meaning people, who made the decisions they made for valid reasons. The vast majority of engineers take pride in their work, want to create quality code, and most importantly want to work on an impactful project that's fun to add features to. We all endeavour to build the best systems we can. If nothing else, this project has stood the test of time, ultimately doing what's been required of it, often for years. The legacy project seems to be doing something right, and your attempts to "fix" it often make it worse.

In a dialectic, these two components exist in contradiction with each other. The thesis narratively stands for a sort of hypothesis, a model of the world. The other name from Hegel's writing - abstract - hints at how this component forms an initially naive model. The antithesis - or negative - meanwhile stands as an angry refutation of that model. Our initial thesis - that this is how you make good software - seems to be provably Bad.

These dual components present as a semi-stable equilibrium. As you try to refactor, rewrite or otherwise address the legacy system, contradictions between your current team's burgeoning theory and the decisions of the erstwhile team make themselves apparent. In other words, you run into snags. Perhaps touching that bizarrely configured tunable somehow breaks the batch jobs. "If I were the engineer that wrote this I would have simply written good code instead of bad code," you find yourself saying, even as you lose your boots to the quicksand. Yet the system continues to bring you pain.

Dialectical materialists conceive of history as a never-ending series of these dialectics playing out in real life - much like the refactors playing out every day in our offices. In this conception of history, societal progress occurs by truly understanding the contradictions and resolving these duels into a single unified, cohesive whole which cancels the negative while retaining the strengths of the initial thesis. The situation transcends its initial state into a new, better world. I'd go on to say that these effects are roughly proportional - that is, the more stark the contrast and the more intense the conflict, the more satisfying (and perhaps beautiful) the synthesis will be.

In a past role, I led the rewrite of ingestions and pipelines supporting a commerce affiliate program for a medium-sized media company. What existed when my team onboarded was in a sad state. Developers who had worked on the core service consistently said mean things about it and went out of their way to not work on it - and it had suffered as a result. Many of the components were strangely broken, some inaccurate. We knew immediately when we saw them that we wanted to push them into the sea as quickly as possible.

It was initially a project we thought we could sunset mercilessly within a few sprints, but we quickly discovered that the issue was more complex than we had assumed. As I pored over the source code of the core service I became very worried about the scope of work, and rightly so - what we had originally planned to take a few weeks turned into a long-term, major set of rewritten and extended pipelines that we developed and maintained over the course of multiple quarters.

We of course fixed and overhauled almost everything. Yet as we pulled metrics into our core BI tool, added systems health and monitoring, generated more accurate numbers and even built A/B testing capabilities into our commerce metrics, a few critical aspects of the old design remained: the format we used when generating tracking codes, the language and database server for hosting the web-site metrics and the obscure API we were using to pull reporting were all retained in the final design. As time progressed, I gained not just an understanding of the negative aspects of this project's antithesis, but also a greater appreciation for some of the oldest abstractions in the codebase.

As for Eaze? The work to resolve our dialectic is happening, sometimes slowly but always steadily. The monolith is currently being replatformed onto modern .NET running on Linux. We're in the process of ripping out the most offensive abuses of Entity Framework. Most of the early era pre-await Promise code written in an attempt to paper over the monolith's glaring issues has been replaced with TypeScript and/or Rust, and we're pulling functionality out of that monolith and into a streaming architecture using Rust. Resolving this dialectic has proven to be challenging work for us, but also rewarding - both intrinsically and materially.

P.S. By the way, if you're a rustacean and this sounds interesting to you, my team is currently hiring senior Rust developers! Come work with me!
