DEV Community

Mathieu Ferment
Large migration projects are complex battles

After working as a developer for 9 years, I switched tracks to a manager position in 2021 (which is not very unusual in this industry). It’s been a bumpy road, but after 3 years I am starting to understand things and concepts that, in retrospect, were already affecting me as a developer. I think I am starting to grasp them because being a technical manager forces you to understand both software development and the software business.

Today I would like to share some of these things with you.

Business and software development

Let’s start with one: “most enterprise software serves business”.

The very first reason why professional developers write code is to implement a business use case. Maybe it’s creating a website to provide information about a company. Maybe it’s building a ticket-selling platform for an event. Maybe it’s simply a data repository tool: the company stores its employees’ data in a database and needs a basic CRUD interface on top of it to help its HR department.

But the scenario behind all of these software development projects is identical:

  • A company has a business need
  • The business need can be addressed by building the right piece of software
  • One or many developers (and other people like designers, project managers…) are hired to build the piece of software.

And this little story drives the life of many developer teams. Business need, piece of software, developer team. It’s a cycle, and it works as long as it stays healthy. By this I mean that it works if the value delivered by the piece of software is higher than the amount of money spent on step 3. If the company paid $200k to build a piece of software that saves $300k yearly, it’s a good return on investment.

Things however can go wrong when software must undergo a migration.

A migration, in this blog post, is a software project where a piece of software is modified but the end user does not notice the change, as the behavior is kept identical. As a consequence, the end user does not perceive any benefit from the migration. This breaks the healthy scenario mentioned above, at least if you only look at the direct and visible costs.
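To put numbers on this, here is a minimal sketch of the naive return-on-investment calculation, using the $200k / $300k figures from the example above. The function names are mine, and the model deliberately ignores everything except direct and visible costs, which is exactly the view that makes a migration look bad.

```python
def payback_years(build_cost: float, yearly_savings: float) -> float:
    """Years needed for cumulative savings to cover the build cost."""
    if yearly_savings <= 0:
        raise ValueError("yearly_savings must be positive")
    return build_cost / yearly_savings


def net_value(build_cost: float, yearly_savings: float, years: float) -> float:
    """Savings accumulated over `years`, minus the one-off build cost."""
    return yearly_savings * years - build_cost


# Healthy scenario: $200k to build, $300k saved per year.
print(payback_years(200_000, 300_000))  # roughly 0.67 years, i.e. 8 months
print(net_value(200_000, 300_000, 1))   # 100000 after the first year

# A migration with the same cost and no visible savings: on this naive
# metric the value stays negative however long you wait.
print(net_value(200_000, 0, 3))         # -200000
```

The real benefits of a migration (lower maintenance cost, fewer risks, easier hiring) do not show up in `yearly_savings` unless someone makes the effort to estimate them.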

As an example, let's say company A has 4 Python web applications powered by framework B, and decides to migrate the 4 applications to another framework C. End users will hopefully not notice any change, as the goal of a successful migration is to keep everything identical for end users. Only the developer / administrator team of the 4 applications will be aware of the migration outcomes.

There are many reasons why a company might need to migrate one or many of its software applications. It can be switching providers, switching technology, upgrading an old piece of code, a change in partnerships, or even politics. But it happens all the time.

Today I would like to discuss the business challenge of large migration projects and how leading them to completion is a complex battle.

Three strategies

To carry out a migration project, I can think of 3 software migration strategies, each with its benefits and drawbacks.

The big bang strategy

The big bang strategy is the simplest strategy. Let’s go back to the Python scenario: company A has an application called FullSwitch, which uses framework B. The goal is to migrate the application from framework B to framework C. We’ll call the current codebase version 1.0, and the codebase, once migrated to framework C, will be version 2.0.

Applying the big bang strategy principle, the developer team starts the tedious work of building version 2.0, migrating the codebase from framework B to C. During this time, they completely stop working on version 1.0 (or almost completely). After many days/weeks/months of work (depending on the codebase size and complexity) the codebase version 2.0, running on framework C is written! Now comes the time to perform the switch in production: it’s deploy time.

The switch is performed on Monday (and not on Friday): the application running on version 1.0 is shut down, the data is migrated, and the version 2.0 is deployed and takes over. The replacement is done and everybody’s happy.

Happy? Not always, because what I described above is the happy scenario. In less optimistic scenarios, things can go wrong, as the big bang strategy has 2 specific flaws.

The first flaw is that the v2.0 deployment happens at the very end. This means that if the development phase lasted 6 months (not an unusual timeline for a large codebase), version 2.0 was running solely on development environments during that whole time. The software industry has learnt the hard way that waiting for the very end of a software project to deploy is a very bad idea.

The longer the time between when code is merged and when it is released, the higher the chance of an oversight. The only real feedback is received in production. Waiting and accumulating all of these code changes to deploy them all at once is a guaranteed way to have multiple major bugs to fix in production at the same time, and probably a painful deployment process. History has taught us that bigger changes are harder to review, debug and fix.

The second flaw is that during the development phase, version 1.0 is completely frozen, as the whole developer team is focused on building version 2.0. The longer version 2.0 takes to build, the stronger the pressure from the business to resume work on version 1.0, because they simply need it - and I mean it. The ability of a company to rely on its software and adapt it is key to staying competitive in its market. A software freeze that lasts too long can be the downfall of the business, and even if version 2.0 is finished, if the business is sinking and the whole company crumbling, version 2.0 will soon rot in the cemetery of unused software. A company needs its software to evolve and adapt to its business needs. It can wait for a few weeks, hopefully a few months, but it cannot wait forever without risking bankruptcy.

That being said, it is unlikely that the company would wait for its downfall. If the version 2.0 development phase is too long, what is most likely to happen is that the company will decide to halt the version 2.0 project, because the lack of investment in version 1.0 had too much impact and the business was at risk. And unfortunately, it was probably the right call. The sad consequence is that version 2.0, and all of the time the developer team invested in it, will be postponed in lucky cases and canceled in less lucky ones. This is the full failure scenario for a migration, where version 2.0 is never launched.

The cars race strategy

To address the big bang strategy’s flaw about business pressure, an obvious idea comes to mind: continue maintaining version 1.0. This means that instead of focusing the full developer team on building version 2.0, the team is split into two groups, one continuing to work on the v1.0 codebase while the other builds v2.0. With a reduced workforce dedicated to it, version 2.0 will take longer to build, but we avoid the grim business downfall scenario of the big bang strategy.

Two other problems do arise.

The first is that all of the work done on v1.0 directly serves the business needs. It helps employees be more productive, it fixes pain points for company employees, it allows the company to reach new customers and markets and the new features added help convince new customers to adopt the application. It serves the company well. On the other hand, the work on v2.0 stays hidden in the shadows and, on paper, does not produce any added visible value.

This situation can be fine for a few months, but the longer it stays like this, the more likely the day will come where a director of the company says out loud “Why are we spending so much on v2.0 when the return on investment is zero? We could do so much more if the whole developer team was working on v1.0”.

You might think that this director is stupid and completely misses the point, that the migration to 2.0 is needed, that he’s focusing on short-term benefits instead of long-term ones… but in doing so you would be overlooking how human minds work.

If the 2.0 project was started yesterday, the idea is “fresh” in everyone’s mind. It’s clear why it has to be built. And most people in the company will understand - and agree - that it is necessary to perform the migration from v1.0 to v2.0. But as time passes, that message will fade. The benefits of working on 1.0 will be tangible, while the promised benefits of 2.0 will become more distant, more abstract. This is natural behavior for the human mind, similar to how the media cover topics. When a new war begins somewhere in the world, the topic is broadly covered, and people are very interested in being informed about how it’s going. But if the war drags on, people’s attention will shift to something else. The public’s loss of interest in that war will be seen in how the media cover the topic: less and less screen time will be dedicated to it until it becomes a thin line at the bottom of the newspaper.

So just like with the big bang strategy, the longer version 2.0 takes to build, the higher the probability that the project will be halted; this time because the perceived benefits have shrunk while the tangible cost - in missed opportunities - is very real.

A skilled technical director might be able to fight this trend, by communicating often on version 2.0 progress, demonstrating the new version outcomes and benefits, but he can only delay the problem. He’s still in a race against time.

The second flaw makes the first even more difficult to deal with, and it’s because of this flaw I named this strategy the “cars race” strategy.

In the cars race strategy, two developer teams work on two versions of the same application. The first team continues working on v1.0, and constantly adapts it to new business needs: they ship bug fixes, new features, improvements. The second team works on v2.0 and wants to achieve the very same behavior and features of v1.0 but with a different framework. We have 2 cars racing against each other.

However, the first developer team is leading, because they keep adding new features to the application’s scope. They constantly push forward the finish line that the second developer team must reach, because v2.0 can only replace v1.0 if it addresses the same business needs. So the v1.0 car is ahead and keeps accelerating, and the v2.0 car must follow and catch up to win the race. There is unfortunately a scenario where the v2.0 car never catches up and loses the race, meaning it is never launched. It ran out of gas, because gas is expensive and the company did not want to pay for any more of it at this point of the race.
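The race can be sketched as a toy simulation. Everything here is an assumption made for illustration: the function name, the idea of counting work in "features per week", and the numbers. It only shows how a moving target stretches the catch-up time, and how a fixed budget turns that into a lost race.

```python
from typing import Optional


def weeks_to_catch_up(
    backlog: int, v1_rate: int, v2_rate: int, budget_weeks: int
) -> Optional[int]:
    """Return the week in which v2.0 reaches feature parity with v1.0,
    or None if the budget runs out first.

    backlog:      features v1.0 already has when v2.0 work starts
    v1_rate:      features added to v1.0 per week (the moving target)
    v2_rate:      features ported to v2.0 per week
    budget_weeks: how long the company keeps paying for gas
    """
    target, done = backlog, 0
    for week in range(1, budget_weeks + 1):
        target += v1_rate  # v1.0 keeps moving the finish line
        done += v2_rate    # v2.0 tries to close the gap
        if done >= target:
            return week
    return None


# Frozen v1.0 (big bang): 100 features to port at 6 per week.
print(weeks_to_catch_up(100, 0, 6, 52))   # 17
# Cars race: v1.0 ships 2 new features per week in the meantime.
print(weeks_to_catch_up(100, 2, 6, 52))   # 25
# Both teams move at the same speed: v2.0 never catches up.
print(weeks_to_catch_up(100, 2, 2, 104))  # None
```

In this toy model the gap only closes when `v2_rate` is strictly higher than `v1_rate`, and every feature added to v1.0 is paid for twice: once by the first team, once by the second.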

Finally the cars race strategy keeps the technical pitfall of the big bang strategy: the deployment of the new codebase happens at the end of the development phase, with all the drawbacks mentioned previously.

The trickle strategy

The trickle strategy aims to fix the above issues by avoiding the idea of competing v1.0 and v2.0 versions.

Instead, the migration is performed iteratively. A small part of v1.0 is migrated from framework B to framework C, and once this part is done and functional, this codebase is deployed; let’s call it version 1.1. Version 1.1 is a hybrid: part of it runs on framework B and part of it runs on framework C. Then another iteration can start, leading to a release of version 1.2, with a larger part of the software migrated to framework C. The work continues, iterations continue, regular deployments continue until 100% of the codebase has been migrated to framework C. At this point we can call it v2.0.
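One way to picture such a hybrid is a router that sends each request to the migrated code when it exists and falls back to the legacy code otherwise. This is only a sketch under my own assumptions: `HybridRouter`, the two handler functions and the route names are hypothetical, and a real framework B / framework C pair would need far more than this.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


def legacy_handler(path: str) -> str:
    """Stands in for code still running on framework B."""
    return f"framework B served {path}"


def migrated_handler(path: str) -> str:
    """Stands in for code already ported to framework C."""
    return f"framework C served {path}"


class HybridRouter:
    """Sends a route to framework C if it was migrated, else to framework B."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, path: str, handler: Handler) -> None:
        """Register the framework C handler for one route."""
        self._migrated[path] = handler

    def handle(self, path: str) -> str:
        return self._migrated.get(path, self._legacy)(path)

    def progress(self, all_paths: List[str]) -> float:
        """Fraction of known routes already served by framework C."""
        return sum(p in self._migrated for p in all_paths) / len(all_paths)


routes = ["/login", "/orders", "/reports", "/admin"]
router = HybridRouter(legacy_handler)

# Version 1.1: only /login has been ported.
router.migrate("/login", migrated_handler)
print(router.handle("/login"))   # framework C served /login
print(router.handle("/orders"))  # framework B served /orders
print(router.progress(routes))   # 0.25
```

Each iteration registers a few more routes, and the day `progress` reaches 1.0 the legacy fallback can be deleted.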

This strategy provides 2 obvious benefits:
First, it avoids the scary one-shot deployment of the 2 previous strategies and instead allows regular deployments of the solution. Deploying small changes is the first thing to do to make deployments less scary and reduce the risk.
Second, there is only 1 application running, not two: the business cannot “shut down” v2.0 because it does not exist. And 1.0 does not exist anymore. What the business is using is 1.3 or 1.4, the only version that exists, which both serves the business needs and grows closer to the target migrated codebase.

Is that the perfect solution then? Unfortunately no. The trickle strategy comes with multiple costs that the previous strategies did not have.

The first is a cost paid by the codebase. Having 2 frameworks in one codebase is a technical design issue. It increases complexity and requires bridge code that allows the 2 parts of the application to communicate and interact. It makes the application more expensive to run, and tangible side effects like performance issues can emerge.
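To give an idea of what bridge code can look like, here is a small sketch where the two halves of the application disagree on how a user session is represented. Both session shapes are hypothetical and invented for this example; they do not come from any real framework.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LegacySession:
    """Hypothetical session shape used by code still on framework B."""
    data: Dict[str, str]


@dataclass
class NewSession:
    """Hypothetical session shape expected by code ported to framework C."""
    user_id: int
    locale: str


def to_new_session(legacy: LegacySession) -> NewSession:
    """Bridge B -> C: parse the loosely typed legacy dict into typed fields."""
    return NewSession(
        user_id=int(legacy.data["uid"]),
        locale=legacy.data.get("lang", "en"),
    )


def to_legacy_session(new: NewSession) -> LegacySession:
    """Bridge C -> B: needed as long as any legacy code reads the session."""
    return LegacySession(data={"uid": str(new.user_id), "lang": new.locale})


legacy = LegacySession(data={"uid": "42", "lang": "fr"})
print(to_new_session(legacy))  # NewSession(user_id=42, locale='fr')
# The round trip must be lossless, or the two halves drift apart.
print(to_legacy_session(to_new_session(legacy)) == legacy)  # True
```

None of this code delivers business value, and all of it is deleted once the migration is complete.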

Having to continuously adapt and maintain that bridge between the two frameworks is an expensive operation, and the longer the project lasts, the higher the cost, quite simply. After a few months, the day will come when a developer states the obvious out loud: “It’s so complex! We should have done this using the big bang strategy, we are wasting so much time doing this iteratively! This bridge code is a mess and we’ll throw it away one day, what a waste!”.

He’ll say that because he’s actively paying the cost of that strategy: it’s the developer team that suffers from the hybrid situation, not the business. By contrast, the 2 other strategies above did not push any added complexity onto the developer team. We can also guess that because of the hybrid nature of the codebase, and of the regular deployment of each completed iteration, the development phase will be a lot longer than with the previous strategies. But at least this time the team is not in a race against time.

Or is it? There’s another danger related to time. As the software continues to run in production through iterations v1.1, v1.2, v1.3 and so on, it’s likely that these iterations will contain migrated code, but also code improvements, bug fixes, and other code changes aimed at helping the business. So these iterations will contain a mix: code changes aimed at achieving the migration and code changes aimed at helping the business.

Let’s say the development phase goes on for months. Maybe years. Just like with the cars race strategy, the benefits of the migration, which were clear in everybody’s mind in company A at the beginning of the migration project, fade over time. As time passes, company employees outside of the migration project look at the developer team, shipping iteration after iteration, and think “I get it that every iteration allows the migration to advance, but maybe we could do a little more business code changes and fewer migration code changes? I mean it’s OK if we finish the migration a little later, nobody’s counting on that, and on the other hand we have this really important project for the customer that we need to ship fast and…”

I guess you get it: the risk with the trickle strategy is not that the business halts the project completely one day, but that it slowly invests less in the migration part and more in other projects, and maybe one day even stops it, thinking “well, it’s true our software is a hybrid between 2 frameworks, but is it really an issue? I mean it runs, doesn’t it? We can live with that”.

Conclusion

“It depends”: for the majority of questions in this world, that is the only true answer. And that is the answer I give if someone asks me which of the 3 strategies is the best. Each comes with its benefits, each comes with its risks, and each comes with its costs. The one you should choose for your migration project depends on your context, your codebase, your developer team and the business behind it.

The one thing that these 3 strategies share is this: if the project goes on for too long, you risk a complete halt. It is important to contain the migration to a set timeframe, and to make sure deadlines don’t slip too much. You need to do it as fast as possible, you need to deliver it as fast as possible, before human minds do what human minds do and change as time passes.

Acknowledgement

This article, which focuses on different migration types and the technical risk they carry, has been a great help in writing this blog post.
