Update: French-speaking readers can find a talk I gave at the Paris Java User Group, inspired by this article, here
During the various missions I've been sent on, I've worked on a variety of legacy software projects, each suffering from different kinds of flaws.
Of course, poor software quality (no unit tests, clean code principles not being applied...) was often a major issue, but there were also problems stemming from architectural decisions taken in the early days of the project, or even at the dawn of the enterprise system. These kinds of issues are, from my point of view, the greatest cause of pain for many projects.
As a matter of fact, improving code is quite easy, especially now that the software craftsmanship movement is spreading good practices across teams. But changing the core of a system, the constraints that were imposed at the very beginning of its lifecycle, is very challenging.
I'll talk about several types of architectural decisions that I've encountered, and that can be real burdens for the teams maintaining these systems.
This is probably one of the most common issues I've seen. When several applications need to use common data, why can't we simply share the database? After all, repetition is a bad thing in software development, right? Well, not always, and especially not when a database is involved. Venkat Subramaniam put it in a way that can't be forgotten: "A database is like a toothbrush, you should never share it". What's so wrong about sharing a database? Many things, in fact...
The first thing that comes to mind is obviously the coupling in the data model. Imagine that two applications, A and B, are dealing with cars. Application A is used by the team responsible for repairs, so they need to store a lot of technical data about the mechanics, the failures, the history of interventions on the car... Application B is used to handle appointments for the technical team, so it only needs basic information about the car to be able to identify it. In this case, using the same data structure for both applications makes no sense: they use very different data, so each should have its own data structure. This is made even easier by the fact that a car can be easily identified, so there is no need to share common reference data.
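To make this concrete, here is a minimal sketch of what "each application keeps its own model" could look like. All the names and fields are hypothetical; the only coupling left between the two applications is the car's natural identifier (its VIN):

```java
import java.util.List;

// Application A (repairs) needs rich technical data about the car.
record RepairCar(String vin, String engineType, int mileage,
                 List<String> interventionHistory) {}

// Application B (appointments) only needs enough to identify the car.
record AppointmentCar(String vin, String licensePlate, String ownerName) {}

public class SeparateModels {
    public static void main(String[] args) {
        var a = new RepairCar("VF1ABC123", "diesel 1.5", 92000,
                List.of("brake pads", "timing belt"));
        var b = new AppointmentCar("VF1ABC123", "AB-123-CD", "J. Dupont");
        // The only thing the two schemas share is the identifier:
        System.out.println(a.vin().equals(b.vin()));
    }
}
```

Each team can now rename, add, or drop fields in its own model without asking the other team's permission; only the shared identifier is a contract between them.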
The second issue also comes from this coupling of the data model. Imagine that B wants to rename the identifier of the car because the new name makes more sense from a domain point of view. In this case, A would also have to be updated to handle the new column name... So, to avoid disturbing A's team, B's developers will start duplicating the information into a different column, since they can't change the existing name... Of course, A's team will say they plan to handle this change in the future to avoid having two columns containing the same data, but we all know this will most probably never happen...
Things get even uglier when applications are not just reading data from the same source, but also modifying it! In this case, who is the owner of the data? Who should be trusted? How can the integrity of the data be guaranteed? This is already difficult when several parts of the same application are modifying the same information, and it becomes much worse when several applications are involved...
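One common way out of this ownership question is to designate a single writer: one application owns the data and exposes an API, and every other application goes through it instead of updating the shared tables directly. The sketch below is purely illustrative (the service, the mileage invariant, and the in-memory map are all assumptions), but it shows why this helps: the integrity rules live in exactly one place.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical "owner" of the car data: the only component allowed to write.
class CarOwnerService {
    private final Map<String, Integer> mileageByVin = new ConcurrentHashMap<>();

    // All writes funnel through the owner, so invariants are enforced once:
    // here, a car's recorded mileage can never go backwards.
    public void recordMileage(String vin, int mileage) {
        mileageByVin.merge(vin, mileage, Math::max);
    }

    // Other applications get a read-only view of the data.
    public int mileageOf(String vin) {
        return mileageByVin.getOrDefault(vin, 0);
    }
}

public class SingleWriter {
    public static void main(String[] args) {
        var owner = new CarOwnerService();
        owner.recordMileage("VF1ABC123", 92000);
        owner.recordMileage("VF1ABC123", 91000); // stale update, rejected by the invariant
        System.out.println(owner.mileageOf("VF1ABC123")); // prints 92000
    }
}
```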
The last case I've seen is two applications sharing the same data structure to store information about two relatively close business objects, but with just enough differences to make it really hard to understand which data belongs to which application. In this case, both applications were using the table to model financial market executions, but with different levels of aggregation. Nothing indicated that there were two types of data in this table, so we had to look in another table (owned by the second application) to identify the rows generated by each application... Each new developer having to work on this table would inevitably fall into the same pit as all of their predecessors and use incorrect (and sensitive) data, with all the risks that involves for the company.
Not every company can develop systems to handle all of its business use cases. In fact, in many cases, this would just be reinventing the wheel, since these use cases are common to many companies, so you can easily find software on the market that already supports them.
So, buying the product is often cheaper than building it. But of course, the software you just bought can't integrate with that other piece of software you also use, so you need to develop a connector between two (most of the time, proprietary) applications. You will probably build your own tools to handle specific parts of the business, and since this expensive software you've bought already has a convenient model, you'll be tempted to just use its database and add your information to its tables...
A few years pass, dozens of developers or teams do the same, and then you're stuck: you just can't switch to another product if the vendor goes out of business, or if the product is no longer supported, or if another new product suits your needs better. In some cases, you can even have technical dependencies on external software. If the vendor of the solution dictates which version of the language/framework/server/whatever you must use, then you don't own the architecture of your own system. And if they want to sell you a new version providing a feature you absolutely need, but that version implies a change in the technical requirements, you'll be forced to update your whole technical stack to align with their recommendations. I've been there; this is not a forced migration you want to face often...
I've worked on a project where the vendor of the software we were using didn't want to develop new features for all of their clients, because it had become too complicated for them to handle concurrent modifications and several live versions (each client having a specific version with features only they wanted). So, they decided to sell us a Software Development Kit (SDK) so that we could implement our own features. Of course, they didn't provide much documentation about how to do it, and on top of that we had to use their business entities, which we needed to decompile to understand their structure since we had neither the sources nor the documentation... The simplest feature would take days to implement, and it was barely testable, since everything was very complicated and introduced scripting languages no one in the team knew into an already complicated stack...
Remember the early 2000s and the joy of using Enterprise JavaBeans (EJB) to handle remote calls between applications in your information system. At the time, this may have looked like a good idea. Sharing your codebase with other teams to avoid duplication seemed OK too. Yes, every team was forced to deliver their applications at the same time to make sure there were no broken binary dependencies, but those were fun evenings, eating pizza with colleagues while waiting for the two-hour delivery process to complete, weren't they?
Well, in fact it wasn't that fun. And being unable to refactor a single class in your own codebase because someone in the company liked your code and decided to use it in their untested application isn't a pleasure either.
Once you realize the mess that these early decisions caused, the effort required to decouple your application from the rest of the world is overwhelming. It literally takes years to cut your project down into separate components so that other applications can no longer use your core domain, your client or your caching mechanism, to remove every use of external classes that tightly couple you to other projects, to replace all EJB calls with REST APIs... But the reward for everyone involved in the project is huge: easier development and testing, a faster delivery process since there is no need to synchronize with everyone else anymore, better separation of concerns in your own code, easier dependency management, no more transitive dependency issues caused by importing a ton of other applications' dependencies into your classpath... These expensive changes are a real life saver for the team, and they would have been much cheaper to implement at the dawn of the project!
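The key property of the REST replacement is that the caller depends only on HTTP and a data format, not on the other team's binaries: no shared jar in the classpath, no synchronized deliveries. Here is a minimal, self-contained sketch of that shape using only the JDK; the tiny built-in server, the `/cars/...` endpoint and the JSON payload are invented stand-ins for the other team's real API:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestInsteadOfEjb {
    public static void main(String[] args) throws Exception {
        // Stand-in for the other application, now exposing its data over REST
        // instead of a remote EJB interface. Port 0 picks a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/cars/VF1ABC123", exchange -> {
            byte[] body = "{\"vin\":\"VF1ABC123\",\"mileage\":92000}".getBytes();
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // The caller: plain HTTP, no binary dependency on the other project.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:" + port + "/cars/VF1ABC123")).build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());

        server.stop(0);
    }
}
```

Either side can now release on its own schedule, as long as the HTTP contract stays compatible.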
This problem may be the one you're least likely to face, but it can still happen, and it is the worst-case scenario, since it combines several of the previous issues. In fact, I faced it on one of the first projects I worked on in my career.
When I arrived on the project, I was told it was a total rewrite of the company's system and that the project had started just two months earlier. So, when I saw a complex web application with a full administration module, a complex business feature already implemented, and a mature framework to help develop other modules, I was surprised. I quickly learned that most of this had not been developed by the team: it had been decided to reuse a framework developed by another company inside the group to avoid starting from scratch. The problem was that this framework had not been isolated from the project it was developed for. So, our team just got an archive containing all the source code of the other company's project, including their business code, which had nothing in common with our own business. Even worse, we also inherited their database schema and data...
As a newcomer to the team, it was difficult to know which code belonged to the framework, which to our project, and which to the other company's business. The team wanted to clean up this mess, but many attempts ended in severe regressions because of dependencies between parts of the code (I can't talk about modules, since there was only one!), and of course there were no automated tests at all. Moreover, we had to abandon the idea of using a different application server, because code specific to the one used by the other company was everywhere in the system, making the migration too expensive for our small team.
At some point, we wanted to add some nice features to the framework, but we were told this had already been done in the other company. So, we were asked to merge our current version with theirs... The team managed to avoid this nightmare by cherry-picking just part of the new feature, but it was still far more complex and rich than what we needed...
We managed to finish the project, but its quality was a real pain. At least 40% of the code and of the database contents were useless, and cleaning up this dead code never became a priority. I hope the team has finally had the chance to isolate their own code since I left!
Putting a bit of your business logic in a rule management system is a common practice. It is useful, for instance, when some of your business rules need to be updated frequently but your monolithic application's delivery process requires a long testing phase before a release candidate can be validated, making it impossible to adjust your "volatile" rules in time. Even though I prefer all domain rules to live in the code, I can understand that a rule management system can sometimes help.
But I've faced a case where almost ALL the business logic was located in a rule management system, with some rules even being generated from an Excel file! Moreover, the rules were not supposed to change very often, since the project was basically an ETL batch. The Java project behind all this consisted only of technical details about the batch framework and raw reads/writes from the source and target systems, with absolutely no reference to the domain.
As a consequence, all the rules were written in a specific language that nobody in the team really mastered, that was hard to write (our IDEs didn't support it), and that was almost impossible to debug or test. When a new rule or a change to an existing one was requested, most developers in the team just copied and pasted an existing rule, leading to entire files that were identical except for one specific change (often the field the rule applied to).
As if this weren't troubling enough, no rule gave any clue about its purpose. Rules were named Rule1, Rule2, and so on, with more than 100 of them! And each rule was basically checks and assignments on hard-coded values, without a single business term. Even the name of the project didn't explain the purpose of the whole ETL.
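To contrast, here is the kind of thing I mean by keeping rules in the code: the same category of check, expressed as a small, named Java class. Everything here is an invented example (the trade, the notional, the approval threshold), but the point stands regardless of the domain: the rule's name says what it does, the threshold is a named constant rather than a magic number, and the class can be unit tested like any other.

```java
// Hypothetical domain object for the sake of the example.
record Trade(String counterparty, double notional) {}

// A business rule with a name that states its intent, instead of "Rule42".
class LargeTradeRequiresApproval {
    private static final double APPROVAL_THRESHOLD = 1_000_000.0;

    public boolean appliesTo(Trade trade) {
        return trade.notional() >= APPROVAL_THRESHOLD;
    }
}

public class NamedRules {
    public static void main(String[] args) {
        var rule = new LargeTradeRequiresApproval();
        System.out.println(rule.appliesTo(new Trade("ACME", 2_500_000))); // true
        System.out.println(rule.appliesTo(new Trade("ACME", 50_000)));    // false
    }
}
```

A newcomer reading `LargeTradeRequiresApproval` learns something about the business; a newcomer reading `Rule73` learns nothing.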
As Uncle Bob explains in his book "Clean Architecture", when designing the architecture of a project, some decisions should be postponed until a choice really has to be made, that is, until we can no longer add value to our product without deciding (choosing a database, for instance). Other decisions must be taken really early; do not wait until things get ugly. Fortunately, these critical decisions can easily be spotted, because they are what we could call architectural smells: when you think about them, they can only be bad ideas that will come back to haunt you at one point or another. Unfortunately, when working on legacy software, this kind of burden is often buried deep in the code, making it very expensive to eliminate.
We shouldn't be afraid. Yes, cleaning up years or even decades of mess is not an easy task, but as software professionals, we just can't let it continue to rot and kill the developers' motivation, the trust our users put in our product, and our capacity to deliver business value to them.
Of course, each of the architectural burdens I've described can be solved in many ways, so there is no silver bullet for every issue. But I'm sure every team can come up with proposals to finally free themselves of their burden. So, let's face our issues together and start cleaning up this mess!