I'm currently refactoring a really messy codebase at work. It has made me think about why it's messy. Last night, while watching an episode of Stranger Things, I realized why it was so awful.
The reason this codebase is bad is simple: there is no clear, step-by-step progression of the data's state as it flows through the code. There are many things happening at once. The density of each line of code is very high. Global state is also used and abused. Understanding the status of the data at a given point in time is difficult. It requires detective work. Lots of it.
If we compare the old code to the refactor, the one thing that stands out is that the data clearly flows from one point to another. You can clearly understand how the input is transformed into the output. Just like how assembly lines work.
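As a rough illustration of that kind of flow, here is a minimal Python sketch. The order format and the function names (parse_order, add_total, format_receipt, process) are hypothetical, made up for this example and not taken from the codebase I'm describing.

```python
def parse_order(line):
    """Workstation 1: turn a raw 'name,quantity,unit_price' string into a dict."""
    name, quantity, unit_price = line.split(",")
    return {
        "name": name.strip(),
        "quantity": int(quantity),
        "unit_price": float(unit_price),
    }


def add_total(order):
    """Workstation 2: return a new dict with the order total added."""
    return {**order, "total": order["quantity"] * order["unit_price"]}


def format_receipt(order):
    """Workstation 3: produce the final output string."""
    return f"{order['quantity']} x {order['name']}: {order['total']:.2f}"


def process(line):
    """The assembly line: each step takes the previous step's output."""
    return format_receipt(add_total(parse_order(line)))
```

Each function takes its input as an argument and returns a new value, so you can inspect the state of the data between any two steps without any detective work.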
Wikipedia defines an assembly line as:
... a manufacturing process in which parts (usually interchangeable parts) are added as the semi-finished assembly moves from workstation to workstation where the parts are added in sequence until the final assembly is produced...
Using that definition, we can draw a connection: good code mimics an assembly line. It breaks processes down into steps (workstations), and things get added or removed in sequence until the output is produced. I don't know about you, but DRY, dependency injection, and a whole lot of other design principles and patterns immediately came to mind.
The other thing that annoys me about this codebase is the fact that every step is, at best, O(n). Items in a hash table are looked up by looping over it instead of by key. Strings are built by exploding other strings. Just downright inefficient. No thought was given to getting things done with the least amount of work. Good code is lazy. This code is as overworked as a fast food worker (I was one).
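To make the hash table complaint concrete, here is a small Python sketch of both approaches. The function names and the sample data are hypothetical; the real code is in a different codebase and looks different, but the shape of the problem is the same.

```python
def find_price_by_scanning(prices, item):
    """The messy way: loop over every entry until the key matches, O(n)."""
    for key, value in prices.items():
        if key == item:
            return value
    return None


def find_price_directly(prices, item):
    """The lazy way: let the hash table do its job, O(1) on average."""
    return prices.get(item)


# Hypothetical sample data for illustration only.
prices = {"burger": 4.5, "fries": 2.0, "shake": 3.25}
```

Both functions return the same value for the same input. The second one simply does less work to get there.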
In contrast, the refactored code always aims for O(1). It doesn't always achieve it, but a lot of thought has been given to the data structures used. You see, the trick to performance is to always think about which data structure works best for the process at hand. You also need to think in terms of the content of the data structures themselves. The messy code keeps every piece of data in global state. The refactored code keeps things local.
What does state have to do with performance? Let's use the assembly line example again. Imagine that you are building a car, and it's time to install the doors. Instead of the doors being next to you at your workstation, you have to walk all the way to the back of the factory to get them. That's not all. When you get there, the doors are not sorted by color or side. You have to manually go through each door to find the one that fits. That's global state. Having one door for each side of the car, in the correct color, at the correct moment in the assembly process greatly improves efficiency. Your job is now simply to bolt the doors. Six bolts in total. One O(1) operation.
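Here is the door example as a minimal Python sketch. Everything in it (DOOR_BIN, install_doors_from_bin, install_doors, the dict layout) is hypothetical and only meant to show the difference between reaching into global state and receiving local state.

```python
# Hypothetical shared bin at the back of the factory: every door, unsorted.
DOOR_BIN = [
    {"color": "red", "side": "left"},
    {"color": "blue", "side": "right"},
    {"color": "blue", "side": "left"},
    {"color": "red", "side": "right"},
]


def install_doors_from_bin(car):
    """Global-state version: search the shared bin for each matching door."""
    doors = []
    for side in ("left", "right"):
        for door in DOOR_BIN:  # O(n) walk through every door in the bin
            if door["color"] == car["color"] and door["side"] == side:
                doors.append(door)
                break
    return {**car, "doors": doors}


def install_doors(car, left_door, right_door):
    """Local-state version: the correct doors arrive at the workstation."""
    return {**car, "doors": [left_door, right_door]}
```

The second function does not search for anything. An earlier step has already picked the right doors, so all that is left is the bolting.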
When I say that an assembly line aims for a constant O(1), what I mean is that each step is broken down to the point where its operations do not require a lot of processing. Each of the previous steps in the assembly line breaks the work down into smaller steps, and state is always kept local.
I know there are a lot of people who are against the term software engineering. I understand why. There really isn't a by-the-book approach to it yet. The industry is still too young for anything to be highly formalized. However, we can learn a lot from manufacturing. Studying how assembly lines work these days will provide a lot of insight. I suggest you look into it.