Carlos Gándara

Structuring Modular Monoliths

A monolith is a self-contained software application responsible for the whole set of functionalities of a system.

The term has negative connotations. This is not because of an inherent flaw in monoliths, but because of the context that usually surrounds them: most of the time, company growth was not balanced with appropriate architectural design.

In this post we will explore a way of structuring monolith applications that avoids the most common pitfalls associated with this way of building and shipping applications.

What follows is a set of guidelines for keeping a monolith healthy. Even when the naming or the wording seems to speak in absolutes, the value lies in what we want to achieve with each decision. There are many ways to get there; here we describe just one.

The ugly monolith

Monoliths are often associated with the (un)architectural pattern named big ball of mud. The usual stinky stuff we find there includes:

  • Everything is accessed by everyone, everywhere.
  • Persistence is shared across all the monolith.
  • A tendency towards god objects and over-abstractions.
  • The framework is everywhere; there is no isolation from it or from other vendor libraries.

Although a distributed system could mitigate some of these problems, that is not guaranteed, and it certainly does not come for free. There is little correlation between these disturbing circumstances and whether we ship our application as one unit or many. A messed-up monolith is a messenger. Instead of blaming the messenger, we had better try to understand the message and get out of the mud.

How? By structuring our monolith in a way that fosters separation of responsibilities and defines clear boundaries between the different domains it contains. Monoliths can be nice.

The modular monolith

A domain is "an area of knowledge or activity".

We call a module the aggregation of all the code that takes care of a certain domain (or a group of domains) within our monolith.

In a modular monolith:

  • Modules are the high-level organizational units. Although they live within the same monolith, we manage them as if they were isolated applications.
  • They have limited visibility of each other. We cannot call arbitrary parts of a module's internal implementation, which avoids the high coupling this would imply.
  • Each module has its own persistence layer, so changes in a module's data schema do not directly affect others.
  • A module is ideally owned by a single team. Otherwise, Conway's Law will manifest itself, and the need for communication between the teams will end up sneakily splitting the module in two.
  • Modules communicate via direct code calls, controlled by each module, or messaging. Both ways are explicit. Each module controls its own boundaries.

[Diagram: Big ball of mud vs modular monolith patterns]

For instance, in an e-commerce application we could have domains like Product Catalog, Order Handling, Payments, Shipping, etc. Depending on the concrete case, we may have a module for each of them, or we may decide to group Order Handling and Shipping together if they are closely related.

The Product in the Product Catalog module will not be polluted by Order Handling related actions. Order Handling will have its own independent Product model.

To query product data, the Product Catalog module will expose a client with a defined contract for doing so (code call communication). To announce that a new product was created, the Product Catalog module will publish a message on a shared bus (messaging communication) so other modules can react to it.
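To make the two Product models concrete, here is a minimal Python sketch. All names (`CatalogProduct`, `OrderProduct`, `to_order_product`) and fields are hypothetical, and it assumes the Product Catalog client returns plain data (a dict) rather than its internal model. Order Handling translates that data into its own model at the boundary and ignores everything it does not need.

```python
from dataclasses import dataclass
from decimal import Decimal


# Product Catalog module: the catalog's own notion of a product (hypothetical fields).
@dataclass(frozen=True)
class CatalogProduct:
    sku: str
    name: str
    description: str
    price: Decimal
    tags: tuple[str, ...] = ()


# Order Handling module: only what an order needs to know about a product.
@dataclass(frozen=True)
class OrderProduct:
    sku: str
    unit_price: Decimal


def to_order_product(catalog_data: dict) -> OrderProduct:
    """Translate data returned by the Product Catalog client into Order Handling's
    own model, reading only the fields this module cares about."""
    return OrderProduct(sku=catalog_data["sku"], unit_price=Decimal(catalog_data["price"]))
```

Because the translation only reads `sku` and `price`, a new field in the catalog's output does not force any change in Order Handling.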

Which modules?

Figuring out the right modules is no easy task.

We must take our time doing domain discovery and come up with a reasonable module structure. Chances are it won't be right on the first attempt. Software is ever-changing, and we will need to rethink and adjust our modules to reflect that, which is fine. Covering that specific aspect is not the intention of this post, though.

Techniques like Event Storming, Domain Storytelling, or Context Mapping can help with the domain discovery and identifying our modules.

High level organization

This is how the high-level structure of a modular monolith could look (the numbers on the left mark entries that belong to the same organizational element):



#1 .github/
#1 .docker/
#2 docs/
#3 src/
#3    Module1/
#3    Module2/
#3        Module2A/
#3        Module2B/
#3    Module3/
#4    ModuleBus/
#5    Boilerplate/



#1 The root holds the general operational stuff. Note there are no shared dependencies here; they are defined at module level.

#2 There is space for transversal documentation. Besides docs on how to set up environments, pipelines, deployments, etc., this is the right place to document guidelines on the architectural styles to default to (more on this in a moment).

#3 Inside src we set a high-level structure that tells us at first sight which activities our system covers. Nesting modules is fine for a more meaningful grouping, although a parent module contains no shared code.

#4 #5 There are two eyebrow-raising artifacts, though: ModuleBus and Boilerplate.

The goal of ModuleBus is to provide lightweight tooling for modules to publish messages that other modules can consume, in a pub-sub fashion. All modules have access to it. More details in a moment.

Boilerplate is an intentionally terrible name for what we usually see as Common, Shared, or Utils: tools or models we don't want to repeat over and over... unless maybe we do want?

"Base" models seem a convenient and harmless thing to have. However, if they can be freely accessed from everywhere in the monolith, they tend to attract new concrete aspects required by just one module. This leads to over-abstractions that try to accommodate too many parties' interests, and to recurrent breaking changes caused by coupling many moving parts to the same shared code.

The proposal here is to not allow direct usage of Boilerplate code. The shared libraries and models are there, but cannot be used directly. Instead, modules copy the shared stuff they need into their own Boilerplate namespace and use their own version, which can evolve (or not) without interfering with other modules.
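As a sketch of the copy step, assuming a Python code base laid out like the listing above, a small helper could copy a file from `src/Boilerplate` into a module's own `Boilerplate` namespace. The function name `vendor_boilerplate` is hypothetical; in practice this could just as well be a manual copy or a code generator.

```python
import shutil
from pathlib import Path


def vendor_boilerplate(src_root: Path, module: str, relative_file: str) -> Path:
    """Copy src/Boilerplate/<relative_file> into src/<module>/Boilerplate/<relative_file>,
    so the module owns a private copy it can evolve independently (hypothetical helper)."""
    source = src_root / "Boilerplate" / relative_file
    target = src_root / module / "Boilerplate" / relative_file
    if target.exists():
        # Never overwrite: the module's copy may have diverged on purpose.
        raise FileExistsError(f"{target} already exists; refusing to overwrite it")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies content and metadata
    return target
```

Refusing to overwrite is a deliberate choice: once copied, the module owns the file, and any divergence from the original is its own business.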

Module internals

In each module we will find some common artifacts and the module implementation itself.



src/
    Module1/
        [module implementation]
        Client/
        Messaging/
        docs/
            adr/
                adr1.md
                adr2.md
                ...
        dependencies.json



For the implementation we do whatever makes sense, be it clean architecture, a full-framework approach, or anything else. The complexity of the domains, among other factors, will dictate it. Being modular does not prevent us from choosing the most appropriate architecture for each module.

Dependencies are defined per module as well. This allows each module to use its own toolset without being held back by modules that take longer to upgrade and would otherwise prevent everyone from using more recent dependencies.

Modules have their own documentation. Remember the general architecture guidelines we mentioned adding to the root-level docs? The module docs are the place to state whether those guidelines are followed and, if not, what changes and why. Architecture Decision Records are a great tool for that purpose.

In the area of common artifacts, we have the Client and Messaging namespaces, used for the communication between modules.

Talking with the outside

Since we treat each module as a separate application, when a module needs to communicate with another one, we should assume the other module lives somewhere else, as if there were a network in between. Therefore, each module defines what other modules can do with it and how.

Code call communication

One benefit of being in the same application is that we can use direct code calls that do not rely on the network to succeed. In our proposal, a module provides clients, visible to other modules, that act as its only direct entry points via code, defining the functional surface the module exposes.



src/
    Module1/
        ...
        Client/
            ClientV1.code
            ClientV1Mock.code



In terms of visibility, a module is allowed to import and use other modules' clients, and clients are the only code it can import from other modules. Ideally, a client is defined as an interface, which allows either a direct code call implementation or an over-the-network implementation when needed (for instance, for an actual external application).

Clients are maintained by the team owning the module, which also provides a mocked version of each client. This way, consuming modules get a reliable test double that stays up to date with the client's input and output schemas.
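Below is a minimal Python sketch of a client defined as an interface, with a direct code call implementation and the mock shipped by the owning team. The names, the `ProductView` schema, and the use of `typing.Protocol` for the interface are assumptions for illustration; an over-the-network implementation would satisfy the same protocol.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ProductView:
    """Output schema of the client contract: plain data, no internal models."""
    sku: str
    name: str
    price_cents: int


class ProductCatalogClientV1(Protocol):
    """Contract that Product Catalog exposes to other modules."""

    def get_product(self, sku: str) -> ProductView | None: ...


class InProcessProductCatalogClientV1:
    """Direct code call implementation, delegating to the module's internals."""

    def __init__(self, products_by_sku: dict[str, ProductView]) -> None:
        self._products_by_sku = products_by_sku  # stand-in for the module's own persistence

    def get_product(self, sku: str) -> ProductView | None:
        return self._products_by_sku.get(sku)


class ProductCatalogClientV1Mock:
    """Test double maintained by the Product Catalog team next to the real client."""

    def __init__(self, products: list[ProductView] | None = None) -> None:
        self._products_by_sku = {p.sku: p for p in products or []}
        self.requested_skus: list[str] = []

    def get_product(self, sku: str) -> ProductView | None:
        self.requested_skus.append(sku)  # lets consumers assert on the calls they make
        return self._products_by_sku.get(sku)


# Order Handling code depends only on the contract, never on a concrete class.
def line_total_cents(client: ProductCatalogClientV1, sku: str, quantity: int) -> int:
    product = client.get_product(sku)
    if product is None:
        raise ValueError(f"unknown product {sku!r}")
    return product.price_cents * quantity
```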

Messaging communication

The client covers direct code calls, when a module requests something from another module. Messaging covers indirect communication, in a pub-sub fashion, when something happens in a module that could be of interest to other modules.



src/
    Module1/
        ...
        Messaging/
            Publishing/
                SomethingHappenedHere.code
            SubscribedTo/
                SomethingHappenedElsewhere.code
    ModuleBus/
        MessageBus.code



The ModuleBus is the tool we provide for modules to publish messages about their internal activity. Similar to the clients, it defines a contract to publish messages, and it takes care of delivering them to the interested subscribers.

Modules define which messages they publish and which messages from other modules they want to subscribe to. When a module publishes a message on the bus, the bus takes care of notifying all the interested subscribers.
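A minimal in-memory ModuleBus could look like the following Python sketch. The class and message names are hypothetical, and routing messages by their exact Python type is just one possible convention.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Handler = Callable[[object], None]


@dataclass(frozen=True)
class ProductCreated:
    """Would live in Product Catalog's Messaging/Publishing namespace."""
    sku: str
    name: str


class InMemoryModuleBus:
    """Synchronous pub-sub: publish() calls every subscriber before returning."""

    def __init__(self) -> None:
        self._subscribers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, message_type: type, handler: Handler) -> None:
        self._subscribers[message_type].append(handler)

    def publish(self, message: object) -> None:
        # Deliver to every handler registered for this exact message type.
        for handler in self._subscribers.get(type(message), []):
            handler(message)


bus = InMemoryModuleBus()
received: list[object] = []
# Stands in for an Order Handling handler from its Messaging/SubscribedTo namespace.
bus.subscribe(ProductCreated, received.append)
bus.publish(ProductCreated(sku="SKU-1", name="Mug"))  # Product Catalog side
```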

The highlight of this bus is that we can do in-memory messaging without going through a message broker and taking on the inherent complexity of async communication, while keeping the option to go async when it makes sense. Furthermore, since modules only see a single contract, we can change the ModuleBus implementation to transition from in-memory to async transparently, as needed.
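To illustrate swapping the implementation behind the same contract, here is a hypothetical asynchronous variant in Python. It uses a standard-library queue and a worker thread as a stand-in for a real message broker: `publish()` returns immediately and delivery happens later. The publishing code (`create_product`) only depends on the `ModuleBus` protocol, so it does not change when the implementation does.

```python
from __future__ import annotations

import logging
import queue
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Protocol

Handler = Callable[[object], None]


class ModuleBus(Protocol):
    """The only contract modules see."""

    def subscribe(self, message_type: type, handler: Handler) -> None: ...

    def publish(self, message: object) -> None: ...


class QueuedModuleBus:
    """Async variant: publish() enqueues and returns; a worker thread delivers."""

    def __init__(self) -> None:
        self._subscribers: dict[type, list[Handler]] = defaultdict(list)
        self._queue: queue.Queue[object] = queue.Queue()
        threading.Thread(target=self._deliver_forever, daemon=True).start()

    def subscribe(self, message_type: type, handler: Handler) -> None:
        self._subscribers[message_type].append(handler)

    def publish(self, message: object) -> None:
        self._queue.put(message)  # the publisher does not wait for subscribers

    def wait_until_delivered(self) -> None:
        self._queue.join()  # blocks until every queued message has been handled

    def _deliver_forever(self) -> None:
        while True:
            message = self._queue.get()
            try:
                for handler in self._subscribers.get(type(message), []):
                    handler(message)
            except Exception:
                # A failing subscriber must not kill the delivery thread.
                logging.exception("subscriber failed for %r", message)
            finally:
                self._queue.task_done()


@dataclass(frozen=True)
class ProductCreated:
    sku: str


def create_product(bus: ModuleBus, sku: str) -> None:
    # Product Catalog code depends on the contract, not on the implementation.
    bus.publish(ProductCreated(sku))


bus = QueuedModuleBus()
received: list[object] = []
bus.subscribe(ProductCreated, received.append)
create_product(bus, "SKU-1")
bus.wait_until_delivered()  # only needed here to observe the result
```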

The persistence

Each module has its own isolated persistence system, only accessible by the module itself. As with everything, there is a degree of convention involved. The isolation may consist of a completely separate database or a subset of prefixed tables in the same database.

We want to avoid shared data models that are polluted by data from unrelated domains. It's fine to have product_catalog_products and order_handling_products tables, even if both refer to "product". Product means something different in each context. The one from Order Handling should not care and should not be affected if we add a new column in the Product Catalog one.
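Here is a sketch of the prefixed-tables variant using Python's built-in `sqlite3` module, with both modules sharing one in-memory database. The repository classes and columns are hypothetical. Each module only touches tables with its own prefix, so adding a column to `product_catalog_products` leaves `order_handling_products` untouched.

```python
from __future__ import annotations

import sqlite3


class ProductCatalogProducts:
    """Product Catalog's persistence: only touches product_catalog_* tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS product_catalog_products "
            "(sku TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT, "
            "price_cents INTEGER NOT NULL)"
        )

    def add(self, sku: str, name: str, description: str, price_cents: int) -> None:
        self._conn.execute(
            "INSERT INTO product_catalog_products (sku, name, description, price_cents) "
            "VALUES (?, ?, ?, ?)",
            (sku, name, description, price_cents),
        )


class OrderHandlingProducts:
    """Order Handling's persistence: its own, smaller idea of a product."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS order_handling_products "
            "(sku TEXT PRIMARY KEY, unit_price_cents INTEGER NOT NULL)"
        )

    def save(self, sku: str, unit_price_cents: int) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO order_handling_products (sku, unit_price_cents) "
            "VALUES (?, ?)",
            (sku, unit_price_cents),
        )

    def unit_price_cents(self, sku: str) -> int | None:
        row = self._conn.execute(
            "SELECT unit_price_cents FROM order_handling_products WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")  # one physical database, two isolated sets of tables
catalog, orders = ProductCatalogProducts(conn), OrderHandlingProducts(conn)
catalog.add("SKU-1", "Mug", "Ceramic mug", 990)
orders.save("SKU-1", 990)  # e.g. filled in when a "product created" message arrives
# Product Catalog evolves its schema; Order Handling is unaffected.
conn.execute("ALTER TABLE product_catalog_products ADD COLUMN weight_grams INTEGER")
```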

Greenfield vs brownfield: migration strategies

So far we have described how to structure a monolith so that it looks fantastic. This is relatively easy in a greenfield project, when building a system from scratch.

The reality is that most of the time we deal with already existing systems. Even more, our greenfield project will become a brownfield project in no time, where things are not as fantastic as we thought when we started.

Here are some strategies to deal with transitions from non-modularized to modularized monoliths. They are based on the premise that we want to be incremental, avoiding big bang releases where we replace big chunks of functionality with a fresh rewrite.

Migration events and responses

Because it's common to find big classes doing too many things, sometimes it's just not possible to replace them at once with a modularized version.

Instead, replace smaller parts of functionality with the modularized version, which communicates back with the original functionality. Emit events or return ad-hoc responses so the original code can resume the logic that has not been migrated yet. Even if that means breaking module visibility rules or leaking muddy monolith stuff into our pristine modules... as long as it is a temporary measure.
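A hypothetical Python sketch of this approach: pricing has been moved out of a legacy checkout class into a new Pricing module, which returns an ad-hoc response so the legacy code can carry on with payment and emails. Class names, the coupon rule, and the amounts are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CartPriced:
    """Ad-hoc response shaped for the legacy caller; delete it once checkout is migrated."""
    total_cents: int
    discount_cents: int


class PricingClient:
    """Entry point of the new (hypothetical) Pricing module."""

    def price_cart(self, lines: list[tuple[int, int]], coupon: str | None) -> CartPriced:
        subtotal = sum(unit_cents * quantity for unit_cents, quantity in lines)
        discount = subtotal // 10 if coupon == "TEN" else 0  # illustrative rule only
        return CartPriced(total_cents=subtotal - discount, discount_cents=discount)


class LegacyCheckout:
    """The old god class: pricing moved out, payment and emails not migrated yet."""

    def __init__(self, pricing: PricingClient) -> None:
        self._pricing = pricing
        self.actions: list[str] = []

    def checkout(self, lines: list[tuple[int, int]], coupon: str | None = None) -> int:
        priced = self._pricing.price_cart(lines, coupon)  # migrated part
        # Legacy logic resumes from the ad-hoc response.
        self.actions.append(f"charge {priced.total_cents}")
        self.actions.append(f"email receipt, saved {priced.discount_cents}")
        return priced.total_cents
```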

Shared persistence

Migrating to a modular monolith usually requires a time window in which the logic in the new module shares persistence with the older code: because we need to keep the software working, because we cannot just take over a table used in many places, or for many other reasons.

When moving a domain into a module, prepare a specific plan to deal with the database migration. The strategy here consists of not sticking to a dogmatic "modules do not share databases", but being realistic while keeping a strong commitment to remove the persistence dependency once it's feasible.
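One possible way (not the only one, and not necessarily the author's) to handle this window is a database view: the new module reads the legacy table through a view named with its own prefix, exposing only the columns it needs. Later the view can be replaced by a real table without touching the module's queries. The sketch below assumes SQLite through Python's `sqlite3` module; table and column names are hypothetical.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")

# Legacy table, still read and written by non-modular code all over the monolith.
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, name TEXT, "
    "price_cents INTEGER, stock INTEGER, supplier_notes TEXT)"
)
conn.execute(
    "INSERT INTO products (sku, name, price_cents, stock, supplier_notes) "
    "VALUES ('SKU-1', 'Mug', 990, 12, 'fragile')"
)

# TEMPORARY: Order Handling's prefixed view over the legacy table, exposing only
# what the module needs. Replace it with a real table once the migration allows it.
conn.execute(
    "CREATE VIEW order_handling_products AS "
    "SELECT sku, price_cents AS unit_price_cents FROM products"
)


def unit_price_cents(conn: sqlite3.Connection, sku: str) -> int | None:
    """Order Handling code: only ever queries its own prefixed relation."""
    row = conn.execute(
        "SELECT unit_price_cents FROM order_handling_products WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0] if row else None
```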

Avoid on-demand "modules"

As has been stressed a few times already, identifying the right modules is no small task. Because we sometimes want to move fast, we may be tempted to create "modules" for everything, even when they do not match the purpose and meaning of a module (hence the quotes). The result is an absurd number of modules, often with suspiciously similar names.

Do not create a "module" just to isolate something from the non-modular part of the monolith. The benefits of modularization come from how modules aggregate and isolate related logic. We can improve the design of part of our monolith in place, without turning it into a "module".

Decisions are not forever

We may fail to define the right modules at first. Or we may succeed at it, but the problem space evolves and what once was right isn't anymore. It's probable that the current monolith will pollute our understanding, and we will -unconsciously or not- replicate the existing structure, which may not be ideal.

It is ok to discard or merge modules, or to shrink or expand them because we didn't put the boundaries in the right place. It's part of the learning process. Do not stick to what is there just because it's already there.

Concluding

We have covered a number of patterns to structure monolith applications, so they have solid organizational foundations:

  • Use modules as organizational units.
  • Limit the visibility a module offers to the others using contracts controlled by the module itself.
  • Provide a way for messaging communication among modules.
  • Do not share the persistence layer.

As pointed out, the value is in the purpose of the proposed patterns, not in the particular suggestions (which, in fact, are kinda naive on purpose). Still, we can mess up each module's concrete implementation. Once again, let's stress how important domain discovery is, along with setting the best boundaries we can.

If you have any other experience with modular monoliths -or any other feedback- it would be nice to hear about it in the comments. All good advice is welcome for the sake of monoliths around the world.

Credits

For the diagrams I've used Excalidraw, borrowing libraries from Mateusz Baran and Anumitha Apollo.

The cover picture is from the 1957 movie The Monolith Monsters which, according to Wikipedia, was the inspiration for the Tiberium mineral in the Command & Conquer games O_o
