This article was originally posted as a tutorial on the Vaadin website. Since I am moving my future DDD writing to this platform, I'm copying my original tutorial series as well to keep the timeline straight.
Domain-Driven Design (DDD) has been around since Eric Evans published his book about the subject in 2003. I myself came into contact with DDD some years ago when I joined a project that suffered from data consistency problems. Duplicates showed up in the database, some information was not saved at all, and you could run into optimistic locking errors anywhere and anytime. We managed to solve this by using the building blocks of tactical domain-driven design.
I have learned more about domain-driven design since then and also to try to use it in my projects wherever appropriate. However, during the past years when I have talked to other developers, many of them have heard about the term domain-driven design, but they do not know what it means. In this article series, I am going to give a brief introduction to domain-driven design as I see and understand it. The content is very much based on the books Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans and Implementing Domain-Driven Design by Vaughn Vernon. However, I have tried to explain everything in my own words and also inject my own thoughts, opinions, and experiences.
You will not become an expert in domain-driven design by reading my article series, but I hope it inspires you to read more about it elsewhere. I also highly encourage you to read the books by Evans and Vernon.
Now let us get started with the first subject, namely strategic domain-driven design.
If I look up the word domain in the Dictionary app on my MacBook, I get the following definition:
[A]n area of territory owned or controlled by a particular > ruler or government...
- a specified sphere of activity or knowledge...
- Apple Dictionary
In the case of domain-driven design, it is the second part of the definition that we are interested in. In this case, activity is whatever an organization does and knowledge is how the organization does it. We are also going to add the environment in which the organization conducts its activities to the domain concept.
The domain concept is very broad and abstract. To make it more concrete and tangible, it makes sense to split it up into smaller parts called subdomains. Finding these subdomains is not always an easy thing to do, and if you get them wrong, you can run into trouble down the road when the pieces in your puzzle all of a sudden do not fit well together.
Before you go looking for subdomains, you should think about the subdomain categories. All subdomains can be divided into three categories:
- Core domains
- Supporting subdomains
- Generic subdomains
Not only will these categories help you to find your subdomains, they will also help you to prioritize your development efforts.
A core domain is what makes an organization special and different from other organizations. An organization cannot succeed (or even exist) without being exceptionally good in their core domain. Because the core domain is so important, it should receive the highest priority, the biggest effort and the best developers. For smaller domains you may only identify a single core domain, larger domains may have more than one. You should be prepared to implement the features of the core domain from scratch.
A supporting subdomain is a subdomain that is necessary for the organization to succeed, but it does not fall into the core domain category. It is not generic either because it still requires some level of specialization for the organization in question. You may be able to start with an existing solution and tweak it or extend it to your specific needs.
A generic subdomain is a subdomain that does not contain anything special to the organization but is still needed for the overall solution to work. You can save a lot of time and work by trying to use off-the-shelf software for your generic subdomains. A typical example would be user identity management.
It is worth noting that the same subdomain can fall into different categories depending on what the organization does. For a company that specializes in identity management, identity management is a core domain. However, for a company that specializes in customer relations management, identity management is a generic subdomain.
Finally, it is worth pointing out that all subdomains are important to the overall solution regardless of the category in which they fall. They do, however, require different amounts of effort and may also have different requirements of quality and completeness.
Let's say we are building an EMR (Electronic Medical Records) system for smaller clinics. We have identified the following subdomains:
- Patient Records for managing patient medical records (personal information, medical history, etc.).
- Lab for ordering lab tests and managing test results.
- Scheduling for scheduling appointments.
- File Archive for storing and managing files that are attached to the patient records (such as different documents, X-ray pictures, scanned paper documents).
- Identity Management for making sure the right people have access to the right information.
Now, how would we classify these subdomains? The most obvious ones are file archive and identity management which are clearly generic subdomains. But what about the others? That depends on what is making this particular EMR system stand out among the others in the market.
Since we are building an EMR system, it is pretty safe to assume that patient records is a core domain. If we are going to take the market by making a system that makes all the clinics work more efficiently through clever and innovative scheduling, then scheduling is probably also a core domain. Otherwise, it is a supporting subdomain, maybe built on top of some existing scheduling engine. The same reasoning can be applied to the lab subdomain: if a significant part of our business case is a seamless integration between patient records and the lab, then lab is most likely a core domain. Otherwise, it is a supporting subdomain.
You sometimes find the domain referred to as the "problem domain". This comes from the fact that the domain defines problems that the software is going to solve (after all, there is a reason why the software is being made in the first place). Vaughn Vernon splits a domain up into a problem space and a solution space. The problem space concentrates on what business problems we are trying to solve. The subdomains belong to this space.
The solution space concentrates on how the problems in the problem space are going to be solved. It is more concrete, more technical and contains more details. So how are we going to transform our problems into solutions?
To be able to create software for a domain, you need a way of describing the domain. Having a relational data model or something similar is not enough. You need to be able to describe not only things and their relations but also the dynamics such as events, processes, business invariants, how things change over time, and so on. You need to be able to discuss and reason about the domain with both your fellow developers and also the domain experts. What you need is a ubiquitous language.
The ubiquitous language is a language that is consistently used by both domain experts and developers to describe and discuss the domain. Apart from the code itself, this language is the most important deliverable of a domain-driven design process. A big part of the language is domain terminology already being used by domain experts, but you may also need to invent new concepts and processes in cooperation with the domain experts. Because of this, active participation from the domain experts is absolutely essential for domain driven design to succeed. If the customer is not interested in putting in the time and effort to teach your their domain and help you create a ubiquitous language, you should either try to convince the customer to change their mind or pick another design method.
You can document the ubiquitous language in various ways. A good starting point is to create a glossary of terms. Business processes can be described graphically using e.g. swimlane diagrams and flow charts. UML can be used to describe the relationship between things and state diagrams to describe how state changes as different things move through different processes. The subdomains are also a part of the ubiquitous language, and you may even need to define different "dialects" of the language for different subdomains. This embodiment of the ubiquitous language is the domain model, and it will eventually be transformed into working code. In other words, the domain model is not the same as a data model or a UML class diagram.
The ubiquitous language has a nice feature, and that is that it tells you whether you are on the right track or not. If you can easily explain a business concept or process using the language, it means you are on the right track. If you, on the other hand, find your self struggling to explain something, you are most likely missing something from the language and thereby also from your domain model. When this happens, you should grab a domain expert and go looking for the missing pieces. You may even stumble upon a revelation that turns your existing model completely upside-down and results in a far superior domain model than the one you had before.
In a perfect world, there would be only one ubiquitous language and one model that would explain everything about a single domain. Unfortunately, this is not the case, save for very small and simple domains. Business processes may overlap or even conflict. The same word may mean different things or different words may mean the same thing in different contexts. There may be (and often are) more than one way to solve a problem in the problem space, depending on how you view it.
Instead of trying to find the Big Unified Model, we choose to accept the facts and instead introduce something called bounded contexts. A bounded context is a distinct part of the domain in which a particular subset or dialect of the ubiquitous language is consistent at all times. In other words, we are applying divide and conquer and splitting the domain model up into smaller, more or less independent models with clearly defined boundaries. Every bounded context has its own name and this name is a part of the ubiquitous language.
There is not necessarily a one-to-one mapping between bounded contexts and subdomains. Since a bounded context belongs to the solution space and a subdomain to the problem space, you should think about the bounded context as one alternative solution among many possible solutions. Thus a single subdomain can contain multiple bounded contexts. You may also find yourself in a situation where a single bounded context spans multiple subdomains. There is no rule against this, but it is an indication that you may need to rethink your subdomains or context boundaries.
Personally, I like to think about bounded contexts as separate systems (e.g. separate executable JARs or deployable WARs in the Java world). A perfect real-world example of this is micro-services, where each micro-service can be considered its own bounded context. However, this does not mean you have to implement all your bounded contexts as micro-services. A bounded context could also be a separate subsystem inside a single monolithic system.
Let's revisit the EMR system in the previous example and more specifically the patient records core domain. What kind of bounded contexts could we find there? Now I am no expert on health care software so I will just make up some, but hopefully, you will get the idea.
The system supports services for both doctor appointments and physiotherapy. In addition, there is a separate onboarding process for new patients where they are interviewed, photographed and given an initial assessment. This leads to the following bounded contexts within the core domain:
- Personal information for managing the patient's personal information (name, address, financial information, medical background, etc).
- Onboarding for introducing new patients into the system.
- Medical Exams used by doctors when examining and treating the patient.
- Physiotherapy used by physiotherapists when examining and treating the patient.
In a very simple system, you probably would squeeze everything into a single context, but this EMR is more advanced and provides streamlined and optimized features for each type of service that is provided. However, we are still within the same core subdomain.
In a non-trivial system, very few (if any) bounded contexts are completely independent. Most contexts will have some kind of relationship with other contexts. Identifying these relationships is of importance not only technically (how will the systems technically communitate with each other) but also to how they are developed (how will the teams that develop the systems communicate with each other).
The simplest way to identify relationships between bounded contexts is to classify the contexts as upstream contexts and downstream contexts. Think of the contexts as cities next to a river. The cities upstream dump stuff into the river, which reaches the cities downstream. Some of the stuff is essential to the downstream cities, and so they retrieve it from the river. Other stuff is harmful and can do direct damage to the downstream cities ("sh*t rolls downhill").
Being upstream or downstream has its pros and cons. An upstream context does not depend on any other contexts, which in a way makes it free to evolve in any direction. However, the consequences of any changes may be severe in downstream contexts and this may, in turn, impose restrictions on the upstream context. A downstream context is restricted by its dependency on an upstream context but does not need to worry about breaking other contexts further downstream, which in a way gives the developers of the downstream context freer hands than the developers of the upstream context.
You can describe the relationships graphically by using a dependency diagram where arrows point from the downstream contexts to the upstream contexts, or by using the U and D roles.
Finally keep in mind that a context can be both an upstream context and a downstream context at the same time, depending on where you stand.
Once we know what our contexts are and how they are related, we have to decide how to integrate them. This involves several important questions:
- Where are the context boundaries?
- How are the contexts going to communicate technically?
- How are we going to map between the contexts' domain models (i.e. how are we translating from one ubiquitous language to another)?
- How are we going to guard against unwanted or problematic changes occurring upstream?
- How are we going to avoid causing trouble for downstream contexts?
The answers to these questions will be compiled into a context map. The context map can be documented graphically like this:
To make it easier to create the context map, there are a set of ready-made integration patterns that work for most use cases. Depending on which integration pattern you pick, you may have to add additional information to the context map to make it really useful.
The teams of both contexts cooperate. The interfaces - whatever they are - evolve so that they accommodate the development needs of both contexts. Interdependent features are properly planned and scheduled so that they cause as little harm as possible to both teams.
Both contexts share a common code base which is the kernel. The kernel can be modified by any of the teams, but not without consulting the other team first. To make sure no unintended side effects are introduced, continuous integration (with automatic testing) is required. To keep things as simple as possible, the shared kernel should be kept as small as possible. If a lot of model code ends up in the shared kernel, it may be a sign that the contexts should, in fact, be merged into one big context.
The contexts are in an upstream-downstream relationship, and this relationship is formalized such that the upstream team is the supplier and the downstream team is the customer. Thus, even though both teams can work more or less independently on their systems, the upstream team (supplier) is required to take the downstream team's (customer) needs into account.
The contexts are in an upstream-downstream relationship. However, the upstream team has no motivation to accommodate the downstream team's needs (it may be ordered as a service from a larger supplier, for example). The downstream team decides to conform to the model of the upstream team, whatever it happens to be.
The contexts are in an upstream-downstream relationship and the upstream team does not care about the downstream team's needs. However, instead of conforming to the upstream model, the downstream team decides to create an abstraction layer that protects the downstream context from changes in the upstream context. This anticorruption layer lets the downstream teamwork with a domain model that suits their needs the most, while still integrating with the upstream context. When the upstream context changes, the anticorruption layer must also change, but the rest of the downstream context can remain unchanged. It may be a good idea to combine this strategy with continuous integration where automated tests are used to detect changes in the upstream interface.
Access to a system is provided by clearly defined services, using a clearly defined protocol. The protocol is open so that anybody who needs to can integrate with the system. Web services and micro-services are a good example of this integration pattern. This pattern is different from the others in that it does not care about the relationship between the contexts and the teams that develop them. You may end up combining the open host service pattern with any of the other patterns.
The key when using this pattern is to keep the protocol simple and stable. You most of the systems clients should be able to get what they need from this protocol. Create special integration points for the remaining special cases.
This is the integration pattern I personally find it the most difficult to properly explain. The way I look at it, the published language is the closest relative to the open host service and is often used together with that integration pattern. A documented language (for example based on XML) is used for the input and output of the system. There is no need to use a particular library or a particular implementation of a spec as long as you conform to the published language. Real world examples of published languages are MathML for representing mathematical formulas and GML for representing geographical features in geographical information systems.
Please note that you do not necessarily need to use web services together with a published language. You could also have a setup where a file is dropped into a directory and processed by a batch job that stores the output in another file.
This integration pattern is special in that it does not perform any integration at all. Still, it is an important pattern to keep in the toolbox and may end up saving a lot of money and time. When the benefit of the integration between two contexts is no longer worth the effort, it is better to cut the contexts loose from each other and let them evolve independently. The reason for this could be that the systems have simply evolved to a point where they are no longer related. The (few) services provided by the upstream context that the downstream context actually used are re-implemented inside the downstream context.
I believe strategic domain-driven design was originally meant for larger projects but I think you can benefit from it also in smaller projects - even if you end up not using any other parts of DDD in the project.
For me personally, the major takeaways from strategic domain-driven design are the following:
It introduces boundaries. Scope creep is a constant factor in all of my hobby projects. Eventually, they become more exhaustive than fun to work on or completely unrealistic to finish alone. When working on customer projects, I have to work hard not to cause technical scope creep by overthinking or overengineering things. Boundaries - wherever they are - help me to divide the project into smaller parts and focus on the right ones at the right time.
It does not require me to find a super-model that works in all cases. It recognizes that in the real world, there are often many smaller models in more or less clearly defined contexts. Instead of breaking these models, it embraces them.
It helps you to think about the value your system is going to bring and where you should put the most of your efforts to get the biggest value. I have personal experience from projects where properly identifying and then concentrating on the core domain would have made a huge difference. Unfortunately, I had not yet heard about strategic DDD and both time and money were wasted.
I also know myself good enough to identify risks with this approach: finding subdomains and bounded contexts for the sake of finding subdomains and bounded contexts. When I learn something new that I like, I very much want to try it out in the real world. That may sometimes mean I go looking for things that are not there. My suggestion here is to always start with one core domain and one bounded context. I do the domain modeling carefully, additional subdomains and bounded contexts will eventually reveal themselves if they exist.