Aggregates are one of the most misunderstood concepts in domain-driven design.
What is an aggregate? Sure, it's a pattern that's central to domain-driven design... but is it just a collection of objects?
Martin Fowler explains:
"Aggregates are the basic element of transfer of data storage - you request to load or save whole aggregates. Transactions should not cross aggregate boundaries."
Those with experience in DDD might understand what that means and why it applies.
But for those starting to get familiar with aggregates, such an explanation might still be too detailed and nuanced.
Let's start by looking at what an aggregate is not.
An aggregate is not:
- Just a graph of entities
- Merely a behaviour-rich object
- An entity or collection of entities that you can dump into your database tables
So... what is it?
This is usually where people start talking about consistency boundaries, transactional consistency, eventual consistency, aggregate boundaries, invariants, aggregate roots, etc.
When learning about these things, it's natural to grab onto a familiar term or idea when all this jargon is thrown at us. From there, we (falsely) form an idea of what this is all about.
Let's try to keep things simple and practical.
I like to use a very simple idea to help people understand the essence of what aggregates are.
Imagine your software project was not one massive codebase - but a collection of small bubbles. Each bubble can be worked on independently. That means you only need to think about what's in the bubble at any given moment - not the entire system all at once.
Aggregates are the same. They are bubbles. Just on a smaller scale.
Imagine we are building a new feature for our system. This new feature includes the concept of projects, teams and team members.
Each team can have multiple staff members associated with it.
Staff members can be a part of multiple teams.
Each team can have multiple projects.
Well, that's simple enough.
Let's add the fields that might exist on each object:
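As a rough sketch, the naive model might look something like the TypeScript below. The field names (`name`, `email` and so on) and the sample data are illustrative assumptions, not taken from a real system; the point is only how the three concepts end up referencing each other.

```typescript
// Naive, entity-diagram-style model.
// All field names here are illustrative assumptions.
interface StaffMember {
  id: string;
  name: string;
  email: string;
  teams: Team[]; // a staff member can be part of multiple teams
}

interface Project {
  id: string;
  name: string;
  team: Team; // each project hangs off exactly one team
}

interface Team {
  id: string;
  name: string;
  members: StaffMember[]; // a team can have multiple staff members
  projects: Project[]; // a team can have multiple projects
}

// Wiring up a tiny example graph: everything references everything else.
const team: Team = { id: "team-1", name: "Platform", members: [], projects: [] };
const alice: StaffMember = {
  id: "staff-1",
  name: "Alice",
  email: "alice@example.com",
  teams: [team],
};
const billing: Project = { id: "project-1", name: "Billing", team };

team.members.push(alice);
team.projects.push(billing);
```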
Doesn't that look just like an entity diagram? Don't the database tables scream out at you?
"Obviously", we need to have a composite table linking each team with each team member.
Just hold on.
Let's think about behaviour instead and not treat these as code objects. Let's treat them as business objects or concepts. What can these objects do?
Well, isn't it clear? You can create a team. Edit a team. Delete a team.
Obviously, deleting a team means that all the associated projects ought to cascade and be deleted too.
Note: Let's put aside the fact that these are not the real behaviours of our system. Anytime you see CRUDy language, it should be a red flag!
But wait. You just found out from your users that this won't work. The assumptions you made about the business were wrong...
There are times when projects are moved from one team to another.
There are times when projects are orphaned for a period of time.
So now, other questions arise:
- What should our model look like now?
- When we write our code, should we load the entire graph of objects into memory?
- What happens when a project is orphaned? Will the teams just have a reference to a null project object?
Now the business has a new requirement: a team member's role can change per project.
So... do we just create another composite table to match each team member with each project they are on, along with their role on that project?
That's what we usually do. Developers naturally think about systems in terms of database design first 🤦‍♂️.
Note: Yes, that's a huge problem that domain-driven design tries to help avoid!
With each new requirement, our model gets more bloated. Over time, this might consume lots of memory in our system too.
Imagine a project whose team has 500 members. Yes, these are large projects we're talking about.
We need to load all the staff members into memory, and all their data too!
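To make that concrete, here is a minimal sketch of what changing a single role could look like in the naive model. Everything in it is hypothetical: `loadTeamGraph` is a stand-in for whatever would hydrate the graph from storage, and the `ProjectAssignment` shape is just one way the "role per project" link might be represented.

```typescript
// Hypothetical naive model after the "role per project" requirement.
interface StaffMember {
  id: string;
  name: string;
  email: string;
}

interface ProjectAssignment {
  memberId: string;
  role: string;
}

interface Project {
  id: string;
  name: string;
  assignments: ProjectAssignment[];
}

interface Team {
  id: string;
  name: string;
  members: StaffMember[];
  projects: Project[];
}

// Stand-in for a repository call that hydrates the whole graph.
// Assumption: a real system would read this from a database.
function loadTeamGraph(memberCount: number): Team {
  const members: StaffMember[] = Array.from({ length: memberCount }, (_, i) => ({
    id: `staff-${i}`,
    name: `Member ${i}`,
    email: `member${i}@example.com`,
  }));
  const project: Project = {
    id: "project-1",
    name: "Big Project",
    assignments: members.map((m) => ({ memberId: m.id, role: "Developer" })),
  };
  return { id: "team-1", name: "Big Team", members, projects: [project] };
}

// Changing ONE role still goes through the fully loaded team graph.
function changeRole(team: Team, projectId: string, memberId: string, role: string): void {
  const project = team.projects.find((p) => p.id === projectId);
  const assignment = project?.assignments.find((a) => a.memberId === memberId);
  if (!assignment) {
    throw new Error("Member is not assigned to this project");
  }
  assignment.role = role;
}

const bigTeam = loadTeamGraph(500);
changeRole(bigTeam, "project-1", "staff-42", "Tech Lead");

// Every staff record was hydrated just to touch one role.
const loadedMembers = bigTeam.members.length;
```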
That leads to performance problems, starting with memory usage. Is there a better way?
Aggregates are what solve these kinds of problems. They help us:
- Simplify our models when they start getting out of hand
- Isolate complex business rules
- Deal with performance issues when loading large object graphs into memory
- Allow flexibility to more easily deal with future unexpected business requirements
That's what aggregates are for.
Ok. But what are they?
Instead of telling you, I'll show you what one might look like in this case (otherwise, we need to start talking about consistency, transactional boundaries, concurrency, etc!).
Notice that I split the original Member model into three?
The Team and the Project have their own dedicated "version" or "view" of the Member that only has the exact data it needs to make decisions about business rules and behaviours within its "bubble".
For example, the Member's role is not needed by the team bubble. Why keep it there when it doesn't belong?
Instead, we have split our model into two "branches". Two bubbles/aggregates, in this case.
To support assigning the same Member to multiple teams, we create a dedicated, authoritative model of a Team Member. Each aggregate's own Member entity then points to it with a foreign key-like reference (again, we aren't talking about databases).
Note: Notice I added specific Ids to the non-authoritative "Member" models (like "TeamMemberId").
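The split described above can be sketched in code. This is only one possible shape, under stated assumptions: the names `MemberOfTeam` and `MemberOfProject`, the fields on `TeamMember`, the "no duplicate members" rule, and the nullable team reference for orphaned projects are all my own illustrations rather than a definitive design.

```typescript
// Authoritative model of a team member (fields are illustrative assumptions).
interface TeamMember {
  teamMemberId: string;
  name: string;
  email: string;
}

// The Team bubble's own view of a member: no role, because the Team doesn't need it.
interface MemberOfTeam {
  teamMemberId: string; // foreign key-like reference to the authoritative TeamMember
}

class Team {
  readonly teamId: string;
  readonly name: string;
  private members: MemberOfTeam[] = [];

  constructor(teamId: string, name: string) {
    this.teamId = teamId;
    this.name = name;
  }

  addMember(teamMemberId: string): void {
    // Hypothetical business rule, decided entirely inside the Team bubble.
    if (this.members.some((m) => m.teamMemberId === teamMemberId)) {
      throw new Error("Member is already on this team");
    }
    this.members.push({ teamMemberId });
  }

  memberCount(): number {
    return this.members.length;
  }
}

// The Project bubble's own view of a member: this is where the role lives.
interface MemberOfProject {
  teamMemberId: string;
  role: string;
}

class Project {
  readonly projectId: string;
  readonly name: string;
  private teamId: string | null; // reference by id only; null models an orphaned project
  private members = new Map<string, MemberOfProject>();

  constructor(projectId: string, name: string, teamId: string | null) {
    this.projectId = projectId;
    this.name = name;
    this.teamId = teamId;
  }

  moveToTeam(teamId: string): void {
    this.teamId = teamId;
  }

  orphan(): void {
    this.teamId = null;
  }

  owningTeamId(): string | null {
    return this.teamId;
  }

  assignMember(teamMemberId: string, role: string): void {
    this.members.set(teamMemberId, { teamMemberId, role });
  }

  changeRole(teamMemberId: string, role: string): void {
    const member = this.members.get(teamMemberId);
    if (!member) {
      throw new Error("Member is not assigned to this project");
    }
    member.role = role;
  }

  roleOf(teamMemberId: string): string | undefined {
    return this.members.get(teamMemberId)?.role;
  }
}

// Usage: each change touches exactly one bubble.
const alice: TeamMember = { teamMemberId: "tm-1", name: "Alice", email: "alice@example.com" };
const platform = new Team("team-1", "Platform");
platform.addMember(alice.teamMemberId);

const billing = new Project("project-1", "Billing", platform.teamId);
billing.assignMember(alice.teamMemberId, "Developer");
billing.changeRole(alice.teamMemberId, "Tech Lead"); // the Team is never loaded
billing.orphan(); // the project survives without a team
```

Because the aggregates only hold each other's ids, changing a role or orphaning a project never requires loading a team and its full member list.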
There's much more to discuss, and many more improvements we could make to this model, such as using value objects.
I think this is enough to help you see that aggregates are more than simply creating a graph of entities.
It's all about allowing the domain rules to guide you.
Many times, the aggregates we discover are not the aggregates we thought we would need!