Antonio

Posted on May 10, 2022

How to create big aggregates in DDD

#discuss #ddd #aggregate #domain

Hi! My name is Antonio and I have been reading about DDD for quite some time. I think Domain-Driven Design is the right tool for some enterprise applications, so recently I have been trying to use it at my company.

Before you continue reading: I'm assuming you have a good knowledge of DDD and related concepts (sorry for not including an introduction, but I think there are already too many introductory articles about DDD, so I don't feel like writing another one).

Problem

So, what problem am I facing with DDD? The implementation of big aggregates (emphasis on implementation, not design). When I say big, I do not mean that they contain a lot of different entities or a lot of dependencies, but many instances of the same entity. For example, a bank account aggregate has one child entity: a transaction. That bank account aggregate can have hundreds or thousands of instances of that entity.

Let's suppose that my company's domain is about Roads and Stops (this is just an example). Both are entities because they have an identity. In this case, Road would be the aggregate root and Stop would be a child entity of that aggregate. Let's say they have two or three fields each; it does not really matter. Here is a quick implementation of that model in Python (I have not used dataclasses and a lot of the logic is missing because it's not important for this discussion):

# Stop is defined first so that Road can reference it in its annotations.
class Stop:
    id: int
    latitude: float
    longitude: float
    ...

class Road:
    id: int
    name: str
    stops: list[Stop]
    ...

So now, you need to create a repository to retrieve those entities from storage. That's easy enough, just a couple of SQL queries or reading a file or whatever you want to choose. Let's suppose this is our repository (let's avoid interfaces, dependency injection and so on because it's not relevant in this case):


class RoadRepository:
    def get(self, id: int) -> Road:
        ...

    def save(self, road: Road) -> None:
        ...

Easy enough, right? Okay, let's continue implementing our model. The get method is really easy, but the save method has a lot of hidden complexity. Let's suppose we are using a relational database like PostgreSQL to store our entities. Let's say we have two tables, roads and stops, with a relationship between them.

In order to implement the save method, we would need to update all of our child entities. And that's the problem. What happens if our Road instance has 345 different stops? How do we update them? I don't have a final answer for that, but I have some proposals!

Solution 1

This would be the equivalent of solving the problem by brute force: delete everything and recreate it again.

Pros

Easy to implement

Cons

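I have not measured it, but I expect the efficiency to be poor: every save deletes and reinserts every stop, even if only one of them changed.
If the database generates the unique identifiers (for example, with an auto-increment column), the recreated rows get new identifiers, so keeping the same ones becomes a problem.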
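To make the brute-force approach concrete, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) instead of PostgreSQL so that it runs without a server, and it assumes the identifiers are assigned by the application, which sidesteps the second drawback. The table layout and the names create_schema and save_by_recreating are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Stop:
    id: int
    latitude: float
    longitude: float


@dataclass
class Road:
    id: int
    name: str
    stops: list[Stop] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical two-table layout: one row per road, one row per stop.
    conn.execute("CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE stops ("
        "id INTEGER PRIMARY KEY, "
        "road_id INTEGER NOT NULL REFERENCES roads(id), "
        "latitude REAL NOT NULL, "
        "longitude REAL NOT NULL)"
    )


def save_by_recreating(conn: sqlite3.Connection, road: Road) -> None:
    # One transaction: either the whole aggregate is rewritten or nothing is.
    with conn:
        conn.execute(
            "INSERT INTO roads (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            (road.id, road.name),
        )
        # Brute force: drop every stop of this road, then insert them all again.
        conn.execute("DELETE FROM stops WHERE road_id = ?", (road.id,))
        conn.executemany(
            "INSERT INTO stops (id, road_id, latitude, longitude) VALUES (?, ?, ?, ?)",
            [(s.id, road.id, s.latitude, s.longitude) for s in road.stops],
        )
```

The cost is visible in the code: the number of rows written is proportional to the size of the aggregate, not to the size of the change.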

Solution 2

Keep track of all the changes at the aggregate level. Something like this:

class Road:
    id: int
    name: str
    stops: list[Stop]
    _changes: list[dict]  # pending changes, read later by the repository

    def update_stop(self, stop: Stop) -> None:
        # ... some logic to update the list ...
        self._changes.append({
            'type': 'UPDATE',
            'stop': stop,
        })

Then we would read that list of changes in the repository and apply them individually (or in bulk, depending on the change type; for instance, we can group together the deletions, the creations, etc.).

Pros

It's more efficient than the first solution because, on average, it requires fewer database operations.
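As a rough illustration of how a repository could consume that list, here is a sketch under some assumptions: SQLite via the standard sqlite3 module, a hypothetical stops table with a road_id column, and hypothetical add_stop and remove_stop methods next to update_stop. It batches consecutive changes of the same type rather than grouping all changes by type, because reordering them could give wrong results (for example, a stop that is deleted and then created again with the same id).

```python
import sqlite3
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Stop:
    id: int
    latitude: float
    longitude: float


@dataclass
class Road:
    id: int
    name: str
    stops: dict = field(default_factory=dict)    # stop id -> Stop
    changes: list = field(default_factory=list)  # pending change records

    def add_stop(self, stop: Stop) -> None:
        self.stops[stop.id] = stop
        self.changes.append({"type": "CREATE", "stop": stop})

    def update_stop(self, stop: Stop) -> None:
        if stop.id not in self.stops:
            raise KeyError(stop.id)
        self.stops[stop.id] = stop
        self.changes.append({"type": "UPDATE", "stop": stop})

    def remove_stop(self, stop_id: int) -> None:
        self.changes.append({"type": "DELETE", "stop": self.stops.pop(stop_id)})


# One statement per change type (hypothetical table and column names).
SQL = {
    "CREATE": "INSERT INTO stops (latitude, longitude, id, road_id) VALUES (?, ?, ?, ?)",
    "UPDATE": "UPDATE stops SET latitude = ?, longitude = ? WHERE id = ? AND road_id = ?",
    "DELETE": "DELETE FROM stops WHERE id = ? AND road_id = ?",
}


def apply_changes(conn: sqlite3.Connection, road: Road) -> None:
    # Only the stops are persisted here; saving the road row itself is omitted.
    with conn:  # a single transaction for the whole batch of changes
        # groupby batches *consecutive* changes of the same type, so the
        # original order of operations is preserved.
        for change_type, run in groupby(road.changes, key=lambda c: c["type"]):
            stops = [c["stop"] for c in run]
            if change_type == "DELETE":
                params = [(s.id, road.id) for s in stops]
            else:
                params = [(s.latitude, s.longitude, s.id, road.id) for s in stops]
            conn.executemany(SQL[change_type], params)
    # Reached only if the transaction committed.
    road.changes.clear()
```

Note that the change list still lives on the aggregate, so this sketch does nothing to address the objection that the domain model carries persistence concerns.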

Cons

Our domain has been contaminated with logic not related to the business.
A lot of code is necessary to keep track of the changes.

Time to discuss!

What do you think about this problem? Have you faced it before? Do you have any additional solutions? Please leave a comment and we can discuss it :)

Top comments (1)

Dev_NIX • Sep 16 '24

That's an interesting discussion! IIRC, I've managed it like this:

First of all, I used pre-generated IDs like UUID v4, so the database is not a concern.
Then, in the repository I used INSERT ... ON DUPLICATE KEY UPDATE, so I only need to care about what should not exist in the database.
At that point, I can delete all the entries that belong to the aggregate and whose id is NOT IN (...) the current set of ids.
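The three steps above can be sketched as follows. INSERT ... ON DUPLICATE KEY UPDATE is MySQL syntax; this sketch assumes SQLite (standard sqlite3 module) and uses its ON CONFLICT ... DO UPDATE clause instead, which PostgreSQL also offers in a similar form. The table layout and the name save_stops are hypothetical.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class Stop:
    latitude: float
    longitude: float
    # Pre-generated identifier: the database never has to assign one.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def save_stops(conn: sqlite3.Connection, road_id: str, stops: list[Stop]) -> None:
    with conn:  # upsert and pruning happen in one transaction
        # Step 1: insert new stops and update existing ones in a single statement.
        conn.executemany(
            "INSERT INTO stops (id, road_id, latitude, longitude) VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "latitude = excluded.latitude, longitude = excluded.longitude",
            [(s.id, road_id, s.latitude, s.longitude) for s in stops],
        )
        # Step 2: delete the rows of this aggregate that are no longer part of it.
        if stops:
            # One bound parameter per id; very large aggregates may hit the
            # database's limit on the number of parameters per statement.
            placeholders = ", ".join("?" for _ in stops)
            conn.execute(
                f"DELETE FROM stops WHERE road_id = ? AND id NOT IN ({placeholders})",
                [road_id, *(s.id for s in stops)],
            )
        else:
            # NOT IN with an empty list is not valid SQL, so handle it separately.
            conn.execute("DELETE FROM stops WHERE road_id = ?", (road_id,))
```

Compared with tracking changes on the aggregate, this keeps the domain model free of bookkeeping, at the price of sending every stop to the database on each save.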

But with big aggregates there are other kinds of problems, like:

Keeping all the data in memory even though you will not need most of it the vast majority of the time
Concurrency issues

At that point, I would consider the idea of promoting the inner entity into an aggregate and managing certain rules through domain services. It comes with its tradeoffs.
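A minimal sketch of what that promotion could look like, assuming Stop becomes its own aggregate that refers to its Road by id only. The in-memory repository, the AddStopToRoad service, and the "maximum stops per road" rule are all hypothetical and exist only to show where a cross-aggregate rule might live.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    id: str
    road_id: str  # reference to the Road aggregate by identity only
    latitude: float
    longitude: float


class InMemoryStopRepository:
    """Stand-in for a real repository; loads and saves one Stop at a time."""

    def __init__(self) -> None:
        self._stops: dict[str, Stop] = {}

    def get(self, stop_id: str) -> Stop:
        return self._stops[stop_id]

    def save(self, stop: Stop) -> None:
        self._stops[stop.id] = stop

    def count_for_road(self, road_id: str) -> int:
        return sum(1 for s in self._stops.values() if s.road_id == road_id)


class StopLimitExceeded(Exception):
    pass


class AddStopToRoad:
    """Domain service enforcing a hypothetical rule that spans aggregates."""

    def __init__(self, stops: InMemoryStopRepository, max_stops_per_road: int) -> None:
        self._stops = stops
        self._max = max_stops_per_road

    def __call__(self, stop: Stop) -> None:
        # The rule used to be an invariant of Road; now it is checked here.
        if self._stops.count_for_road(stop.road_id) >= self._max:
            raise StopLimitExceeded(stop.road_id)
        self._stops.save(stop)
```

Saving now touches a single stop, but the check and the save are no longer protected by one aggregate boundary: two concurrent requests could both pass the check, which is one of the tradeoffs mentioned above.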

Another idea is that maybe that model has a very broad responsibility, and part of it could be extracted into a different context.

I'm still a rookie in DDD territories, but those are my 2 cents 😄