Designing a Distributed System

James Eastham

If you haven't already, I'd highly recommend checking out part 1 of this series for a bit of background.

So you have a broad specification for a system you need to build, but really what happens next?

Developer James of a year or two ago would have jumped straight in and got to writing code. "I need a service to work out a fixture list for 20 football teams, cool, let's get coding".

What older and more worldly-wise me has learnt is that this is a terrible approach.

Things don't fit together correctly, and parts of the app don't communicate properly. These same problems occurred in the days of the monolith; with a distributed app you have zero chance of getting it right.

If you have many, many years' experience, maybe the chance of getting it right isn't quite zero. But there is no denying that good design gives a much better starting point.

I tend to stick to the same framework whenever I'm designing any application, one that starts with the first-class citizens of the microservice world.

Messages

Messages are the most important part of any distributed system. Period!

Most services will need to communicate with one another, and it's these communications that add most of the complication.

The code within a service is largely simple (as long as you're familiar with the language). But taking what is traditionally just a method call in the same code base and making it happen across a network is a different matter.

So this is where I start: what are the messages that this system will generate?

Name | Description | Synchronous (S) or Asynchronous (A)
fixturelist.generate | Request a fixture list for the coming season to be generated | A
info.fixturecompleted | A specific fixture has been completed | A
info.playertransferredin | Notification that a new player has been transferred into a team | A
info.playertransferredout | Notification that a player has been transferred out of a team | A
info.seasoncompleted | Notification that indicates the final fixture in a season has been completed | A
info.teamcreated | Notification that a new team has been added | A
info.teamupdated | Notification that a team has been updated | A
info.fixturelistgenerated | Notification that the fixture list for a specific team has been generated | A
leaguetable.updatedata | Update data in the league table data store (team names etc.) | S
leaguetable.updateresults | A request to update the league table (normally called after a week's games have completed) | A
leaguetable.updatestats | A request to update the league stats (normally called after a week's games have completed) | A
player.add | Add a new player to a team | S
player.delete | Delete a player from a team | S
player.update | Update a player's details against a team | S
result.create | Create a new fixture result | A
sponsor.create | Create a new sponsor | S
sponsor.distribute | Begin the distribution of sponsorships | A
sponsor.update | Update an existing sponsor | S
team.create | Create a new team | S
team.relegate | Relegate a team from the league | A
team.update | Update a team in the league | S
transfer.in | Transfer a player in | S
transfer.out | Transfer a player out | S

A big ol' table of stuff there, I know. If I'm honest, you probably don't need to take in every single record. But understanding the key concepts is important.

All I am doing is covering every eventuality I can think of within the system: what are all the endpoints/events I will need to meet my system requirements?

The synchronous/asynchronous column is rather simple: does the caller care about a response?

For example, it is likely that the team.create message will be created from an external request of some kind (REST, gRPC etc). It is very beneficial for the caller to know the result of the request. In this instance, that the team has been successfully created.

However, the info.seasoncompleted message is a different kettle of fish. When that event is raised, the service raising the event doesn't necessarily need to care what happens next.

The service raising the event lets the world know that the season is completed; what happens next is irrelevant to it.
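The synchronous/asynchronous split can be captured directly in a small message catalogue. This is a minimal sketch rather than the series' actual code: `Mode`, `MessageSpec`, `CATALOGUE` and `caller_expects_response` are hypothetical names, and only three rows of the table are included.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "S"   # the caller waits for a response
    ASYNC = "A"  # the sender does not wait for anything to come back


@dataclass(frozen=True)
class MessageSpec:
    name: str
    description: str
    mode: Mode


# Three rows copied from the message table; a full catalogue would list them all.
CATALOGUE = {
    spec.name: spec
    for spec in [
        MessageSpec("team.create", "Create a new team", Mode.SYNC),
        MessageSpec("info.teamcreated", "A new team has been added", Mode.ASYNC),
        MessageSpec("info.seasoncompleted", "Final fixture of the season completed", Mode.ASYNC),
    ]
}


def caller_expects_response(message_name: str) -> bool:
    """Return True when the sender should wait for a reply to this message."""
    return CATALOGUE[message_name].mode is Mode.SYNC
```

With this in place, `caller_expects_response("team.create")` is true, while the same call for `info.seasoncompleted` is false, which mirrors the two examples above.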

A really important point! This list is not exhaustive OR set in stone. I expect the end result to differ, maybe considerably. However, having a base gives the early development some focus, instead of blindly writing lines and lines of code.

Message Flows

Most messages don't work in isolation. In almost all cases, one message will set off a chain of events that make up the system functionality as a whole.

Take the info.seasoncompleted event. That won't be raised for no reason. It's likely something will happen before that to cause the event itself to be raised, and there will be a whole chain of things going on to handle that event.

Name | Message Flow
New Team Added | 1. team.create 2. info.teamcreated
Team Updated | 1. team.update 2. info.teamupdated 3. leaguetable.updatedata
Fixture list generation | 1. fixturelist.generate 2. info.fixturelistgenerated
Season completed | 1. leaguetable.updateresults 2. info.seasoncompleted 3. sponsor.distribute 4. team.relegate
Result Added | 1. result.create 2. info.fixturecompleted 3. leaguetable.updateresults 4. leaguetable.updatestats
New Sponsor Created | 1. sponsor.create
Player transferred in | 1. transfer.in 2. info.playertransferredin
Player transferred out | 1. transfer.out 2. info.playertransferredout
Sponsor updated | 1. sponsor.update

As you can see in the slightly smaller table, the season completed flow consists of a number of different parts.

  1. leaguetable.updateresults: The start of this flow will be an update to a result, namely the last result of the season.
  2. info.seasoncompleted: The last result of the season triggers a season completed event.
  3. sponsor.distribute: Sponsor payouts will be distributed to the requisite teams based on league position.
  4. team.relegate: Teams will be relegated from the league based on their finishing position.

That is the chain of functionality that makes up the end of season close down. Similar flows happen for all the functional parts of the application.
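The season completed flow can be sketched with a toy in-process bus. This is only an illustration under stated assumptions: `InMemoryBus`, the `final_fixture` payload field and the season value are hypothetical, and in a real system each handler would live in its own service behind a proper broker.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy stand-in for a message broker: delivers in-process, in publish order."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # name of every message published, in order

    def subscribe(self, name: str, handler: Handler) -> None:
        self._handlers[name].append(handler)

    def publish(self, name: str, payload: dict) -> None:
        self.log.append(name)
        for handler in self._handlers[name]:
            handler(payload)


def wire_season_completed_flow(bus: InMemoryBus) -> None:
    # Steps 1 -> 2: the last result of the season raises the event.
    def on_update_results(payload: dict) -> None:
        if payload.get("final_fixture"):  # hypothetical payload field
            bus.publish("info.seasoncompleted", {"season": payload["season"]})

    # Steps 3 and 4: what follows once the season is known to be over.
    def on_season_completed(payload: dict) -> None:
        bus.publish("sponsor.distribute", payload)
        bus.publish("team.relegate", payload)

    bus.subscribe("leaguetable.updateresults", on_update_results)
    bus.subscribe("info.seasoncompleted", on_season_completed)


bus = InMemoryBus()
wire_season_completed_flow(bus)
bus.publish("leaguetable.updateresults", {"season": "2019/20", "final_fixture": True})
print(bus.log)
```

The printed log lists the four messages in the same order as the numbered steps, and a result that is not the final fixture stops after the first message.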


By this point, some logical groupings will hopefully have reared their heads.

Although microservices should be small components, having lots of tiny and extremely chatty services can be detrimental. What we are trying to group here are the contexts of the application.

I used to refer to this section of the design as services; however, I found that term limits your thinking to the idea of a single process.

I prefer the term context. To take words from Eric Evans' mouth: "A bounded context is a defined part of software where particular terms, definitions and rules apply in a consistent way".

In our league example, sponsorships definitely sit out on their own. They only interact with season completed events and not really a lot else. This is its own context.

Some of the lines are a lot more blurred (teams, players and fixtures for example).

I always try to balance this based on how much components NEED to know about each other. So a team will hold data on its home stadium, the players in the team, etc. All the fixture generator cares about, though, is the name of the team (maybe not even that much, maybe just an internal primary key).

That, again, feels like a logical split. The fixtures can be generated from a set of unique IDs referring back to each team in the team database.
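To show how little the fixture generator needs, here is a sketch that builds fixtures from nothing but team IDs. The scheduling method is an assumption: the post does not say how fixtures are generated, so this uses the well-known circle method for a single round-robin, and `round_robin` is a hypothetical name. A real season would also need the reverse (home and away) fixtures.

```python
def round_robin(team_ids: list[int]) -> list[list[tuple[int, int]]]:
    """Return rounds of (home_id, away_id) pairs where every pair of teams meets once."""
    ids = list(team_ids)
    if len(ids) % 2:
        ids.append(None)  # odd number of teams: None marks a bye for that round
    n = len(ids)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            home, away = ids[i], ids[n - 1 - i]
            if home is not None and away is not None:
                pairs.append((home, away))
        rounds.append(pairs)
        # Circle method: keep the first team fixed, rotate the others one place.
        ids = [ids[0]] + [ids[-1]] + ids[1:-1]
    return rounds


for number, fixtures in enumerate(round_robin([101, 102, 103, 104]), start=1):
    print(f"Round {number}: {fixtures}")
```

Only the IDs cross the boundary, so the team context can change stadiums, squads or names without the fixture context noticing.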

Name | Description | Sends / Receives
front | Handles external HTTP requests and sits behind a load balancer |
identity | Handles authentication and identity management | info.teamcreated, info.teamupdated, info.teamrelegated
team-manager | Handles all activities around the teams themselves including index data and player management | info.teamcreated, info.teamupdated, transfer.in, transfer.out, team.create, team.update, player.add, player.update, player.delete
sponsor-manager | Handles all storage, distribution and management of sponsorship deals | info.seasoncompleted, sponsor.create, sponsor.update, sponsor.distribute
fixture-manager | Handles scheduling and storage of fixtures and results | info.seasoncompleted, info.resultcompleted, info.fixturelistgenerated, leaguetable.updatedata, team.relegate, fixturelist.generate, result.create, leaguetable.updateresults
transfer-manager | Manages transfers into and out of the league | info.playertransferredin, info.playertransferredout, transfer.in, transfer.out
stats-manager | Handles the storage of the league table and statistics | info.resultcompleted, leaguetable.updatedata, leaguetable.updatestats

There are a couple of extra services in there (front, identity) that don't directly relate to our functionality, but are components I tend to include in all design docs.

When designing services, I always include a reference to which messages each service works with.

I even go as far as to check off each message as I work through to ensure none are forgotten.
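The check-off step is easy to automate. The sketch below assumes the message names are copied by hand from the two tables in this post; `CATALOGUE`, `SERVICES` and `check_coverage` are hypothetical names, and the sends/receives distinction is ignored.

```python
# Every message name from the message table.
CATALOGUE = set("""
fixturelist.generate info.fixturecompleted info.playertransferredin
info.playertransferredout info.seasoncompleted info.teamcreated
info.teamupdated info.fixturelistgenerated leaguetable.updatedata
leaguetable.updateresults leaguetable.updatestats player.add player.delete
player.update result.create sponsor.create sponsor.distribute sponsor.update
team.create team.relegate team.update transfer.in transfer.out
""".split())

# Messages listed against each service in the services table.
SERVICES = {
    "front": "",
    "identity": "info.teamcreated info.teamupdated info.teamrelegated",
    "team-manager": "info.teamcreated info.teamupdated transfer.in transfer.out "
                    "team.create team.update player.add player.update player.delete",
    "sponsor-manager": "info.seasoncompleted sponsor.create sponsor.update sponsor.distribute",
    "fixture-manager": "info.seasoncompleted info.resultcompleted info.fixturelistgenerated "
                       "leaguetable.updatedata team.relegate fixturelist.generate "
                       "result.create leaguetable.updateresults",
    "transfer-manager": "info.playertransferredin info.playertransferredout "
                        "transfer.in transfer.out",
    "stats-manager": "info.resultcompleted leaguetable.updatedata leaguetable.updatestats",
}


def check_coverage(catalogue: set[str], services: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (messages no service mentions, names services use that are not in the catalogue)."""
    used = {name for names in services.values() for name in names.split()}
    return sorted(catalogue - used), sorted(used - catalogue)


unassigned, unknown = check_coverage(CATALOGUE, SERVICES)
print("In the message list but on no service:", unassigned)
print("On a service but not in the message list:", unknown)
```

Run against the two tables as written, it flags info.fixturecompleted as unassigned and info.resultcompleted and info.teamrelegated as names missing from the message list, which is exactly the kind of drift the check-off is meant to catch.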

There are a couple of contentious splits in the design here (transfers and stats being the key ones). They could quite easily have been grouped into the team and fixture managers, but I have my reasons for splitting each out.


Transfers

Because transfers can happen both between teams in the league and into and out of the league, it felt like the transfer handling should sit externally. The transfer service will mostly be called by the team service, but handling and storing player movement seemed different enough to be stored separately.

Stats

Purely a performance reason here. Initially I will be storing very simple statistical data, but if the required stats expand, storing them in a report-ready format should be a big performance gain.

Statistics like the league's top goalscorer could be calculated live from every result. However, storing a table of data that is incremented as each result is added is much more performant.
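The difference between the two approaches can be sketched as follows. The shape of a result (a dict with a `scorers` list of player IDs) is an assumption made for the example, as are the names `top_scorer_live` and `ScorerTable`; neither version handles an empty data set.

```python
from collections import Counter


def top_scorer_live(results: list[dict]) -> tuple[str, int]:
    """Live calculation: re-scans every stored result each time it is asked."""
    goals = Counter()
    for result in results:
        goals.update(result["scorers"])  # one goal per listed player ID
    return goals.most_common(1)[0]


class ScorerTable:
    """Report-ready store: incremented once per result, cheap to read."""

    def __init__(self) -> None:
        self._goals = Counter()

    def apply_result(self, result: dict) -> None:
        self._goals.update(result["scorers"])

    def top_scorer(self) -> tuple[str, int]:
        return self._goals.most_common(1)[0]


# Hypothetical results: player IDs only, one entry per goal scored.
results = [
    {"scorers": ["p9", "p9", "p11"]},
    {"scorers": ["p11", "p9"]},
]

table = ScorerTable()
for result in results:
    table.apply_result(result)

print(top_scorer_live(results))
print(table.top_scorer())
```

Both calls give the same answer, but the live version does work proportional to the number of results on every read, while the table pays a small cost once when each result arrives.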

Architecture Decision Records

A quick note on architecture decision records (ADRs). ADRs are an idea proposed by Michael Nygard back in 2011: http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions.

ADRs are small snippets of documentation that cover architecturally significant decisions made during the development process.

Long design docs are hard to understand, complicated and almost instantly out of date.

Keeping a log of decisions as they happen, in numerically ordered files, allows a new developer to quickly and easily see why things are how they are. At least at a high level, anyway.

Want to know why NoSQL was chosen over a relational database? There will be an ADR detailing the context, the decision and the consequences (both good and bad).
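As a rough illustration of "numerically ordered files", the sketch below picks the next file name and lays out the sections of a record. The four-digit prefix, the `.md` extension, the example titles and the helper names `next_adr_filename` and `render_adr` are all assumptions for the example, not the layout used in this series' repository.

```python
import re


def next_adr_filename(existing: list[str], title: str) -> str:
    """Return the next numbered file name for a new ADR."""
    numbers = []
    for name in existing:
        match = re.match(r"(\d+)-", name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    # Lower-case the title and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{next_number:04d}-{slug}.md"


def render_adr(number: int, title: str, status: str,
               context: str, decision: str, consequences: str) -> str:
    """Lay out one record as title, status, context, decision and consequences."""
    sections = [
        f"# {number}. {title}",
        f"## Status\n\n{status}",
        f"## Context\n\n{context}",
        f"## Decision\n\n{decision}",
        f"## Consequences\n\n{consequences}",
    ]
    return "\n\n".join(sections) + "\n"


# Hypothetical existing records and a hypothetical new decision.
existing = ["0001-record-architecture-decisions.md", "0002-use-a-message-bus.md"]
print(next_adr_filename(existing, "Use NoSQL for team data"))
```

Each record stays short enough to read in a minute, and the numbering gives a newcomer the order in which decisions were made.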

Throughout this series, I will refer to specific ADR documents where relevant. You can also view the complete set on GitHub.

So now we have a basic system design, I think it might finally be time to write some code.


