Ryan Thelin for Educative

Posted on Jun 2, 2021 • Originally published at educative.io

Top 5 Distributed System Design Patterns

#distributedsystems #systems #design #beginners

Distributed applications are a staple of the modern software development industry. They're pivotal to cloud storage services and allow web applications of massive scale to stay reactive. As programmers build these systems, they need fundamental building blocks they can use as a starting point and to communicate in a shared vocabulary.

This is where distributed system design patterns become invaluable. While sometimes overused, design patterns are a key skill recruiters are looking for and are essential to stand out in advanced system design interviews.

Today, we'll explore 5 of the top distributed system design patterns to help you learn their advantages, disadvantages, and when to use them.

Here’s what we’ll cover today:

What is a distributed system design pattern?
1. Command and Query Responsibility Segregation
2. Two-Phase Commit
3. Saga
4. Replicated Load-Balanced Services
5. Sharded Services
What to learn next

Ace advanced system design questions with ease

Practice top distributed systems questions and design patterns to use in your next system design interview.

Grokking the Advanced System Design Interview

What is a distributed system design pattern?

Design patterns are tried and tested ways of building systems that each fit a particular use case. They're not implementations but rather are abstract ways of structuring a system. Most design patterns have been developed and updated over years by many different developers, meaning they're often very efficient starting points.

Design patterns are building blocks that allow programmers to pull from existing knowledge rather than starting from scratch with every system. They also create a set of standard models for system design that help other developers see how their projects can interface with a given system.

Creational design patterns provide a baseline when building new objects. Structural patterns define the overall structure of a solution. Behavioral patterns describe objects and how they communicate with each other.

Distributed system design patterns are design patterns used when developing distributed systems, which are essentially collections of computers and data centers that act as one computer for the end-user. These distributed design patterns outline a software architecture for how different nodes communicate with each other, which nodes handle each task, and the process flow for different tasks.

These patterns are widely used when designing the distributed system architecture of large-scale cloud computing and scalable microservice software systems.

Types of distributed design patterns

Most distributed design patterns fall into one of three categories based on the functionality they work with.

Object communication: Describes the messaging protocols and permissions for different components of the system to communicate.
Security: Handles confidentiality, integrity, and availability concerns to ensure the system is secure from unauthorized access.
Event-driven: Patterns that describe the production, detection, consumption, and response to system events.

1. Command and Query Responsibility Segregation (CQRS)

The CQRS pattern focuses on separating the read and write operations of a distributed system to increase scalability and security. This model uses commands to write data to persistent storage and queries to locate and fetch the data.

These are handled by a command center, which receives requests from users. The command center then fetches the data and makes any necessary modifications, saves the data, and notifies the read service. The read service then updates the read model to show the change to the user.

Advantages:

Reduces system complexity by delegating tasks
Enforces a clear separation between business logic and validation
Helps categorize processes by their job
Reduces the number of unexpected changes to shared data
Reduces the number of entities that have modifying access to data

Disadvantages:

Requires constant back-and-forth communication between command a read-models
Can cause increased latency when sending high throughput queries
No means to communicate between service processes

Use Case

CQRS is best for data-intensive applications like SQL or noSQL database management systems. It's also helpful for data-heavy microservice architectures. It's great for handling stateful applications because the writer/reader distinction helps with immutable states.

2. Two-Phase Commit (2PC)

2PC is similar to CQRS in its transactional approach and reliance on a central command, but partitions are processed by their type and what stage of completion they're on. The two phases are the Prepare phase, in which the central control tells the services to prepare the data, and the Commit phase, which signals the service to send the prepared data.

All services in a 2PC system are locked by default, meaning they cannot send data. While locked, services complete the Prepare stage so they're ready to send once unlocked. The coordinator unlocks services one-by-one and requests its data. If the service is not ready to submit its data, the coordinator moves onto another service. Once all prepared data has been sent, all services unlock to await new tasks from the coordinator.

2PC essentially ensures that only one service can operate at a time, which makes the process more resistant and consistent than CQRS.

Advantages

Consistent and resistant to errors due to lack of concurrent requests
Scalable, can handle big data pools as easily as it can handle data from a single machine
Allows for isolation and data sharing at the same time

Disadvantages

Not fault-tolerant, prone to bottlenecks and blocking due to its synchronous nature
Requires more resources than other design patterns

Use Case

2PC is best for distributed systems that deal with high-stakes transaction operations that favor accuracy over resource efficiency. It is resistant to error and easy to track mistakes when they occur, even at scale.

Keep learning about distributed systems.

Knowledge of distributed systems is a top priority for modern recruiters. Prepare for your next system design interview with hands-on practice and insider tips. Educative's text-based courses let you get the experience you need to land your next job on the first try.

Grokking the Advanced System Design Interview

3. Saga

Saga is an asynchronous pattern that does not use a central controller and instead communicates entirely between services. This overcomes some of the disadvantages of the previously covered synchronous patterns.

Saga uses Event Bus to allow services to communicate with each other in a microservice system. The bus sends and receives requests between services and each participating service creates a local transaction. The participating services then each emit an event for other services to receive. Other services all listen for events. The first service to receive the event will perform the required action. If that service fails to complete the action, it's sent to other services.

This structure is similar to the 2PC design in that services are cycled if one cannot complete a task. However, Saga removes the central control element to better manage the flow and reduce the number of back-and-forth communication required.

Advantages

Individual services can handle much longer transactions
Great for the distributed system due to decentralization
Reduces bottlenecks thanks to peer-to-peer communication between services

Disadvantages

Asynchronous autonomy makes it difficult to track which services are doing individual tasks
Difficult to debug due to complex orchestration
Less service isolation than previous patterns

Use Case

Saga's decentralized approach is great for scalable serverless functions that handle many parallel requests at once. AWS uses Saga-based designs in many functions like step and lambda functions.

4. Replicated Load-Balanced Services (RLBS)

The RLBS pattern is the simplest and most commonly used design pattern. At the most basic level, it consists of multiple identical services that all report to a central load balancer. Each service is capable of handling tasks and can replicate if they fail. The load balancer receives requests from the end-user and distributes them to the services either using a round-robin fashion or sometimes a more complex routing algorithm.

The duplicate services ensures the application maintains a high availability for user requests and can redistribute work if one instance of the service should fail.

RLBS is often used with Azure Kubernetes, which is an open-source container orchestration technology made by Microsoft that offers automatic service scaling based on workflow.

Advantages

Consistent performance from the view of the end-user
Can quickly recover from failed services
Highly scalable with more services
Excellent for concurrency

Disadvantages

Inconsistent performance based on load balancer algorithm
Resource intensive to manage services

Use Case

RLBS is great for front-facing systems that have inconsistent workloads throughout the day but must maintain low latency, such as entertainment web apps like Netflix or Amazon Prime.

5. Sharded Services

An alternative to replica-based designs is to create a selection of services that each only completes a certain kind of request. This is called "sharding" because you split the request flow into multiple unequal sections. For example, you may have one shard service that accepts all caching requests and another that only handles high-priority requests. The load balancer evaluates each request when it comes in and distributes it to the appropriate shard for completion.

Sharded services are normally used for building stateful services because the size of the state is often too large for a single stateless container. Sharding lets you scale the individual shard to meet the size of the state.

Sharded services also allow you to handle high-priority requests faster. Shards dedicated to high-priority requests are always available to handle such requests the moment they come in rather than being placed in the queue.

Advantages

Allows you to scale shards for common requests
Easy to prioritize requests
Simple to debug due to natural sorting

Disadvantages

Can be resource-intensive to maintain many shards
Leads to loss in performance if shards are used disproportionately

Use Case

Sharded services are best when your system receives a predictable imbalance in request types but some requests have priority.

What to learn next

Distributed system design patterns are an essential part of any successful back-end system. However, these are just a few of the patterns used by professional software engineers.

Some patterns for you to learn next are:

Sidecar Pattern
Write-ahead Log
Split-Brain Pattern
Hinted Handoff
Read Repair

To help you master these advanced system design patterns, Educative has created the course Grokking the Advanced System Design Interview. This course walks you through the top advanced SDI questions with in-depth explanations and hands-on practice. You'll learn about all the fundamental design patterns an advanced systems engineer needs. By the end, you'll have real-world experience with all the design patterns tested most by top industry recruiters.

Happy learning!

Continue learning about distributed systems

Top comments (1)

Pieter Humphrey • Jun 4 '21

I was wondering what Python people are doing in distributed apps, I'm a java guy, this intro to distributed pyhton apps is very helpful thanks for the link.