Jayaprasanna Roddam

System design: Load Balancing

Load balancing is one of the key pillars of modern distributed systems, playing a critical role in ensuring system scalability, reliability, and fault tolerance. It distributes incoming network traffic across multiple servers, preventing any single server from being overwhelmed by too many requests. In this article, we will explore different scaling approaches, types of load balancers, common load balancing algorithms, and techniques for traffic distribution and failover.

Horizontal vs. Vertical Scaling

Before diving into load balancing, it’s essential to understand the two primary strategies for scaling an application: horizontal scaling and vertical scaling. Both approaches aim to handle increased traffic and ensure that the system can meet growing demand.

Horizontal Scaling (Scaling Out)
  • Definition: Horizontal scaling, also known as scaling out, involves adding more machines (nodes or servers) to your existing infrastructure. Each new server shares the load, reducing the burden on any individual server.
  • Practical Example: Consider an e-commerce website that sees increased traffic during sales events. The website might add more web servers to handle the influx of users. Each server is responsible for processing a fraction of the traffic, allowing the system to maintain fast response times.
  • Advantages:
    • Improved Fault Tolerance: If one server fails, the load balancer can redirect traffic to other servers, preventing downtime.
    • Elasticity: You can dynamically add or remove servers based on traffic patterns, making horizontal scaling ideal for cloud environments.
  • Challenges:
    • State Management: Stateless applications are easier to scale horizontally because any server can handle any request. For stateful applications, session data must be shared or replicated across servers, which adds complexity.
    • Network Overhead: Horizontal scaling can introduce communication overhead between servers, particularly when they need to share data.

Vertical Scaling (Scaling Up)
  • Definition: Vertical scaling involves upgrading the existing server with more powerful hardware, such as adding more CPU, memory, or storage to handle more traffic.
  • Practical Example: In the case of a database server that is experiencing performance bottlenecks, you might upgrade its hardware to include faster processors and more RAM to improve query performance.
  • Advantages:
    • Simplicity: Scaling up is often easier to implement than scaling out because it doesn’t require additional servers or complex load balancing configurations.
    • No Need for Data Distribution: Unlike horizontal scaling, you don’t need to worry about partitioning or distributing data across multiple nodes.
  • Challenges:
    • Limits to Scalability: There is a physical limit to how much hardware you can add to a single machine. At some point, you will need to switch to horizontal scaling.
    • Single Point of Failure: If the server crashes or experiences issues, the entire system could go down, leading to potential downtime.

In practice, modern applications often rely on horizontal scaling due to the inherent limitations of vertical scaling. Horizontal scaling is more elastic and cost-effective, especially in cloud environments, where it’s easier to add or remove servers on demand.


Types of Load Balancers

Load balancers are critical components in horizontally scaled systems. They distribute incoming traffic across multiple servers to ensure no single server is overwhelmed. There are different types of load balancers, each designed to operate at various levels of the network stack.

1. Hardware Load Balancers
  • Definition: Hardware load balancers are physical devices specifically built to distribute network traffic across multiple servers.
  • Examples: Companies like F5 Networks and Citrix provide hardware load balancers that sit at the network edge, managing traffic efficiently.
  • Advantages:
    • High Performance: Hardware load balancers are optimized for speed and can handle high throughput.
    • Security Features: Many hardware load balancers come with built-in security features like DDoS protection, SSL termination, and web application firewalls.
  • Challenges:
    • Cost: Hardware load balancers are expensive to purchase and maintain, making them less suitable for small to medium-sized businesses.
    • Limited Flexibility: Unlike software-based load balancers, hardware load balancers are not easily scalable or customizable.
2. Software Load Balancers
  • Definition: Software load balancers run on standard servers and can be deployed in on-premise or cloud environments.
  • Examples: Nginx, HAProxy, and Envoy are popular open-source software load balancers used in modern web applications. (A minimal Go sketch of the core idea follows this list.)
  • Advantages:
    • Cost-Effective: Software load balancers are often free (open-source) or available at a lower cost compared to hardware load balancers.
    • Scalability: Software load balancers can be deployed across multiple servers and scaled dynamically in cloud environments.
    • Flexibility: They offer better customization options for routing traffic and handling specific workloads.
  • Challenges:
    • Performance: While highly efficient, software load balancers might not handle the same level of throughput as specialized hardware load balancers.
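
To make the idea concrete, here is a minimal sketch of a software load balancer in Go, built on the standard library's httputil.ReverseProxy. The backend addresses and port are placeholders, and real proxies such as Nginx or HAProxy layer health checks, retries, and TLS termination on top of this basic loop.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// mustParse is a small helper for the hard-coded placeholder URLs below.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	// Placeholder backends; in practice these come from configuration.
	backends := []*url.URL{
		mustParse("http://10.0.0.1:8080"),
		mustParse("http://10.0.0.2:8080"),
		mustParse("http://10.0.0.3:8080"),
	}

	var counter uint64
	proxy := &httputil.ReverseProxy{
		// Director rewrites each incoming request to point at the
		// next backend in a simple round-robin rotation.
		Director: func(req *http.Request) {
			target := backends[atomic.AddUint64(&counter, 1)%uint64(len(backends))]
			req.URL.Scheme = target.Scheme
			req.URL.Host = target.Host
		},
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```
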
3. DNS-Based Load Balancers
  • Definition: DNS-based load balancing distributes traffic by resolving DNS queries to different IP addresses (each corresponding to a different server).
  • Example: Amazon Route 53 provides DNS-based load balancing as part of its cloud services. (A client-side lookup sketch follows this list.)
  • Advantages:
    • Global Traffic Distribution: DNS-based load balancers are ideal for distributing traffic across servers in different geographical regions (e.g., Europe, Asia, the US).
    • Simplicity: No need for complex load balancing hardware or software; DNS routing provides a simple way to direct users to the closest or least loaded server.
  • Challenges:
    • Slow Failover: In the event of a server failure, DNS-based load balancing is slower to detect the failure and redirect traffic to a healthy server.
    • DNS Caching: DNS queries are cached by client devices or DNS resolvers, so changes in DNS records may take time to propagate, leading to potential traffic misdirection.
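
You can see this behavior from the client side: a DNS-load-balanced name resolves to several addresses, and the client picks among them. A quick sketch in Go, with a placeholder hostname:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A DNS-load-balanced name usually returns several A records; which
	// addresses you receive can vary by resolver, region, and caching.
	addrs, err := net.LookupHost("www.example.com") // placeholder hostname
	if err != nil {
		panic(err)
	}
	for _, addr := range addrs {
		fmt.Println(addr) // clients typically try these in order or pick one
	}
}
```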

Load Balancing Algorithms

Load balancers use algorithms to determine how incoming traffic is distributed across servers. Different algorithms have different trade-offs, and the choice of algorithm depends on the application’s architecture, traffic patterns, and server configuration.

1. Round Robin
  • Description: In the round-robin algorithm, the load balancer sends each new request to the next server in line. Once it reaches the last server, it starts again with the first.
  • Use Case: Round-robin is commonly used when all servers in the pool have roughly equal processing power and workload capacity.
  • Practical Example: Imagine a cluster of three servers. The first request is sent to Server A, the second to Server B, the third to Server C, and the fourth goes back to Server A (see the sketch after this list).
  • Advantages:
    • Simplicity: Easy to implement and understand.
    • Fair Distribution: Requests are distributed evenly, assuming all servers have similar capacities.
  • Challenges:
    • Uneven Load: If some servers are slower or have less capacity, round-robin can lead to inefficient use of resources, as the load isn’t adjusted based on server performance.
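
Isolated from the proxy sketch earlier, the round-robin picker itself is just an atomic counter modulo the pool size. A minimal sketch, with illustrative server names:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type pool struct {
	servers []string
	next    uint64
}

// pick returns servers in a fixed rotation: A, B, C, A, B, C, ...
// The atomic counter makes it safe to call from concurrent requests.
func (p *pool) pick() string {
	n := atomic.AddUint64(&p.next, 1)
	return p.servers[(n-1)%uint64(len(p.servers))]
}

func main() {
	p := &pool{servers: []string{"A", "B", "C"}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.pick()) // A, B, C, then back to A
	}
}
```
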
2. Least Connections
  • Description: This algorithm sends new traffic to the server with the fewest active connections, ensuring that servers with less load receive more traffic.
  • Use Case: Least connections is ideal for systems where request processing time varies significantly. It ensures that servers with shorter queues receive new traffic, preventing overloaded servers from handling more requests.
  • Practical Example: If Server A has 10 active connections, Server B has 5, and Server C has 3, the next request is directed to Server C (as in the sketch after this list).
  • Advantages:
    • Load Efficiency: Servers with less load receive more requests, helping balance the overall system more effectively than round-robin.
  • Challenges:
    • Connection Tracking: The load balancer must track active connections across all servers, which can add some overhead in high-traffic systems.
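
A least-connections picker scans the pool for the smallest active-connection count. This sketch uses the illustrative counts from the example above; a real balancer would update the counts atomically as connections open and close.

```go
package main

import "fmt"

type server struct {
	name        string
	activeConns int // the load balancer must track this per server
}

// pickLeastConnections returns the server with the fewest active connections.
func pickLeastConnections(servers []*server) *server {
	best := servers[0]
	for _, s := range servers[1:] {
		if s.activeConns < best.activeConns {
			best = s
		}
	}
	return best
}

func main() {
	servers := []*server{{"A", 10}, {"B", 5}, {"C", 3}}
	fmt.Println(pickLeastConnections(servers).name) // C, as in the example above
}
```
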
3. Weighted Round Robin
  • Description: Similar to round-robin, but each server is assigned a weight, representing its processing capacity. Servers with higher weights receive more traffic.
  • Use Case: Weighted round-robin is ideal for environments where servers have different performance capacities.
  • Practical Example: If Server A is twice as powerful as Server B, the load balancer assigns Server A twice the weight, so it receives roughly two-thirds of the requests (see the sketch after this list).
  • Advantages:
    • Balanced Traffic Distribution: Servers with more capacity get more traffic, making better use of resources.
  • Challenges:
    • Complex Configuration: Assigning accurate weights requires understanding each server’s capacity and workload, making it harder to set up than simple round-robin.
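
The simplest way to realize weights is to expand each server into the rotation as many times as its weight, so a weight-2 server appears twice per cycle. Production balancers such as Nginx use a "smooth" variant that interleaves picks more evenly, but this naive sketch shows the proportions:

```go
package main

import "fmt"

type weighted struct {
	name   string
	weight int // relative capacity; higher means more traffic
}

// buildRotation expands each server by its weight: a server with
// weight 2 appears twice per cycle and receives twice the requests.
func buildRotation(servers []weighted) []string {
	var rotation []string
	for _, s := range servers {
		for i := 0; i < s.weight; i++ {
			rotation = append(rotation, s.name)
		}
	}
	return rotation
}

func main() {
	// Server A is twice as powerful as Server B, as in the example above.
	rotation := buildRotation([]weighted{{"A", 2}, {"B", 1}})
	for i := 0; i < 6; i++ {
		fmt.Println(rotation[i%len(rotation)]) // A, A, B, A, A, B
	}
}
```
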
4. IP Hash
  • Description: The IP address of the incoming request is hashed to determine which server will handle the request. This ensures that requests from the same client (with the same IP address) are consistently directed to the same server.
  • Use Case: IP hash is useful when maintaining session affinity is important, ensuring that all requests from the same client go to the same server.
  • Practical Example: An online game server might use IP hash to ensure that a player’s data is always handled by the same server, avoiding the need to re-fetch data on every request.
  • Advantages:
    • Session Affinity: Ensures that the same client always connects to the same server.
  • Challenges:
    • Imbalanced Load: If one client generates significantly more traffic than others, it could overload a single server (the hashing sketch below shows how clients map to servers).
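
A minimal IP-hash picker hashes the client address into a server index; the same address always lands on the same server. This sketch uses the standard library's FNV hash and placeholder IPs. Note that changing the pool size remaps most clients, which is one reason some systems use consistent hashing instead.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickByIPHash maps a client IP to a server index. The same IP always
// hashes to the same index, which gives session affinity.
func pickByIPHash(clientIP string, numServers int) int {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return int(h.Sum32() % uint32(numServers))
}

func main() {
	servers := []string{"A", "B", "C"}
	for _, ip := range []string{"203.0.113.7", "203.0.113.7", "198.51.100.4"} {
		fmt.Println(ip, "->", servers[pickByIPHash(ip, len(servers))])
	}
	// The repeated IP lands on the same server both times.
}
```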


Traffic Distribution and Failover

In addition to distributing traffic, load balancers must handle failover scenarios to ensure that the system remains available even if one or more servers go down.

1. Active-Active vs. Active-Passive Failover
  • Active-Active: In an active-active setup, all servers are actively handling traffic, and when one server fails, the remaining servers pick up its traffic. This setup ensures maximum resource utilization and high availability.
  • Active-Passive: In an active-passive setup, only the active servers handle traffic, while passive servers remain on standby. If an active server fails, a passive server takes over. This setup can be slower to respond to failures but offers a simpler architecture.
2. Health Checks
  • Load balancers routinely perform health checks to verify whether a server is operational. If a server fails a health check, the load balancer stops routing traffic to it and redirects it to healthy servers. Health checks can include ping, HTTP status code verification, or custom logic based on your application's behavior. (A minimal polling sketch follows.)
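
A minimal health-check loop in Go might poll an HTTP endpoint on each backend and report its status. The /healthz path, addresses, and 10-second interval are illustrative; real balancers add per-check timeouts, failure thresholds before marking a server down, and recovery checks before adding it back.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkHealth treats a backend as healthy if its health endpoint
// answers 200 OK within the timeout.
func checkHealth(url string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Placeholder backends exposing an illustrative /healthz endpoint.
	backends := []string{
		"http://10.0.0.1:8080/healthz",
		"http://10.0.0.2:8080/healthz",
	}
	for range time.Tick(10 * time.Second) {
		for _, b := range backends {
			fmt.Println(b, "healthy:", checkHealth(b))
			// An unhealthy backend would be removed from rotation here.
		}
	}
}
```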

In conclusion, load balancing is a vital aspect of system design that enables applications to scale horizontally, distribute traffic effectively, and remain fault-tolerant. By understanding the different types of load balancers and the algorithms they use, you can optimize your application to handle increased traffic loads, provide redundancy, and ensure a smooth user experience even in the face of hardware or software failures.
