A load balancer is a software program or hardware device that distributes incoming network traffic across a group of registered targets (e.g. EC2 instances or containers).
By spreading incoming network traffic among multiple targets we reduce the load that any single target has to handle. This in turn improves the performance of each target and reduces request latency, since requests can be processed in parallel by multiple targets.
A load balancer also increases the availability of the overall system: with a group of targets behind it, we eliminate the single point of failure that a single target constitutes.
Another reason to use a load balancer is that it enables us to scale the group of targets up or down, adapting to the demand on our system in an elastic manner.
The most typical load balancers operate on layers 4 or 7, or both, of the Open Systems Interconnection (OSI) networking model.
A layer 4 load balancer distributes incoming network traffic based on data from network and transport layer protocols. For example, an AWS Network Load Balancer works as follows:
With Network Load Balancers, the load balancer node that receives the connection uses the following process:
- Selects a target from the target group for the default rule using a flow hash algorithm. It bases the algorithm on:
- The protocol
- The source IP address and source port
- The destination IP address and destination port
- The TCP sequence number
- Routes each individual TCP connection to a single target for the life of the connection. The TCP connections from a client have different source ports and sequence numbers, and can be routed to different targets.
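The selection step described above can be sketched as a hash over the connection's fields. This is only an illustration (the function name and hashing details are my own; the actual algorithm used by NLB is not public):

```python
import hashlib

def flow_hash_select(targets, protocol, src_ip, src_port, dst_ip, dst_port, seq):
    """Pick a target deterministically from the fields of a TCP connection.

    Hashes the protocol, source/destination IP and port, and the TCP
    sequence number, then maps the digest onto the list of targets.
    """
    key = f"{protocol}|{src_ip}:{src_port}|{dst_ip}:{dst_port}|{seq}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]
```

Because the hash is deterministic, every packet of the same connection maps to the same target, while a new connection from the same client (different source port and sequence number) may land on a different one.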
When we use this type of load balancer, its IP address is the only one advertised to clients (typically via DNS), so the destination IP of the packets clients send is that of the load balancer. Therefore, when the load balancer receives a request, it has to perform Network Address Translation on the packets: it rewrites the destination IP address they contain to that of the selected target, and when responding to the client it rewrites the source IP address of the packets back to its own.
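The two address rewrites can be illustrated with packets modeled as plain dictionaries. This is a simplification: real NAT also rewrites ports and recomputes checksums, and the field names here are invented for the sketch:

```python
def forward_to_target(packet, target_ip):
    """Load balancer -> target: rewrite the destination address."""
    fwd = dict(packet)  # copy; real devices mutate the packet in place
    fwd["dst_ip"] = target_ip
    return fwd

def reply_to_client(packet, lb_ip):
    """Load balancer -> client: rewrite the source address."""
    rep = dict(packet)
    rep["src_ip"] = lb_ip
    return rep
```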
A layer 7 load balancer makes routing decisions based on the content of the request. Hence, this type of load balancer must understand the application layer protocol of the request; the most commonly understood protocols are HTTP and HTTPS. For example, an AWS Application Load Balancer works as follows:
With Application Load Balancers, the load balancer node that receives the request uses the following process:
- Evaluates the listener rules in priority order to determine which rule to apply.
- Selects a target from the target group for the rule action, using the routing algorithm configured for the target group. The default routing algorithm is round robin. Routing is performed independently for each target group, even when a target is registered with multiple target groups.
For instance, the listener rules mentioned in the quote above can refer to HTTP headers or to URL paths. These rules determine a target group, and from that target group a target is selected using some algorithm. Some of the most common algorithms are described below.
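Evaluating listener rules in priority order can be sketched as a first-match search over a rule list. The rule predicates and target group names here are made up for illustration:

```python
# Hypothetical listener rules: (priority, predicate, target group).
# Rules are evaluated in priority order; the first match wins.
rules = [
    (1, lambda req: req["path"].startswith("/api"), "api-target-group"),
    (2, lambda req: req["headers"].get("X-Canary") == "true", "canary-target-group"),
]
DEFAULT_GROUP = "web-target-group"

def route(request):
    """Return the target group for a request, falling back to the default rule."""
    for _priority, matches, group in sorted(rules, key=lambda r: r[0]):
        if matches(request):
            return group
    return DEFAULT_GROUP
```

Once `route` has picked a target group, one of the algorithms below selects a concrete target inside it.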
Some of the most common load balancing algorithms are the following:
- Round Robin – Requests are distributed across the group of servers sequentially.
- Least Connections – A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
- Least Time – Sends requests to the server selected by a formula that combines the fastest response time and fewest active connections.
- Hash – Distributes requests based on a key you define, such as the client IP address or the request URL. The hash function can optionally be a consistent hash to minimize redistribution of loads if the set of upstream servers changes.
- IP Hash – The IP address of the client is used to determine which server receives the request.
- Random with Two Choices – Picks two servers at random and then applies the Least Connections algorithm (or the Least Time algorithm) to choose between them.
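A few of these algorithms are simple enough to sketch in Python. This is illustrative only: production balancers also account for server weights, health checks, and concurrent updates to connection counts:

```python
import itertools
import random

class RoundRobin:
    """Hand out servers sequentially, wrapping around at the end."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

def least_connections(conns):
    """conns maps server -> current open connection count; pick the least loaded."""
    return min(conns, key=conns.get)

def power_of_two_choices(conns, rng=random):
    """Pick two distinct servers at random, keep the one with fewer connections."""
    a, b = rng.sample(list(conns), 2)
    return a if conns[a] <= conns[b] else b
```

Random with Two Choices is popular because it gets most of the balancing benefit of Least Connections while only inspecting two servers per request, which matters when the connection counts are expensive to read or slightly stale.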