Originally published on my personal blog.
In part 1 of The Art of Load Balancing, we saw what a load balancer is, the various load balancing strategies, and how you can easily get started with a load balancer on your local machine. In this part 2 of the series, we’ll briefly see how each strategy works. To recap, here are some of the load balancing strategies:
- Random selection
- Round robin selection
- Weighted round robin selection
- Service based selection
- IP hashing selection
- Server load based selection
- A mix of all
Let’s look at each of them briefly.
With random selection, as you can probably guess from the name, the load balancer picks a server at random for each request. Because the choice is random, there is no way to predict which server a given request will end up on. Sometimes this works exceptionally well, but more often than not it causes problems, especially an unbalanced distribution of load: some servers can get a lot of requests while others get very few. Because of this, the strategy is not widely used.
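To make this concrete, here is a minimal Python sketch of random selection. The server names are hypothetical, and the seeded `random.Random` is only there so the run is reproducible; a real load balancer would not need it.

```python
import random

# Hypothetical pool of backend servers, for illustration only.
SERVERS = ["server1", "server2", "server3"]

def pick_random(servers, rng=random):
    """Random selection: each request picks a server uniformly at random."""
    return rng.choice(servers)

# Route ten requests and count how many each server receives.
rng = random.Random(42)  # seeded only so the run is reproducible
counts = {s: 0 for s in SERVERS}
for _ in range(10):
    counts[pick_random(SERVERS, rng)] += 1
print(counts)
```

With only ten requests, the counts can easily come out noticeably uneven, which is the imbalance described above.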
Round robin selection is also pretty simple to understand. With this strategy, a load balancer simply iterates through the list of configured servers and assigns them to requests in order. So the first request goes to server1, the second goes to server2, and so on until the last server. After that, the next request is routed to the first server again.
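A minimal Python sketch of round robin, assuming an in-memory list of hypothetical server names and ignoring things a real load balancer has to care about, such as thread safety and health checks:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in order, wrapping back to the first after the last."""

    def __init__(self, servers):
        self._next = cycle(servers)  # endless iterator: s1, s2, ..., sN, s1, ...

    def pick(self):
        return next(self._next)

lb = RoundRobinBalancer(["server1", "server2", "server3"])
print([lb.pick() for _ in range(5)])
# → ['server1', 'server2', 'server3', 'server1', 'server2']
```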
The problem with this is that, to make sure all the servers can handle all the requests they get, they all need to have the same hardware configuration. But what if not all of your servers are the same? Or what if some of your servers are running other services as well? This is where weighted round robin comes in.
Weighted round robin is a slightly more advanced form of round robin. In this approach, you assign a “weight” to each server, which lets you specify which servers are powerful and which aren’t. Here, the weight ranges from 0.1 to 1, where 1 signifies a machine powerful enough to handle all the incoming requests.
When a load balancer sees these weights, it knows what proportion of the load to allocate to each server relative to the others. For example, let’s assume you have two servers with a weight of 1 each. This means the load will be distributed equally between the two servers. But if one server has a weight of 1 and the other has a weight of 0.5, the load will be distributed in a 2:1 ratio.
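One way to implement weighted round robin is the “smooth” variant used by nginx, sketched below in Python with the two weights from the example above. The class and server names are hypothetical, and this is only one of several weighted round robin algorithms:

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin.

    On each pick, every server's running score grows by its weight, the
    server with the highest score wins, and the winner's score is reduced
    by the total weight. Over time each server is picked in proportion
    to its weight.
    """

    def __init__(self, weights):
        self.weights = dict(weights)                  # server -> weight
        self.current = {s: 0.0 for s in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        for server, weight in self.weights.items():
            self.current[server] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen

lb = SmoothWeightedRoundRobin({"server1": 1.0, "server2": 0.5})
print([lb.pick() for _ in range(6)])
# → ['server1', 'server2', 'server1', 'server1', 'server2', 'server1']
```

Rather than sending two requests in a row to server1 and then one to server2, the smooth variant interleaves them, but over every three requests the ratio is still 2:1.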
Service based selection is one of the most common load balancing strategies you’ll see, and here the routing is usually done by an API gateway of some sort. In this strategy, each logical service has its own route and its own load balancing. For example, suppose you run a dating app. In this app, you can log in, and once you do, you get a list of potential dates based on your preferences. In this setup, there will be at least two services running in the backend: one responsible for authenticating you during login, and another that fetches the list of potential dates.
The client usually calls a single service, typically an API gateway of some sort, which is itself load balanced. This API gateway sends each request to one service or the other based on the API endpoint called. Each service is in turn load balanced and scaled independently.
The advantage here is that based on the amount of load you’re getting for a particular service, you can decide on the number of servers you need for that service. This will allow you to optimize the number of servers you have, in turn reducing the cost for you.
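A rough Python sketch of service based routing. The path prefixes, service and server names, and the use of round robin inside each service are all assumptions for illustration; a real API gateway is configured rather than hand-written like this:

```python
from itertools import cycle

class ServiceRouter:
    """API-gateway-style routing: the path picks the service, and each
    service has its own pool of servers, balanced independently."""

    def __init__(self, pools):
        # path prefix -> round-robin iterator over that service's servers
        self._pools = {prefix: cycle(servers) for prefix, servers in pools.items()}

    def route(self, path):
        for prefix, pool in self._pools.items():
            if path.startswith(prefix):
                return next(pool)
        raise LookupError(f"no service handles {path!r}")

gateway = ServiceRouter({
    "/login": ["auth-1", "auth-2"],                 # hypothetical auth service
    "/matches": ["match-1", "match-2", "match-3"],  # hypothetical dates service
})
print(gateway.route("/login"))    # → auth-1
print(gateway.route("/matches"))  # → match-1
print(gateway.route("/login"))    # → auth-2
```

The two pools have different sizes, which is the per-service sizing described above: the busier service simply gets more servers.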
IP hashing works quite differently from the other strategies. In this method, the load balancer hashes the IP address the request comes from, takes that hash modulo the number of available servers, and uses the result as the index of the server the request should go to.
For example, if we have five servers on the backend ready to take load, the load balancer hashes the request’s IP address and takes the result modulo five. The result will be a number between zero and four, which is simply the (zero-based) index of the server the request will go to.
If you know how a HashMap is implemented, you’ll recognize this: a HashMap picks a bucket for a key in much the same way, by reducing the key’s hash to an index into its bucket array.
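A minimal Python sketch of IP hashing. Using SHA-256 as the hash and the server names are assumptions; real load balancers use their own hash functions, but the hash-then-modulo idea is the same:

```python
import hashlib
import ipaddress

def pick_by_ip_hash(client_ip, servers):
    """IP hashing: hash the client IP, then take it modulo the server count."""
    # Python's built-in hash() is randomized per process for strings, so use
    # a stable hash (SHA-256 here) to get the same answer on every run.
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)  # 0 .. len-1
    return servers[index]

servers = ["server0", "server1", "server2", "server3", "server4"]
first = pick_by_ip_hash("203.0.113.7", servers)
# The same client always lands on the same server.
assert pick_by_ip_hash("203.0.113.7", servers) == first
print(first)
```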
As the name suggests, with server load based selection the load balancer is able to get details of the load on each server. It can also keep track of each server’s latency. Based on all this information, the load balancer selects the server with the best combination of load and latency. For this, though, you’ll have to give your load balancer extra permissions to read this information from the servers, and there is a lot more configuration to do.
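A sketch of the selection step only, assuming the load balancer already has fresh load and latency numbers for each server (in practice these would come from health checks or monitoring agents). The `ServerStats` type, the scoring formula, and `latency_scale_ms` are hypothetical choices for illustration, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    load: float        # hypothetical: fraction of capacity in use, 0.0 to 1.0
    latency_ms: float  # hypothetical: recent average response time

def pick_least_loaded(stats, latency_scale_ms=100.0):
    """Pick the server with the lowest combined load/latency score.

    The score is load + latency_ms / latency_scale_ms, so with the default
    scale, 100 ms of latency counts as much as a fully loaded server.
    """
    return min(stats, key=lambda s: s.load + s.latency_ms / latency_scale_ms).name

stats = [
    ServerStats("server1", load=0.90, latency_ms=20),   # score 1.10
    ServerStats("server2", load=0.40, latency_ms=35),   # score 0.75
    ServerStats("server3", load=0.30, latency_ms=120),  # score 1.50
]
print(pick_least_loaded(stats))  # → server2
```

How much weight latency gets relative to load is exactly the kind of extra configuration mentioned above.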
A mix of all is a combination of all the strategies we’ve discussed so far. That doesn’t mean a single load balancer will be doing all of this, though. Usually, this is used in a system with layered load balancers. A good example is the service based strategy we discussed earlier. In such a system, the API gateway has a load balancer, which uses one strategy.
Next, each service that the API gateway calls has its own load balancer. Then, within each service, there is scaling and load balancing again. So, as you can see, there are multiple layers of load balancers.
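Putting the pieces together, here is a small Python sketch of a two-layer setup: the gateway picks a service by path, and each service then uses its own strategy (round robin for login, IP hashing for matches). The paths, server names, and strategy assignments are hypothetical, and the load balancer in front of the gateway itself is left out to keep it short:

```python
import hashlib
from itertools import cycle

def round_robin(servers):
    """Return a picker that cycles through servers in order."""
    it = cycle(servers)
    return lambda client_ip: next(it)  # client IP is ignored by round robin

def ip_hash(servers):
    """Return a picker that maps each client IP to a fixed server."""
    def pick(client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return servers[int.from_bytes(digest[:8], "big") % len(servers)]
    return pick

# Layer 1: the gateway picks a service by path prefix.
# Layer 2: each service balances across its own servers with its own strategy.
SERVICES = {
    "/login": round_robin(["auth-1", "auth-2"]),
    "/matches": ip_hash(["match-1", "match-2", "match-3"]),
}

def handle(path, client_ip):
    for prefix, pick in SERVICES.items():
        if path.startswith(prefix):
            return pick(client_ip)
    raise LookupError(f"no service handles {path!r}")

print(handle("/login", "198.51.100.4"))    # → auth-1
print(handle("/matches", "198.51.100.4"))  # always the same match-* server
```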
I hope that was clear and simple enough to understand. If not, please feel free to reach out in the comments below or connect with me on Twitter or LinkedIn. And again, I’m no expert in this, and I might not be 100% correct. But this is what I’ve learnt so far in my career.