Originally published on my personal blog.
A load balancer becomes a very important component of your infrastructure as your system grows. As traffic to your server increases, you’ll have to scale either vertically or horizontally. Vertical scaling only takes you so far, because the load a single server can take is ultimately limited by its hardware.
But when you scale horizontally, you’ll have more hardware to take on more load. And it is important to scale, both to decrease the latency of responses and to increase the availability of your service. Horizontal scaling brings its own problems, though: how do you distribute the requests coming into your system from the frontend (the client-facing app) across all the servers? Well, this is where load balancers come in.
A load balancer, usually sitting between the frontend and the backend of your system, receives all the requests coming in from the frontend. It then decides which server should serve each request. But it is not as simple as it sounds; there’s a lot more to load balancers than that. Because this is not a detailed guide, and because I’m not an expert in this, I’ll put it very simply, the way I understand it.
Fine, but how will the load balancer decide which server should get the new request? That’s the most interesting part, at least to me. There are many strategies a load balancer can use to distribute the load. Some of them are as follows:
- Random selection
- Round robin selection
- Weighted round robin selection
- Service based selection
- IP hashing selection
- Server load based selection
- A mix of all
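To make the idea concrete before part two, here’s a minimal sketch (in Python, with made-up backend addresses) of the simplest of these, round robin selection: the load balancer just hands out backends in a fixed rotating order.

```python
from itertools import cycle

# A hypothetical list of backend addresses the load balancer knows about.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# cycle() loops over the list endlessly, which is exactly round robin.
_rotation = cycle(BACKENDS)

def pick_backend():
    """Return the next backend in the rotation."""
    return next(_rotation)
```

Calling `pick_backend()` repeatedly visits each backend once, then starts over, so every server receives roughly the same number of requests regardless of how busy it is.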
But how will the load balancer know where my servers are? That’s a good question. In most cases, whenever you add or remove a server from your cluster, you’ll have to manually tell the load balancer about it. Usually this takes the form of a list of server IP addresses saved in the load balancer’s configuration. There are some advanced load balancers that can “discover” servers by themselves, or let servers register themselves with the load balancer. But we’re not going to look at that in this post, mostly because I’ve not worked on any such system. I’ve only worked with load balancers that need manual configuration.
There are many open source, easy-to-use load balancers that you can try out yourself to get started. One very simple way I personally test a load balancer is by running it on my local machine to balance the load between two or more services. How to do that? You can run a simple API service on two different ports on your local machine. Or run the service in multiple Docker containers, so that each instance has a different IP address.
Once you have these services running simultaneously on your local machine, you can start a load balancer, configure it with the addresses of the two instances, and start sending it requests from your browser or from a tool such as Postman. This is the easiest way to do it. You can then check the logs of all your services to make sure that the load is being distributed.
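If you don’t have an API service handy for this test, here’s a minimal sketch (Python standard library only, names are my own) of a service that replies with the port it’s running on, so the responses themselves tell you which instance handled each request:

```python
# A tiny test service: every GET request is answered with the port the
# instance is listening on. Start it twice with different ports, e.g.:
#   python service.py 8001
#   python service.py 8002
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # server_address is a (host, port) tuple; report the port.
        body = f"served by port {self.server.server_address[1]}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port):
    """Serve forever on 127.0.0.1:<port>."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()

if __name__ == "__main__" and len(sys.argv) > 1:
    run(int(sys.argv[1]))
```

Point your load balancer at both ports, send a handful of requests, and the mix of “served by port …” replies shows how the load is being spread.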
You can start with a simple load balancer. To be frank, you can create a very basic one with Nginx; there are many vhost configurations available online for this. It was one of the first things I tried years ago. It does work, but I don’t really remember how I did it, and I wouldn’t recommend this approach for a production setup. It’s just for you to test locally, to get an idea of what this looks like.
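For reference, the core of such a setup is Nginx’s `upstream` block. A minimal config might look something like this (the port numbers are just examples matching two local test services; Nginx round-robins between the listed servers by default):

```nginx
events {}

http {
    # The pool of backend instances to balance across.
    upstream backend {
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
    }

    server {
        listen 8080;

        # Forward every request to the next backend in the pool.
        location / {
            proxy_pass http://backend;
        }
    }
}
```

This is only a local-testing sketch, not a hardened production config.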
I had thought I’d explain the different load balancing approaches (the ones I listed earlier) in this post, but it is already long enough. I’ll write a second part explaining briefly how each strategy works. Look out for that.