Slavius

Posted on Aug 22, 2018

Explaining Load Balancers

#loadbalancer #web #http

Preface

This originally started as a comment on the #explainlikeimfive post "Explain Load Balancers Like I'm Five", but I decided to turn it into a short explanatory article.

Load balancers are all about scaling

Generally speaking, there are 2 types of scaling:

Vertical - you keep adding resources to your single system until you run out of space, e.g. until you have no free memory slots left to expand your RAM, or no SATA/SAS ports left to connect another hard drive. This is like piling up resources in one isolated place: it all grows vertically towards the roof.
Horizontal - you keep adding more small autonomous systems, each with its own portion of resources, which together give you the power and performance you need. You can do this as long as you have physical space (and enough electricity for reliable operation), which makes it very efficient with regard to future growth. Think of a big shelf where you put a new server next to the last one whenever you run out of resources: it grows horizontally.

Load Balancers come into play when you scale horizontally

The individual systems deployed behind a load balancer are called nodes.

Generally, you put a load balancer in front of these nodes to handle the management work. The load balancer decides how to spread the load across all nodes, supports them, and tries to offload as many tasks from them as possible.

Features of the load balancer

We can sum up as:

balancing the load across available nodes - obviously
keeping track of the health of all nodes to avoid routing requests to dead ones. This is done with regular health checks, usually HTTP probes to a specially crafted API endpoint that responds with the internal state of the application. This also enables an awesome way to do maintenance: you can mark nodes as inactive (while doing maintenance and/or reboots) and the web application keeps working by serving requests from the other active nodes. In the same way, you can gradually replace all nodes, swapping them one by one for more powerful ones, without any downtime!
keeping track of the load on each node (CPU load, RAM usage, number of active connections, number of connections per second) to avoid further overloading nodes that are already heavily used
keeping track of sessions (based on the client's source IP address or an HTTP cookie) to always route the same client to the same node. This is necessary if the nodes are not session aware: imagine you log into an application, which issues a cookie for your authenticated session. If your next request were routed to a different node (e.g. one the load balancer considers less loaded), that node would have no idea you were already authenticated, so you would end up redirected back to the login screen, possibly in a loop.
offloading - the load balancer can take over compressing resources and sending them to clients, leaving more CPU power for application code on the nodes. In the same way, it can terminate and negotiate HTTPS sessions, which is also expensive because of the SSL/TLS cryptography performed for each client. It can also cache frequently used resources in memory or on fast storage so they are not fetched from the nodes every time, which means your nodes do not need as much memory or as many fast SSDs.
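The node-selection part of the features above (health tracking, load tracking and sticky sessions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular product works: the `Node` and `LoadBalancer` classes are hypothetical, the `healthy` flag is assumed to be kept up to date by separate health checks, least-connections is used as the balancing strategy, and stickiness is a simple session-id to node-name map.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True   # assumed to be maintained by periodic health checks
    active: bool = True    # set to False by an operator during maintenance
    connections: int = 0   # currently open connections to this node


@dataclass
class LoadBalancer:
    nodes: list[Node]
    # session id (e.g. the value of a session cookie) -> name of the bound node
    sticky: dict[str, str] = field(default_factory=dict)

    def available(self) -> list[Node]:
        """Nodes that passed their health checks and are not in maintenance."""
        return [n for n in self.nodes if n.healthy and n.active]

    def pick(self, session_id: str | None = None) -> Node:
        candidates = self.available()
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        # Sticky sessions: keep a known client on its node while that node is usable.
        if session_id is not None:
            bound = self.sticky.get(session_id)
            for node in candidates:
                if node.name == bound:
                    return node
        # Otherwise use least connections: the least busy available node wins.
        node = min(candidates, key=lambda n: n.connections)
        if session_id is not None:
            self.sticky[session_id] = node.name
        return node
```

A real balancer would also increment `connections` when it forwards a request and decrement it when the connection closes; the sketch leaves that to the caller. If a bound node fails its health check, the session is quietly rebound to another node, which is exactly where users of a non-session-aware application would have to log in again.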

Additional things you can do with load balancers

Splitting load based on routing (parts of a URL) - e.g. you can dedicate 4 nodes to the /rest/api context, 2 nodes to /css, /js and /img, and the rest to application code.
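Path-based splitting boils down to "choose a pool by URL prefix, then balance within that pool". The sketch below assumes plain prefix matching and round-robin inside each pool; the `PathRouter` class and the node names used with it are hypothetical.

```python
from __future__ import annotations

from itertools import cycle


class PathRouter:
    """Pick a node pool by URL path prefix, then round-robin within the pool."""

    def __init__(self, routes: dict[str, list[str]], default: list[str]):
        # Try longer prefixes first, so a more specific rule wins.
        self._prefixes = sorted(routes, key=len, reverse=True)
        self._pools = {prefix: cycle(nodes) for prefix, nodes in routes.items()}
        self._default = cycle(default)

    def route(self, path: str) -> str:
        for prefix in self._prefixes:
            if path.startswith(prefix):
                return next(self._pools[prefix])
        return next(self._default)  # everything else goes to the application nodes
```

For the example above you would map "/rest/api/" to four API nodes, "/css/", "/js/" and "/img/" to the same two static nodes, and pass the remaining nodes as `default`. Matching on "/css/" rather than "/css" avoids accidentally catching paths like "/cssfoo".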

TL;DR

Load balancers [can] make your service more reliable, easily scalable, more performant and resilient to outages.

Top comments (8)

Josh Cheek • Aug 23 '18

Hi, thanks for the post! Some Qs, if you don't mind:

If all traffic goes through the load balancer, then can it, itself get overloaded? Does the load balancer have to scale vertically? What does Google do, for example?

The session thing confused me, I thought the session cookie was based on the host and I figured all the nodes would have the same host. If session data is then looked up, then I assume it would be on a shared resource. If the request needs to be handled by the same machine then it seems like that would prohibit swapping out the nodes, eg, for upgrades or w/e.

Does the load balancer add latency?

Does the load balancer look like any other server externally or is it sitting somehow outside the normal request cycle? Eg it feels like maybe it operates at the DNS level.

Is the load balancer a single point of failure?

Slavius • Aug 24 '18

Hi,

Is the load balancer a single point of failure?

If implemented incorrectly, yes, it can become a SPOF (single point of failure). Please read on to find out how to mitigate this problem.

If all traffic goes through the load balancer, then can it, itself get overloaded? Does the load balancer have to scale vertically? What does Google do, for example?

Of course. However, compared to an application server that runs general-purpose application code, a load balancer has a limited set of features, knows the full domain of its responsibilities almost completely, and often contains acceleration chips to help with individual tasks (network processing, SSL/TLS encryption, data compression).

The session thing confused me, I thought the session cookie was based on the host ...

When you open a web application's login form, you usually get a cookie even though you're not yet authenticated. This lets the application server match your request with your next form submission, e.g. to match a server-side generated captcha with your answer in the following POST request. Session cookies are generated by the application, in some cases handled by the application server itself (Tomcat, for example, has a context.xml configuration file dedicated to session context configuration).
If you'd like all nodes to be session aware, you have to keep your sessions in a shared store, such as a database. However, this has proven very difficult to implement without further problems: it adds latency (every single request carries the cookie and triggers a session lookup, even requests for static resources) and it consumes the database server's resources. It can also overload the database server and create a DoS situation: imagine a botnet creating millions of new sessions with requests only a few bytes long, usually just:

GET /login HTTP/1.1
Host: servername.tld

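Btw, you can try this on your own with telnet. Connect to a web server's port 80, type the request above and finish it with an empty line (press Enter twice after the Host: line). Note: the Host: line is mandatory in HTTP/1.1; you can only leave it out if you send an HTTP/1.0 request to a server that serves one site per IP!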
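If you don't have telnet at hand, the same experiment can be scripted with Python's standard `socket` module. This is a sketch assuming a plain-HTTP server on port 80; `servername.tld` is the placeholder from the request above and `build_request`/`send_raw` are hypothetical helper names. It also makes the DoS point concrete: the whole request is 45 bytes.

```python
import socket


def build_request(host: str, path: str = "/login") -> bytes:
    # HTTP/1.1 lines end with CRLF; the final empty line terminates the headers.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


def send_raw(host: str, port: int = 80, path: str = "/login") -> bytes:
    """Do what the telnet session does: send the request, return the first response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(host, path))
        return sock.recv(4096)
```

Calling `send_raw("servername.tld")` against a real server should return the status line and headers, typically including a Set-Cookie header for the freshly created session.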

If the request needs to be handled by the same machine then it seems like that would prohibit swapping out the nodes, eg, for upgrades or w/e.

To do maintenance, you set the required nodes to inactive on the load balancer (not accepting new requests), wait until the number of active sessions on those nodes drops to zero, and then you're free to do your maintenance. When you're done, you simply re-enable them on the load balancer. Of course, if a node suddenly dies, it takes a while for the load balancer to notice and stop sending new connections to it, and all its active sessions will be handled by a different node, so clients will have to log in again (unless sessions are kept in a database).
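The drain procedure described above (disable, wait for sessions to reach zero, then do the work) can be written as a small loop with a timeout. The two callables are hypothetical hooks: in practice they would wrap whatever admin API or CLI your load balancer offers for disabling a node and reading its session count.

```python
import time
from typing import Callable


def drain(set_active: Callable[[bool], None],
          active_sessions: Callable[[], int],
          timeout: float = 300.0,
          poll_interval: float = 1.0) -> bool:
    """Stop new requests to a node and wait until its sessions are gone.

    Returns True when the node is idle and safe to maintain, False on timeout.
    """
    set_active(False)  # the node keeps serving existing sessions, but gets no new ones
    deadline = time.monotonic() + timeout
    while active_sessions() > 0:
        if time.monotonic() >= deadline:
            return False  # sessions did not finish in time; decide manually what to do
        time.sleep(poll_interval)
    return True
```

After maintenance, calling `set_active(True)` puts the node back into rotation. The timeout matters because long-lived sessions (or stuck clients) might otherwise keep a node busy indefinitely.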

Does the load balancer add latency?

Yes and no. It adds latency as an additional point of processing. However, if it manages to offload tasks from the application servers on the backend nodes, serves some items from cache, accelerates SSL/TLS session initialization, or simply sends each request to the node with the lowest load, which processes it faster, then in the end the load balancer improves overall latency, provided you have done everything right. A misconfigured load balancer usually does the opposite.

Does the load balancer look like any other server externally or is it sitting somehow outside the normal request cycle? Eg it feels like maybe it operates at the DNS level.

A load balancer usually sits at the perimeter, in front of the application servers but in its own isolated network (DMZ), which is in turn protected by a firewall.
It can also be placed in the same network as all the nodes, but then it should use a dedicated network card to communicate with them (for performance reasons, one NIC handles data from outside and the other communicates with the nodes).
A load balancer can work on multiple ISO/OSI layers. The simplest is the TCP/IP level (layer 4), where it has access only to IP addresses, ports and connection state. This is a very dumb kind of balancing, as it does not understand higher-level protocols, and the only way to find out whether a node is up is to complete a TCP three-way handshake. It is mostly used with SMTP/POP/IMAP, but it is very fast. It is often implemented with HAProxy.
You can also balance at the HTTP/HTTPS level (layer 7). Here the balancer understands what's going on, and if it also terminates HTTPS it can read the contents of individual streams. This allows it to compress responses or send cached items. It also allows routing, request limiting and session awareness.
These are often called (web application) reverse proxies rather than load balancers.
Examples include Nginx, IIS with ARR (Application Request Routing) and Apache.
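The difference between the two levels shows up clearly in health checks. Below is a sketch of both probe styles using only the Python standard library: the layer-4 probe only completes a TCP handshake, while the layer-7 probe asks the application itself. The `/health` path is a hypothetical endpoint, and expecting exactly `200 OK` is an assumption you would adapt to your application.

```python
import http.client
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Layer-4 check: the node counts as up if the TCP three-way handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host: str, port: int, path: str = "/health", timeout: float = 2.0) -> bool:
    """Layer-7 check: the application must actually answer the request with 200 OK."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A node whose application is hung can still pass `tcp_probe`, because the operating system completes the handshake on its behalf; only `http_probe` notices that nothing answers.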

Is the load balancer a single point of failure?

Sure it is, when operated alone. A usual highly available setup includes 2 or more load balancers running as a cluster in either active/active or active/passive configuration. To further increase availability, you can have 2 different Internet Service Providers (or geo-distributed datacenters), each running a pair of clustered load balancers. Then you configure DNS with A records resolving to 2 distinct public IP addresses, which are handed out in round-robin order and spread clients roughly evenly between them (not exactly, because resolvers and clients cache answers); Cloudflare is very fast and reliable at this. There's also the possibility to return the IP address of the datacenter closest to the client's geographic location by using something like PowerDNS dnsdist.
This is what big players do to make their services highly available.
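To see why DNS round-robin spreads clients across sites, here is a toy model of a DNS server that rotates the order of its A records on every query. It is only an illustration: it assumes each client connects to the first address it receives, while real resolvers cache and may reorder answers, so the real-world split is approximate. The addresses are from the documentation ranges and the class name is hypothetical.

```python
from __future__ import annotations

from collections import deque


class RoundRobinRecords:
    """Toy model of a DNS server rotating the order of its A records per query."""

    def __init__(self, addresses: list[str]):
        self._records = deque(addresses)

    def answer(self) -> list[str]:
        result = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return result


def first_choice_counts(rr: RoundRobinRecords, queries: int) -> dict[str, int]:
    """Assumption: each client simply connects to the first address in its answer."""
    counts: dict[str, int] = {}
    for _ in range(queries):
        first = rr.answer()[0]
        counts[first] = counts.get(first, 0) + 1
    return counts
```

With two records and 100 uncached queries, each datacenter receives exactly 50 first choices; caching is what makes the real split drift away from this ideal.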

Josh Cheek • Aug 25 '18

Of course. However, compared to an application server that runs general-purpose application code, a load balancer has a limited set of features, knows the full domain of its responsibilities almost completely, and often contains acceleration chips to help with individual tasks (network processing, SSL/TLS encryption, data compression).

Nice.

To do maintenance, you set the required nodes to inactive on the load balancer (not accepting new requests), wait until the number of active sessions on those nodes drops to zero, and then you're free to do your maintenance.

I guess it feels like it's at odds with the bullet point that begins "keeps track of sessions"

Then you configure DNS with A records resolving to 2 distinct public IP addresses, which are handed out in round-robin order and spread clients roughly evenly between them (not exactly, because resolvers and clients cache answers); Cloudflare is very fast and reliable at this. There's also the possibility to return the IP address of the datacenter closest to the client's geographic location by using something like PowerDNS dnsdist.
This is what big players do to make their services highly available.

Ahh, nice, that's what I was missing!

Followup Q: Does the load balancer somehow pass the socket on to the node it's chosen to handle the request (some IO syscall, presumably) or does it return a redirect to tell the client which node to talk to?

Slavius • Aug 25 '18

Q: Does the load balancer somehow pass the socket on to the node it's chosen to handle the request (some IO syscall, presumably) or does it return a redirect to tell the client which node to talk to?

The load balancer establishes a full session with the client and, at the same time, a separate session with the node, so it has to maintain 2 sockets for each connection. It needs this whenever it alters the connection, e.g. handling SSL/TLS towards the client and plain HTTP towards the nodes, or HTTP/2 towards clients and HTTP/1.1 towards nodes.
This is also why a load balancer can answer the client on a node's behalf: with HTTP 502 Bad Gateway when the node refuses the connection or sends an invalid response, with 504 Gateway Timeout when the node does not respond within a preconfigured interval, or with a custom error page ("Sorry for the inconvenience, try again later").
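A rough sketch of the "second socket" towards the node and of where 502 and 504 come from. It is deliberately simplified and hypothetical: one request, only the first chunk of the response, no TLS, no keep-alive, and the client-side socket (socket no. 1) is left out. The `forward` function name is an assumption, not any product's API.

```python
from __future__ import annotations

import socket


def forward(request: bytes, node: tuple[str, int], timeout: float = 5.0) -> tuple[int, bytes]:
    """Relay one raw HTTP request to a node over the balancer's own upstream socket.

    Returns the status code to send back to the client and the response bytes.
    """
    try:
        # Socket no. 2: the load balancer's own connection towards the node.
        with socket.create_connection(node, timeout=timeout) as upstream:
            upstream.sendall(request)
            response = upstream.recv(65536)  # simplification: first chunk only
    except socket.timeout:
        # The node did not connect or answer within the configured interval.
        return 504, b"Gateway Timeout"
    except OSError:
        # Connection refused, reset, unreachable host, ...
        return 502, b"Bad Gateway"
    parts = response.split(b" ", 2)
    if len(parts) < 2 or not response.startswith(b"HTTP/") or not parts[1].isdigit():
        # Anything that does not look like an HTTP status line is an invalid response.
        return 502, b"Bad Gateway"
    return int(parts[1]), response
```

Note that `socket.timeout` has to be caught before `OSError`, because it is a subclass of it; otherwise every timeout would be reported as 502.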

Nawinkmr • Jul 16 '19

Hi Slavius,
Nice explanation and of course re-explanation. I am a bit confused about how it forms an HTTP request to the nodes. In this case, I assume that the load balancer receives the HTTPS request from the client, terminates SSL/TLS and then sends the HTTP request to port 80. In this HTTP request, what source IP and port does it send to the node(s)? Does it propagate the client's IP+port to the nodes or hide them behind its own?
If it hides them, is there any way to let the nodes know the identity of the original requester?
~Nawin

Slavius • Aug 9 '19

Hi Nawinkmr,

the de facto way to pass this information to the nodes is to add HTTP headers like X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto, X-Real-IP and X-Client-IP, as this information is very often vital on the nodes; RFC 7239 also defines a standard Forwarded header for the same purpose. Nodes then have to understand these headers at the application level. More in Nginx resources here: nginx.com/resources/wiki/start/top...
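Here is a sketch of both sides of the X-Forwarded-For convention. The proxy appends the client address it actually saw; the node only trusts the header when the request really came from a known proxy, and then walks the list from the right, because anything further left may have been sent (and spoofed) by the client. The function names and the `trusted_proxies` set are hypothetical, and a plain dict is used for headers even though real HTTP header names are case-insensitive.

```python
from __future__ import annotations


def add_forwarded_headers(headers: dict[str, str], client_ip: str,
                          scheme: str, host: str) -> dict[str, str]:
    """Load balancer side: record the original client and request before forwarding."""
    out = dict(headers)
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    out["X-Forwarded-Proto"] = scheme  # e.g. "https" even though the node sees plain HTTP
    out["X-Forwarded-Host"] = host
    return out


def client_ip_from(headers: dict[str, str], peer_ip: str, trusted_proxies: set[str]) -> str:
    """Node side: recover the original client IP from X-Forwarded-For."""
    if peer_ip not in trusted_proxies:
        return peer_ip  # did not come through our load balancer: ignore the header
    chain = [ip.strip() for ip in headers.get("X-Forwarded-For", "").split(",") if ip.strip()]
    for ip in reversed(chain):
        if ip not in trusted_proxies:
            return ip  # the right-most address not added by one of our own proxies
    return peer_ip
```

Note that only the IP address is carried this way; the client's source port is normally not forwarded.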

Hiram • Jul 8 '20

Slavius, how can I know how many nodes my load balancer is able to handle? I've been collecting usage metrics and, for example, CPU barely spikes to 20% with 11 nodes. I guess I have plenty of capacity for more nodes?

But what should be the rule of thumb for these scenarios? Thanks!

Slavius • Jul 14 '20

The best approach is load testing. Generally, what you're looking for is error-free operation with acceptable latency. Use something like Apache JMeter and keep increasing the load while adding nodes. Your target is that all connections are handled gracefully by the load balancer while response times from the nodes stay within reasonable values. I'm afraid there's no golden rule here.
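To turn "error-free with acceptable latency" into a concrete pass/fail per load level, you can summarize the per-request results of a test run. This sketch assumes you have already extracted successful-request latencies in milliseconds and a count of failed requests (for example from an exported results file); the 500 ms p95 threshold and zero error budget are hypothetical defaults, not recommendations.

```python
from __future__ import annotations

import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def acceptable(latencies_ms: list[float], errors: int,
               max_p95_ms: float = 500.0, max_error_rate: float = 0.0) -> bool:
    """Pass/fail for one load level: error-free enough and fast enough."""
    if not latencies_ms:
        return False  # nothing succeeded at all
    error_rate = errors / (len(latencies_ms) + errors)
    return error_rate <= max_error_rate and percentile(latencies_ms, 95) <= max_p95_ms
```

Running this for each step of increasing load (and node count) shows where the setup stops passing, which is a more useful capacity signal than the load balancer's CPU usage alone.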
