Generally speaking, there are 2 types of scaling:
Vertical - you keep adding resources to a single system until you run out of space - e.g. until you have no free memory slots left to expand your RAM, or no SATA/SAS ports to connect another hard drive. This is like piling up resources in one isolated place - it grows toward the roof
Horizontal - you keep adding more small autonomous systems, each with its own portion of resources that in sum give you the power and performance you need. You can do this as long as you have physical space (and enough electricity for reliable operation), which makes it very efficient with regard to future growth. Think of a big shelf where you put a new server next to the last one whenever you run out of resources - it grows sideways
Generally you put a load balancer in front of these systems to handle the management work. The individual systems deployed behind the load balancer are called nodes. The load balancer decides how to spread the load across all nodes, supports them, and tries to offload as many tasks from them as possible.
We can sum up its responsibilities as:
- balancing the load across available nodes - obviously
- keeping information about the health of all nodes to prevent routing requests to dead nodes. This is done with regular health checks - usually HTTP probes to a specially crafted API endpoint that responds with the internal state of the application. This also enables an excellent way to do maintenance: you can mark nodes as inactive (while doing the maintenance and/or reboots) and the web application keeps working by serving requests from the remaining active nodes. The same way, you can replace all nodes by swapping them one-by-one for more powerful ones without any downtime!
- keeping information about the load on each node (CPU load, RAM usage, number of active connections, connections per second) to avoid further overloading already heavily used nodes
- keeping track of sessions (based on source IP address and port, or an HTTP cookie) so the same client is always routed to the same node. This is necessary if the nodes are not session-aware: imagine you log into an application, which produces a cookie for your authenticated session. If your next request were routed to a different node (which the load balancer picks because, say, it is less loaded), that node would have no idea you were already authenticated because it has no cookie - you would end up redirected back to the login screen, possibly in a loop.
- offloading - the load balancer can take over responsibility for compressing resources and sending them to end clients, leaving more CPU power for application code on the nodes. The same way, it can terminate and negotiate HTTPS sessions, which is also expensive because of the SSL/TLS cryptography happening for each client. Similarly, the load balancer can cache frequently used resources in memory or on fast storage to avoid retrieving them from the nodes all the time, so your nodes do not have to account for extra memory and fast SSDs.
- splitting load based on routing (parts of a URL) - you can decide to dedicate 4 nodes to /rest/apicontext, 2 nodes to /[css|js|img], and the rest to application code.
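To make the first few points concrete, here is a minimal sketch of a node pool with round-robin selection, health flags, and sticky sessions. The class, node names, and `mark`/`pick` methods are invented for illustration; a real load balancer would also run the periodic health probes and parse cookies itself.

```python
class NodePool:
    """Toy model of a load balancer's backend pool (illustrative only)."""

    def __init__(self, nodes):
        # every node starts healthy; health-check probes flip this flag
        self.health = {n: True for n in nodes}
        self.order = list(nodes)
        self._rr = 0            # round-robin cursor
        self.sessions = {}      # session id -> pinned node (sticky sessions)

    def mark(self, node, healthy):
        # called by a periodic health check (e.g. an HTTP probe);
        # marking a node inactive also lets you do zero-downtime maintenance
        self.health[node] = healthy
        # drop session pins that point at a now-dead node
        self.sessions = {k: v for k, v in self.sessions.items()
                         if self.health[v]}

    def pick(self, session_id=None):
        # sticky session: keep routing a known client to its pinned node
        if session_id in self.sessions:
            return self.sessions[session_id]
        alive = [n for n in self.order if self.health[n]]
        if not alive:
            raise RuntimeError("no healthy nodes")
        node = alive[self._rr % len(alive)]   # plain round-robin
        self._rr += 1
        if session_id is not None:
            self.sessions[session_id] = node  # pin this client
        return node
```

Draining a node for maintenance is then just `pool.mark("node2", False)`: new requests flow to the remaining nodes, and pinned sessions on the drained node are re-routed.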
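The URL-based split in the last point can be sketched as a prefix routing table. The prefixes follow the example above, but the pool names and first-match-wins rule are assumptions, not any particular load balancer's syntax.

```python
# route table: URL prefix -> pool of dedicated nodes (names are made up)
ROUTES = [
    ("/rest/apicontext", ["api1", "api2", "api3", "api4"]),  # 4 API nodes
    ("/css", ["static1", "static2"]),                        # 2 static nodes
    ("/js",  ["static1", "static2"]),
    ("/img", ["static1", "static2"]),
]
DEFAULT_POOL = ["app1", "app2"]  # everything else: application code

def pool_for(path):
    # first matching prefix wins, as in most reverse-proxy route tables
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

Whichever pool `pool_for` returns would then go through the usual balancing and health-check logic described above.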
Load balancers can make your service more reliable, easier to scale, more performant, and resilient to outages.