Almost all systems will undergo some kind of growth spurt during the course of their lives, and every successful system will likely have multiple periods of intense — and sometimes unexpected — growth.
We are already familiar with the term for this kind of growth: scalability. When systems have to grow to accommodate a larger number of users or resources, they have to scale in size. The trouble is, of course, that growing a system that is built in a certain way isn’t always a simple or easy task; various problems inevitably rear their ugly heads, and we must contend with them! So far, we’ve talked about what scaling is, why we should do it, and what makes it hard. But we have yet to actually get to some real solutions to the scalability problem.
Well, that’s about to change: it’s time to move from problems to solutions! We finally have enough baseline knowledge to start digging into some of the techniques that are used to scale a system in a sustainable way. So let’s get right to it.
When people talk about scaling a system, they are often referring to a specific way of growing the system to help it accommodate its burgeoning size. As we know from previous posts, one of the things that makes scaling hard is dealing with a system that has limited capabilities in what it is able to do (for example, having a single server that causes a bottleneck when it is inundated with high traffic or an unusually large number of requests). Sometimes, if a system is small and contained enough, the solution is actually quite simple: we could just increase the capacity of our system, thereby reducing its limitations.
For example, if we are running a small e-commerce store and expect a high number of requests on a certain day — maybe on Black Friday — we might know that we’ll need to increase our system’s memory, or “provision” (which just means add) another server within a cluster of servers to give additional processing power so that we can handle a higher number of users/resources. In these kinds of situations, it might just be enough to improve upon the limitations of our system for a given period of time; in other words, we can just increase our system’s capacity by improving on the setup that we already have. The idea of adding on capabilities to a pre-existing resource in our system is known as scaling up , also called scaling vertically. When we scale “up”, we’re improving upon the node(s) that already exist in our system and making them more capable of handling more traffic, more data, or more processes.
However, the vertical scaling solution doesn’t really suffice in many situations; sometimes, provisioning a larger database, increasing the size of a server, or throwing more money at an otherwise centralized system isn’t the best tool for the job. And this is where another form of scaling comes into play: scaling out , or scaling horizontally. Instead of choosing to increase the capacity of our system by improving on the nodes we already have, we can choose to add more nodes into our system to help expand the capabilities of the system.
We’ll be focusing on horizontal scaling in this post, since that format is one that can get super complicated and has more details to dive into (which makes it the more interesting of the two, in my opinion!).
When it comes to scaling out, there are three main strategies to keep in mind in order to avoid the pitfalls of growing in the various dimensions of scalability (size, geographical, and administrative) that exist.
- First, we can try to scale out while ensuring that we hide communication latencies , or the delay that often occurs (to varying degrees) when one node in the system attempts to communicate with another.
- Second, we can try to scale out by taking pieces of our system and splitting up what they do and where they live into different, disparate pieces, which is known as partitioning and distribution.
- Finally, we can scale out by replicating our resources and making multiple copies of them in order to help support parts of the system that may fail or that may be too far away or busy to respond in an instant.
Let’s delve a little deeper into what each of these three techniques looks like in practice when scaling a system horizontally.
Hiding communication latency
When it comes to scaling a system horizontally, we know that its going to involve adding more nodes (machines, servers, what have you) to the mix. But these nodes could potentially be far away from one another, especially if we’re operating within a wide-area network (WAN)— or at least far enough to cause some delays in how much time it takes for one machine to “talk” or communicate with another.
For example, let’s say that we add a second machine to handle an influx of requests. This machine might take some extra time to come up with a response, or there might be some issues with our network itself that are beyond our control now that we’ve added this second node. One server might request some data or a service from this remote machine, but if that machine takes time to reply…well, our server can’t do much but wait around for it. And it could be waiting for quite some time (multiple seconds or even more!). This is exactly the kind of issue that causes communication latency, so it is our job to find a way to hide these delays when we scale out and add that second machine.
Enter asynchronous communication to the rescue! If you have worked on web applications, you may already be familiar with asynchronous requests; the concept is particularly interesting when it comes to distributed systems because it solves a big scalability problem. Asynchronous communication is the idea that one resource can send a request to another resource in the system, but doesn’t have to await a response from the other resource. Instead, it can do something else while the other resource takes however much time it needs in order to send a response, which is also known as a “non-blocking” request.
Asynchronous communication has its own counterpart called — you guessed it — synchronous communication! Synchronous communication occurs when one resource has to await a reply from another resource before it can do anything else, which effectively means that the request is “blocking” and the resource has to wait for a reply. Generally speaking the term client is used to refer to the resource that is doing the requesting, while server is used to refer to the resource that is sending a response; usually, the server implements some service, processes some information, or retrieves some data to hand back to the client.
When we use asynchronous communication within a distributed system, we gain two major benefits:
- we get back precious time from the client that would have been spent awaiting a response from the server, and
- we can hide whenever the server is taking a long time to reply — whether that is due to a network problem, or a far away or overloaded server issue.
These two benefits are particularly useful because they also tie into the ultimate goal of creating a transparent distributed system, which we already know is a good thing to strive towards!
If we elegantly handle the time between requesting and receiving a response, we can achieve transparency, and basically mask any delays in our system from our end user.
In particular, we can see async communication at work in a systems that process data in batches or in parallel, since the client can request some information and do some other work in parallel.
We can also see async communication at play in systems where a client can spawn multiple threads at once. In this situation, the client will halt the thread that is requesting something from the server, but because it has multiple threads, the entire process isn’t blocked, which means the rest of the threads in the process can continue doing their work while the one thread making the request patiently waits for a reply. We see an example of this in the image above: Client A is capable of doing some work in parallel while sending an async request to the server, while Client B is has one thread in its process that is blocking while it awaits a response, but its other threads can continue to be productive.
Partition and distribute
Another strategy for approaching the problem of scaling “out” is to take a portion of our system that is handled/located in one centralized place, and splitting it up and out throughout the system. This is also known as partitioning a section of our system distributing what it does across multiple resources.
This technique solves some of the problems that come up when a system’s resources are completely centralized; for example, when we partition and distribute a single service in our system, it means that the entire service no longer lives in one place, so there is less likelihood of the entire service crashing, being delayed, or failing since it is distributed across many nodes in our system.
Some famous examples of partitioned and divided services are actually the Internet Domain Name System (DNS). DNS, or the naming system that handles where to send requests to different addresses on the web (such as .com, .gov, .edu), basically splits up the job of figuring out where to route a request by having different servers handle addresses that map to different zones. The naming service is partitioned so that one server handles requests ending with .com while another handles those ending with .edu, and figures out where to route those requests to in terms of their domain names.
To us, it might seem like a single service is doing this, but this complicated task has actually been partitioned across servers, and the work of directing requests has be distributed across a much larger system, which handles every request url on the internet! The same goes for the World Wide Web, which is also partitioned and distributed across more than 100 million servers.
There is one final tactic that we can reach for when it comes to scaling a system horizontally: we can replicate our resources so that they are more readily available. We’ve run into replication once before in the context of replication transparency, which allows for multiple instances of a resource to exist at once.
Replication of resources, by definition, increases how many instances or copies of that resource are available for different parts of the system to use. The benefit of this is two fold: first, it means that the whole system is trying to rely on just one copy of the resource whenever it needs to use it; second, if one part of the system needs to fetch that resource for some reason, it doesn’t need to go far to find the original copy — it just needs to find the nearest copy of the replicated resource.
One form of replication that web developers in particular may be familiar with is caching , which occurs when various replicated copies of a resource are created to live close to the client that is accessing them. When a resource is replicated and cached, it basically means the original resource is copied and then put in a place that is easy-to-access and is physically close to the requesting client. Oftentimes, a single network may share a cached copy of a resources, which means that all of the computers within the networks will rely on the nearby cached resource.
Nothing in distributed systems is easy, not even problem solving. In fact, some of the tactics and strategies that we’ve discussed here actually have their own drawbacks to them.
For example, not all systems can really make use of asynchronous communication. For an e-commerce store that handles credit card payments, there might simply not be anything useful for a system to do while it is waiting for a charge or transaction to process and for an order to actually be placed. In those situations, it is beneficial to try to reduce how much work the server has to do in order to reduce how much time the client has to wait around; for example, if the client can do some form validations before making a request (like checking that a credit card’s expiration date is in the future), it will at least save some time while it waits for the server to process the rest of the request data.
Caching also has its downsides; when we have multiple replicas of a resource, it’s pretty easy for those copies to get out of sync, which cause consistency issues. Consistency is a tough problem to solve in distributed systems, because a system with multiple parts and multiple replicas means potential for some cached resources to become outdated or for one resource to conflict with other resources in the system. It might seem easy to just “create another copy” of a resource, but it’s not so easy; another copy means another set of problems to consider in our system!
If you’re wondering about what kinds of problems could pop up…well, don’t worry about it too much for now. We’ve got a whole year to talk about that!
Scalability tactics and solutions are interesting to learn about, if for no other reason than the fact that even the solutions that we come up with are not perfect or fool-proof. If you’d like to continue reading about them, check out the resources below!
- A brief introduction to distributed systems, Maarten van Steen & Andrew S. Tanenbaum
- Web Distribution Systems: Caching and Replication, Professor Raj Jain
- Distributed Systems for Fun and Profit, Mikito Takada
- An Introduction to Distributed Systems (Lecture Notes), Dr. Tong Lai Yu