Scalability can refer to many different parameters of a system: how much additional traffic it can handle, how easy it is to add more storage capacity, or even how many more transactions can be processed. A service is scalable if adding resources increases its performance roughly in proportion to what was added. As I continue to learn about system design, I wanted to cover some of the principles of scalability in this article. The outline of this article is based on my notes after watching the Scalability Lecture from Harvard that I have linked in the conclusion.
Simply put, vertical scaling usually involves adding more power (CPU, RAM, etc.) to your existing machines in order to improve performance. This approach does not hold up well in the long term, though, since there is a ceiling on how powerful a single machine can get and on how much you can afford to spend on it.
Horizontal scaling generally involves adding more machines to your pool of resources. I like to think of horizontal scaling as quantity over quality and vertical scaling as quality over quantity, each with its own benefits and drawbacks.
A cache is a simple key-value store that should sit as a buffering layer between your application and your data storage. Whenever your application has to read data, it should first try to retrieve that data from the cache, and only fall back to the data storage when the cache does not have it. The benefit of a cache is that it’s super fast: it holds its data in RAM, so requests are served far more quickly than they would be from disk. The downside of caching is that cached data can become stale when the underlying data changes, and keeping the cache up to date can be difficult.
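To make the read path concrete, here is a minimal sketch of a cache sitting in front of a slower data store. It is only an illustration under my own assumptions: the `ReadThroughCache` class, the `FAKE_DATABASE` dictionary, and the `slow_lookup` function are hypothetical names I made up, and a real system would more likely use a dedicated cache server than a Python dictionary.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """In-memory key-value cache placed in front of a slower data store."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]) -> None:
        self._load_from_store = load_from_store
        self._entries: Dict[str, Optional[str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Try the cache first; only go to the data store on a miss.
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._load_from_store(key)
        self._entries[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Drop a cached entry so the next read fetches fresh data.
        self._entries.pop(key, None)


# Hypothetical stand-in for a database table.
FAKE_DATABASE: Dict[str, str] = {"user:1": "Ada", "user:2": "Grace"}


def slow_lookup(key: str) -> Optional[str]:
    """Pretend this is an expensive database query."""
    return FAKE_DATABASE.get(key)


if __name__ == "__main__":
    cache = ReadThroughCache(slow_lookup)
    print(cache.get("user:1"))  # miss: loaded from the data store
    print(cache.get("user:1"))  # hit: served from memory
    print(cache.hits, cache.misses)
```

The `invalidate` method hints at the downside mentioned above: if the data store changes and nobody invalidates the entry, the cache keeps serving the old value.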
Public servers of a scalable web service are hidden behind a load balancer. The load balancer is responsible for distributing requests from users across your application servers according to the chosen algorithm. I like to think of the load balancer as the middleman between your client and your servers. Two common types of load balancers are hardware load balancers, which can be very expensive but are generally very reliable, and software load balancers, which are cost effective and easy to scale but generally don't perform as well as their hardware counterparts.
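One of the simplest distribution algorithms is round robin, where each new request goes to the next server in the list. The sketch below is my own illustration of that idea, not how any particular load balancer is implemented; the `RoundRobinBalancer` class and the server names are hypothetical.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle: Iterator[str] = itertools.cycle(self._servers)

    def next_server(self) -> str:
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    for request_id in range(6):
        print(f"request {request_id} -> {balancer.next_server()}")
```

Round robin ignores how busy each server is, so real load balancers often offer other strategies too, such as sending each request to the server with the fewest open connections.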
Database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users share the same level of information. The result is a distributed database in which users can access relevant data without interfering with one another. Some sources describe replication done to eliminate data ambiguity or inconsistency among users as normalization, but in database design that term more commonly refers to organizing a schema to reduce redundant data, so the two ideas are best kept separate.
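As a rough illustration, the sketch below models one common arrangement: writes go to a single primary, and the data is periodically copied to replicas that serve reads. This is an assumption on my part rather than the only way to replicate a database, and the `ReplicatedStore` class and its method names are hypothetical.

```python
from typing import Dict, List, Optional


class ReplicatedStore:
    """Toy primary/replica setup: write to the primary, read from replicas."""

    def __init__(self, replica_count: int) -> None:
        if replica_count < 1:
            raise ValueError("at least one replica is required")
        self.primary: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self._next_replica = 0

    def write(self, key: str, value: str) -> None:
        # All writes land on the primary first.
        self.primary[key] = value

    def replicate(self) -> None:
        # Copy the primary's current state to every replica.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key: str) -> Optional[str]:
        # Spread reads across replicas in rotating order.
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.get(key)


if __name__ == "__main__":
    store = ReplicatedStore(replica_count=2)
    store.write("greeting", "hello")
    print(store.read("greeting"))  # replicas have not been updated yet
    store.replicate()
    print(store.read("greeting"))  # now every replica has the value
```

The gap between `write` and `replicate` is the reason the copying needs to be frequent: until it runs, readers on a replica see older data than the primary holds.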
With database partitioning, there will generally be different servers dedicated to different categories of data in order to balance load. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to sift through.
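Here is a small sketch of that idea, using in-memory lists in place of real tables. The order data, the choice of region as the partitioning category, and the function names are all hypothetical examples of mine.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (order_id, region, amount); the region is the partitioning category.
Order = Tuple[int, str, float]


def partition_by_region(orders: List[Order]) -> Dict[str, List[Order]]:
    """Split one large list of orders into one smaller list per region."""
    partitions: Dict[str, List[Order]] = defaultdict(list)
    for order in orders:
        region = order[1]
        partitions[region].append(order)
    return dict(partitions)


def total_for_region(partitions: Dict[str, List[Order]], region: str) -> float:
    """Sum order amounts by scanning only the partition for one region."""
    return sum(amount for _, _, amount in partitions.get(region, []))


if __name__ == "__main__":
    orders = [(1, "eu", 10.0), (2, "us", 5.0), (3, "eu", 2.5)]
    partitions = partition_by_region(orders)
    print(total_for_region(partitions, "eu"))
```

A query for one region only touches that region's partition, which is where the speedup comes from. In a real deployment each partition could live on its own server.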
This article only scratches the surface of what scalability is and what should be considered when designing a scalable system. If you found this article interesting, be sure to check out these great resources that I used during my own research.