
Samuel Johnson R

Originally published at Medium

Getting started with building bigger, faster, and more scalable systems (Part 2)

This part will get you started with what scalability means in the context of data systems.

Scalability:

Martin L. Abbott once said in Scalability Rules: 50 Principles for Scaling Web Sites,

Don’t accept that the application is too complex or that you release code too often as excuses that you can’t roll back. No sane pilot would take off in an airplane without the ability to land, and no sane engineer would roll code that they could not pull back off in an emergency.

Okay, so what is scalability?

Imagine you built a piece of software, released it to customers, and it has been reliable for some time. Then one night, a shout-out from a celebrity makes it go viral overnight: the load jumps from hundreds of concurrent users to hundreds of thousands, and keeps growing. Will your system be able to process that much data and cope with the increased load? The ability of a system to cope with such increased load is called scalability.

And yeah, it is not just “black and white”: it is meaningless to say “this system is scalable” or “this system doesn’t scale” in isolation. Proper thought and prior discussion should go into questions like: how do we add additional resources to handle a load increase, and if the load keeps growing like that, how are we going to cope with it?

Scalability can be understood by the following:

  • Defining the load.
  • Defining the performance.
  • Deciding how to keep performance acceptable as the load grows.

How can one define a load?

A load can be described with a few numbers called load parameters.
First of all, you should know how much your system can handle and what happens if the load increases dramatically. Which parameters to consider depends entirely on the purpose of the system.

Some of the parameters one can usually consider are:

  • Read-to-write ratio: in software like Facebook or Instagram, the volume of reads is far greater than the volume of writes.
  • Requests per second.
  • Number of concurrent connections: imagine a chat room where all users are connected simultaneously.
  • Database query latency.
  • Cache hits and misses, and so on… (a small sketch of tracking a couple of these follows below)
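
As a tiny illustration, here is a sketch that derives requests per second and the read-to-write ratio from a stream of request records (the log format here is made up for the example):

```python
from collections import Counter

# Made-up request log: (timestamp in seconds, HTTP method) pairs.
request_log = [
    (0.1, "GET"), (0.4, "GET"), (0.9, "POST"),
    (1.2, "GET"), (1.5, "GET"), (1.8, "GET"), (2.3, "POST"),
]

methods = Counter(method for _, method in request_log)
duration = request_log[-1][0] - request_log[0][0]

print(f"requests/sec:        {len(request_log) / duration:.1f}")
print(f"read-to-write ratio: {methods['GET'] / methods['POST']:.1f}")
```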

Coming to a real-world example (key takeaways from Link):
By 2012, Twitter, one of the most heavily loaded data systems, was serving around 4,600 tweet-post requests per second on average and around 300,000 timeline-read requests per second. The hard part of posting a tweet was not handling the number of write requests, but delivering each tweet to all of the people who follow its author.

For example: if Rihanna, who has around 94 million followers on Twitter, posts a tweet, it has to be delivered to all of those millions of followers. That was their first scaling problem.

At first, they used a traditional relational database with classic joins: whenever a user requested their timeline, they queried all the people that user follows, fetched their respective posts, and merged them sorted in time order.
This put a heavy load on reads, as the timeline query was expensive.
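
In a relational schema, that read path looks roughly like the classic join below. This is a minimal runnable sketch; the table and column names are illustrative (loosely following the example in Designing Data-Intensive Applications), not Twitter's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative in-memory database

# Illustrative schema, loosely following the DDIA example.
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, screen_name TEXT);
    CREATE TABLE tweets  (id INTEGER PRIMARY KEY, sender_id INTEGER,
                          text TEXT, ts INTEGER);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
""")

def home_timeline(current_user_id):
    # Fan-out on read: join across everyone the user follows, merged in
    # reverse time order. This query runs on every single timeline load.
    return conn.execute("""
        SELECT tweets.text, users.screen_name, tweets.ts
        FROM tweets
        JOIN users   ON tweets.sender_id = users.id
        JOIN follows ON follows.followee_id = users.id
        WHERE follows.follower_id = ?
        ORDER BY tweets.ts DESC
    """, (current_user_id,)).fetchall()
```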

Then they introduced a cache for each user’s timeline: whenever a person tweets, the post is written into the cached timeline of every one of their followers (fan-out on write). Reads became much faster, but writes for users with huge follower counts slowed down.

Finally, they moved to a hybrid of the two methods: most tweets are fanned out to follower timelines at write time, while tweets from celebrities with huge follower counts are fetched and merged in at read time. That is what made scaling to those millions possible.
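
As a rough, in-memory sketch of that hybrid (the data structures and the one-million-follower cutoff are made up for illustration; the real system is far more involved):

```python
from collections import defaultdict

CELEBRITY_CUTOFF = 1_000_000  # illustrative threshold, not Twitter's real number

followers = defaultdict(set)          # user -> set of their followers
timelines = defaultdict(list)         # user -> cached timeline (fan-out on write)
celebrity_tweets = defaultdict(list)  # celebrity -> their own tweets

def post_tweet(user, ts, text):
    if len(followers[user]) >= CELEBRITY_CUTOFF:
        # Too many followers to fan out cheaply: just store the tweet once.
        celebrity_tweets[user].append((ts, text))
    else:
        # Fan-out on write: push the tweet into every follower's cache.
        for follower in followers[user]:
            timelines[follower].append((ts, text))

def read_timeline(user, following):
    # Most tweets were already fanned out at write time...
    entries = list(timelines[user])
    # ...while celebrity tweets are merged in at read time.
    for followee in following:
        entries.extend(celebrity_tweets[followee])
    return sorted(entries)  # merge everything back into time order
```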

Takeaways from this example:

  • Here the load on the system is the volume of reads and writes on user timelines.
  • They balanced it well by introducing the right components, like the timeline cache, and by altering the architecture as the load demanded.

How can one define the performance?

Defining the load and its parameters is very important, and knowing what will happen when the load increases is equally important.

We can ask ourselves questions like:

  • If a load parameter increases, how will the system behave?
  • When a load parameter increases, how many resources do we need to add to keep the performance the same?

Performance is usually measured by response time, i.e. the time from the client sending a request to receiving the response. Latency is often confused with response time.

Latency and response time are not the same:
Latency is the time a request spends waiting to be handled, whereas response time is what the client sees: the queuing/waiting time, the time to actually process the request, and network delays, all included.
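
As a quick sketch, the client-observed response time can be measured with a simple timer around the request (the URL is a placeholder endpoint; everything the client sees, network and queuing included, lands in the measurement):

```python
import time
import urllib.request

URL = "http://localhost:8080/"  # placeholder endpoint for illustration

start = time.perf_counter()
with urllib.request.urlopen(URL) as resp:  # queuing + processing + network
    resp.read()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"response time: {elapsed_ms:.1f} ms")  # what the client actually experiences
```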

Usually the average response time is taken as the metric for performance, but it doesn’t really tell us how many users experienced a delayed response.

That’s where the median comes into play. If we take the metrics for a huge number of requests and sort them in ascending order of response time, the midpoint is the median. For example, if the median is 100 ms, we can say that 50% of requests had a response time of less than 100 ms and 50% took longer. The median is also written as p50, the 50th percentile. Similarly, p95, p99, and p99.9 are some good percentiles to use as metrics.

Having such metrics is very important. For example, Amazon found that a 100 ms increase in response time cost them 1% of sales. Refer.

A simple test with ApacheBench (ab) gives a good basic performance picture of the system, with all the needed percentiles along with the average and median response times. More rigorous tests can be done for the further million-dollar decisions anyway :D
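
For instance, here is a minimal sketch of computing such percentiles yourself from a list of measured response times (the sample values are made up; in practice you would collect them from a tool like ab or from your request logs):

```python
import math

# Response times in ms (made-up sample; collect real ones from ab or logs).
samples = [12, 15, 18, 22, 25, 31, 40, 55, 80, 120, 250, 900]
samples.sort()

def percentile(sorted_samples, p):
    # Nearest-rank method: smallest value with at least p% of samples <= it.
    rank = math.ceil(p / 100 * len(sorted_samples))
    return sorted_samples[rank - 1]

print(f"mean: {sum(samples) / len(samples):.1f} ms")
for p in (50, 95, 99):
    print(f"p{p}:  {percentile(samples, p)} ms")
```

Note how the single 900 ms outlier drags the mean far above the median; that is exactly why percentiles tell you more than the average does.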

Performance and load have been defined. Now, how do we maintain good performance as the load increases? 💭

To be very honest, there is no magical way to make every system hyper-fast and scale to infinity and back. It takes gradual evolution of the architecture as the load increases: your architecture may work terrifically at the current load, but if the load gets 10 times bigger, you should be flexible enough to rethink and change the architecture.

One of the first ideas that comes to mind is scaling up versus scaling out.

Scaling up and Scaling out

One of the most basic ways to improve performance is to increase the resources of the machine that serves the requests. This obviously boosts performance, but hardware has its limits: you can’t keep adding more RAM and CPU forever. It costs a hell of a lot after a point, and becomes nearly impossible beyond that, even though maintaining such a single machine is classically simple. This method is called scaling up.

Scaling out is adding more small machines that work together to solve a problem or to handle the requests. This method adds operational complexity, but it can keep scaling more and more on demand.
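
To make scaling out concrete, here is a toy sketch of the idea: a dispatcher spreading incoming requests round-robin across several small workers (the worker names and the request handling are stand-ins for real machines behind a load balancer):

```python
import itertools

# Stand-ins for real machines sitting behind a load balancer.
workers = ["worker-1", "worker-2", "worker-3"]
rotation = itertools.cycle(workers)  # round-robin over the pool

def dispatch(request):
    worker = next(rotation)
    # In a real system this would be a network call to that machine.
    return f"{worker} handled {request!r}"

for i in range(5):
    print(dispatch(f"GET /timeline?page={i}"))
```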

Usually a combination of both is used, differing by software and use case as needed. There is no such thing as a perfect architecture; it all depends on the use cases.

An architecture that scales well is one that knows its load parameters correctly, makes sound assumptions and decisions revolving around those parameters, and operates well.

Everything is a trade-off: trading ‘time’ for ‘space’ and vice versa, trading ‘scalability’ for ‘ease of maintenance’, and so on.

More on how people build and scale complex systems will probably follow in future posts.

Final takeaways related to scalability:

  • Understanding the load is important.
  • The median is typically a better parameter than the mean when considering response time.
  • Be ready for heavy load, and have a clear plan for how to cope with it.
  • Having clear load parameters specific to the system is needed.
  • Scaling up and scaling out each have their own benefits and drawbacks.
  • The system should be flexible for future changes.

PS: Again, big thanks to Martin Kleppmann for the book “Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems”.
