Lukas Mauser

for Wimadev

Posted on Oct 30, 2023

📈 Scaling Web Applications to a Billion Users. It is Complicated... 😵‍💫

#webdev #devops #programming #vue

Did you ever think:

"Instagram is just a photo sharing website, I could build something like that!"

You can't.

You can probably build a photo sharing website, but the tricky part is scaling it to a billion users.

But why is that so complicated?

Let me explain...

Increasing Demands on Performance and Reliability

Running a small project is easy. Your code doesn't have to be perfect and it doesn't really matter to you if a request takes 200 ms or 500 ms. You probably would not even notice a difference.

But what if 10 people want to access your service at the same time? Let's assume all requests get handled one after the other. Waiting 10 x 200ms = 2 seconds vs 10 x 500ms = 5 seconds does make a noticeable difference. And that's only 10 people.
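
The arithmetic above can be sketched in a few lines of Python. This is a deliberately simplified model that assumes requests are handled strictly one after the other; the function name `worst_case_wait_seconds` is made up for illustration.

```python
def worst_case_wait_seconds(concurrent_users: int, request_ms: int) -> float:
    """Time until the last user in the queue gets a response,
    assuming requests are processed strictly one after the other."""
    return concurrent_users * request_ms / 1000


for request_ms in (200, 500):
    wait = worst_case_wait_seconds(10, request_ms)
    print(f"{request_ms} ms per request -> {wait} s for the last of 10 users")
```

Real servers handle many requests concurrently, so the numbers are only a rough upper bound, but the trend is the same: every millisecond per request is multiplied by the number of users waiting.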

Now think of 100 people, 1,000 or 100,000 people that are constantly bombarding your servers with requests. Performance improvements of just a few milliseconds make a huge difference at scale.

And the same goes for security and reliability. How many people will notice if your tiny site is down for an hour? On a large site, there could be hundreds of thousands of people who get upset, who cannot finish their checkout process, or who rely on the service so much that an outage freezes their entire business.

That's why large enterprises have uptime goals like 99.995%. That's a maximum downtime of less than 30 minutes (roughly 26) throughout the whole year!
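
To see where that number comes from, here is a small sketch that converts an uptime target into a yearly downtime budget. It assumes a 365-day year, and the helper name `max_downtime_minutes` is only illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(uptime_percent: float) -> float:
    """Yearly downtime budget in minutes for a given uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99, 99.995):
    budget = max_downtime_minutes(target)
    print(f"{target}% uptime -> {budget:.1f} minutes of downtime per year")
```

For 99.995% this works out to a bit over 26 minutes per year, which is within the 30-minute figure used in this article.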

And that's when it gets complicated...

Scaling Infrastructure

When your hobby project has reached unbearable response times, usually the easiest thing to do is to migrate everything to a bigger server. This process is called scaling vertically or in other words: "throwing money at the problem".

But there is only so much traffic a single machine can handle. So at some point you'll have to add more servers that run your application in parallel. This is called scaling horizontally.

And now it already gets complicated. Do you run multiple small machines, or a few big ones? Or a mixture of small and big machines (diagonal scaling)? How do you distribute incoming traffic across the cluster? How do you deal with traffic spikes?
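
One common answer to the traffic distribution question is round-robin load balancing: each incoming request goes to the next server in the list. The following is a minimal sketch of that idea, not a production load balancer; the class name `RoundRobinBalancer` and the server names are assumptions made up for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Round robin treats all servers as equal. With a mix of small and big machines you would typically want a weighted strategy instead, which is one reason these decisions get complicated quickly.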

If you cannot confidently answer these questions, it's time to bring in a DevOps specialist. But it does not end here. Remember the maximum downtime of 30 minutes per year? You won't reach that if you do not plan for failures.

You need redundancy in your system, meaning, if one node fails, another one is ready to take over. Or go even further: distribute your computing resources across multiple regions and service providers to also minimize platform risk.
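
As a rough sketch of "another one is ready to take over", the snippet below picks the first healthy node from a list ordered by preference. The `Node` class, the region names and the `healthy` flag are all hypothetical; in a real setup, health would come from periodic health checks rather than a manually set boolean.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    region: str
    healthy: bool = True


def pick_active_node(nodes: list[Node]) -> Node:
    """Return the first healthy node; nodes are listed in order of preference."""
    for node in nodes:
        if node.healthy:
            return node
    raise RuntimeError("no healthy node available")


# Two nodes in different (made-up) regions to reduce the risk of a shared outage.
nodes = [
    Node("primary", "eu-central"),
    Node("standby", "us-east"),
]
print(pick_active_node(nodes).name)  # primary

nodes[0].healthy = False             # simulate the primary failing
print(pick_active_node(nodes).name)  # standby
```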

And it gets even more complex when you think about scaling your infrastructure globally. How do you ensure low latency for users in Brazil? What about Australia, Europe, Asia, ...? You get the point. Infrastructure of big global applications is complicated.

But infrastructure is not the only bottleneck when scaling your app.

Scaling the Codebase

In the last section I talked about running your app on multiple machines to handle large amounts of traffic. But is it even possible to run your code in parallel, or do you use a database that needs to stay consistent across all machines? How do you split your application logic? Which part of your code runs on which machine?

Scaling your application also means scaling your codebase.
And this includes:

distributing application logic,
introducing advanced monitoring tools,
optimizing code for security, performance and reliability,
improving performance through additional layers like CDNs or caching,
introducing quality control processes,
...
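
To make the caching item from the list above a bit more concrete, here is a minimal in-memory cache with a time-to-live (TTL). It is only a sketch under simple assumptions (single process, no size limit); the names `TTLCache` and `load_profile_from_database` are invented for this example, and large applications usually rely on a dedicated cache service instead.

```python
import time


class TTLCache:
    """Keeps computed values for a limited time to avoid repeating slow work."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: serve from cache
        value = compute()            # missing or expired: do the slow work
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = []  # records every simulated database hit


def load_profile_from_database():
    """Stand-in for a slow database query."""
    db_calls.append("query")
    return {"user": "alice", "photos": 42}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("profile:alice", load_profile_from_database)
second = cache.get_or_compute("profile:alice", load_profile_from_database)
print(len(db_calls))  # 1, the second lookup was served from the cache
```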

And all of that usually means that every tiny little thing that was so easy to do in your hobby project is now dramatically more complex.

Take logging for example:

In your photo sharing hobby project, you look at the log file on the server. How do you do that in a cluster of hundreds of servers? And how do you keep an overview of millions of log entries a day?
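
A typical first step is to emit structured logs that include the host name, so that entries from many servers can be collected in one place and filtered there. The sketch below uses only Python's standard `logging` module; the `JsonFormatter` class, the field names and the logger name `photo-app` are assumptions for illustration, and shipping the logs to a central store is left out.

```python
import json
import logging
import socket


class JsonFormatter(logging.Formatter):
    """Formats each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "host": socket.gethostname(),  # tells you which server wrote the entry
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("photo-app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("upload finished for user %s", "alice")
```

Because every entry is machine-readable and tagged with its origin, a log aggregation tool can search across all servers at once instead of someone opening log files one machine at a time.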

And again, remember the 30 minutes of maximum downtime? How often do you accidentally push broken code that crashes your whole application? You do not want this to happen in a serious production environment. That's why scaling your codebase also means setting up processes to ensure no one accidentally breaks something.

The same bug fix that takes 15 minutes in a hobby project can take several days, if not weeks, in a large-scale application.

The process runs from reproducing the bug, prioritizing it, discussing solutions, coding the fix, writing tests, writing documentation, reviewing the code, reviewing security issues, verifying it works for the customer, load testing, and iterating back and forth through the test pipeline, to finally releasing it.

But wait, there is more...

Scaling an application doesn't end with scaling the core product. In a bigger context it also means scaling a company: scaling the team, or multiple teams and divisions, complying with legal requirements in different countries, and so on.

You get the point. So next time you think about Instagram as an easy weekend project, also think about the part of the iceberg below the waterline.

But anyway, none of that should scare you away from starting something. You won't reach that scale overnight. Don't lose yourself in hypothetical scaling scenarios; instead, go step by step as you need to.

Interesting read:
Instagram grew to 14M users with only 3 engineers in the beginning. One of the engineers describes their early architecture: https://instagram-engineering.com/what-powers-instagram-hundreds-of-instances-dozens-of-technologies-adf2e22da2ad

Side note: Need a helping hand with developing your scalable Vue or Nuxt application? Contact me on https://nuxt.wimadev.de

Top comments (14)

Dusan Petkovic • Oct 30 '23

I doubt that anyone considers these things from the start of a project. It's very hard, to say the least, and you need lots of expertise. So initially, when building something, it's fine to just focus on building a working product. Then, in the next phase, when the app starts getting more traction, you start considering how to scale.

But it for sure doesn't hurt to implement good practices from the start like version control and some form of CI..

But anyways great point raised in the article!

Lukas Mauser • Oct 30 '23

The point I am trying to make here is, that building large scale apps is complicated.
But of course you are right, in 99% of cases it's unnecessary overhead to kick off a project like this. Except if you are a huge company like Google or Meta...

Dusan Petkovic • Oct 31 '23

Yea, you need lots of resources and talent to do it, but very interesting and worth considering when starting a project, to at least make the least worst decisions

Eckehard • Oct 30 '23

Interesting insights, thank you very much!

What effect do you think different UI frameworks have when applications start to grow? React, Svelte, Solid.js or even HTMX might give you very different load profiles.

Lukas Mauser • Oct 31 '23 • Edited

I think the choice of UI framework is not that important... UI usually won't be the bottleneck and modern frameworks are all very similar in terms of performance (take a look at these benchmarks if you are interested: krausest.github.io/js-framework-be... ). The code you put inside the framework probably matters more than the framework itself.
When it comes to choosing a UI framework I would much rather go with what your team is good at and what DX you prefer.

Dusan Petkovic • Oct 31 '23

For any front-end or UI framework, as the application and the company grow and new features are built, I think the main bottleneck will be technical debt that needs to be addressed and dev processes that need to be updated to account for a larger and larger codebase and more teams that need to work together without breaking things...

Ian • Oct 31 '23

This, exactly.
Which is why it can be so frustrating when you see users say silly things like "build your own [insert website here] if you don't like [website to be replaced]!"

I have actually thought about the idea of what it would be like to create the next Twitter, Facebook, or Reddit. But the fact is that the people behind these companies have MASSIVE resources and can actually afford that kind of gambit.

You need servers, you need stable systems, you need people.
Now if you already have all of those things there might be a shot.

If you don't, you'd better have some solid investors.

Lukas Mauser • Oct 31 '23

Yeah, nowadays they truly have massive resources.
Still, I think no one should be intimidated by that. You can reach significant scale with just a few talented engineers. IG now has 500 employees, but they grew to 14M users with only 3 engineers.

Here's an interesting read by their engineering team on the early stack instagram-engineering.com/what-pow...

Lukas Mauser • Oct 30 '23 • Edited

You are welcome 🤠
Just to make it clear: In a startup, I would not spend too much time with hypothetical scaling scenarios.
Scale when you hit the ceiling and do it step by step ;-)

Clayton Kehoe • Oct 31 '23

This was a fantastic read - thanks for sharing!

Lukas Mauser • Oct 31 '23

Thank you, glad you liked it!

Raul Ferreira • Oct 30 '23

I had never thought about it that way, I always thought I could do a big project, but looking at it from that side, thinking about the scale of the project was something I had never thought about, it was worth opening my mind to this factor 🦤.