One of the holy grails of modern software is building a scalable application. It is not just one of the topics of the moment; it is THE hot topic.
Take the cloud, for example: its main selling point is that it lets us build scalable applications.
As for programming languages, practically every one is sold as scalable, as if scalability were a built-in feature.
However, it is a lie.
In general, the choice of language has little or nothing to do with scalability.
What must be scalable is the information (the data), not the system.
Take NodeJS, for example. NodeJS is sold as scalable. Of course it is scalable: it does not touch or consider the information.
First, if we want to reach Google's level, then we must think completely differently. However, that is not our case, and it is pointless to try to mimic Google's model (unless we want to own thousands of datacenters).
- Improving the hardware. But this has limits. For example, let's say a 64 GB server costs $3,000, a 128 GB server costs $10,000, and a 256 GB server costs $30,000 (so the costs increase considerably).
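The prices above are illustrative, but a quick calculation with them shows why vertical scaling gets expensive: the price of each GB of RAM climbs as the machine grows. A minimal Python sketch using only those hypothetical figures (`SERVER_PRICES` and `cost_per_gb` are names chosen for illustration):

```python
# Illustrative prices from the example above (hypothetical, not real quotes).
SERVER_PRICES = {64: 3_000, 128: 10_000, 256: 30_000}  # GB of RAM -> USD


def cost_per_gb(prices: dict[int, int]) -> dict[int, float]:
    """Return the price of one GB of RAM for each server size."""
    return {gb: price / gb for gb, price in prices.items()}


if __name__ == "__main__":
    for gb, per_gb in cost_per_gb(SERVER_PRICES).items():
        print(f"{gb:>3} GB server: ${per_gb:,.2f} per GB")
```

With these numbers, one GB costs about $47 on the 64 GB server, about $78 on the 128 GB server and about $117 on the 256 GB server, so each doubling of RAM makes every GB noticeably pricier.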
Q: But but, we are not talking about costs.
A: It's all about costs!
Cache. A cache could increase performance tenfold while keeping costs at bay.
Cloud. The cloud is scalable. Well, not really. Most "cloud" services are just hosting, and they are capped at some limit. If we want to exceed this cap, we need to spend money (a lot of money) on a bigger cloud machine. In fact, the cloud starts out slow and expensive. We could achieve more with a VPS or a dedicated server, and we could go big by adding more servers.
Q: But the cloud is serverless.
A: No, it is not (unless it is a true cloud).
Usually, the web server has high throughput, uses few resources, and practically works as a channel between the UI and the database. However, the real bottleneck is the database.
In this case, the web server handles up to 20 concurrent users (calls per second), but the database handles only 3. So, the maximum concurrency is 3.
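The reasoning is that every request passes through both the web server and the database, so the chain runs at the speed of its slowest link. A minimal Python sketch of that rule, assuming each call hits the database exactly once (`system_capacity` is a hypothetical helper):

```python
def system_capacity(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, capacity) for a chain of stages.

    Every request passes through every stage, so the chain can never be
    faster than its slowest stage.
    """
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return bottleneck, stage_capacity[bottleneck]


if __name__ == "__main__":
    # Calls per second taken from the example above.
    stages = {"web server": 20, "database": 3}
    name, calls = system_capacity(stages)
    print(f"bottleneck: {name}, max {calls} calls per second")
```

With the figures from the example, it reports the database as the bottleneck at 3 calls per second.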
Multiple web servers. This changes nothing (except tripling the cost of the web servers). Why? Because the database is still the bottleneck.
We could gain something by reducing the bandwidth, for example by using a CDN, but in general this model is not practical; it only offers a false illusion of scalability.
In this case, we achieve a concurrency of 3 calls per second, or maybe a bit more thanks to the bandwidth savings, perhaps 4 calls per second.
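The same rule extends to tiers of identical machines: a tier's capacity is its node count times the per-node capacity, and the system is limited by its weakest tier. This sketch (hypothetical `capacity` helper, same assumption that every call reaches the database) shows that tripling the web servers leaves the result unchanged, while a second database would double it. The CDN effect mentioned above is not modeled.

```python
def capacity(tiers: dict[str, tuple[int, float]]) -> float:
    """Capacity (calls per second) of a system built from tiers of identical nodes.

    Each tier maps to (node_count, calls_per_second_per_node). A tier's
    capacity is node_count * per-node capacity, and every call must pass
    through every tier, so the weakest tier limits the whole system.
    """
    return min(nodes * per_node for nodes, per_node in tiers.values())


if __name__ == "__main__":
    one_web = {"web": (1, 20), "database": (1, 3)}
    three_web = {"web": (3, 20), "database": (1, 3)}  # 60 calls/s of web capacity
    two_db = {"web": (1, 20), "database": (2, 3)}
    print(capacity(one_web), capacity(three_web), capacity(two_db))  # 3 3 6
```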
The idea of a microservice is to have a small server that does a single task. For example, we have a server that connects to a database that owns a single table: TABLE1, while two other microservices own TABLE2 and TABLE3.
But what if we need to access all 3 tables at the same time?
select * from table1 inner join table2 inner join table3
- Customer 1 connects to Microservice 1
- It connects to its local database and reads table 1
- It also connects to Microservice 2 (which reads table 2)
- It also connects to Microservice 3 (which reads table 3)
- It merges all the information
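The difference between the two approaches can be sketched with Python's built-in `sqlite3` module. The schema (`table1`, `table2`, `table3` linked by a hypothetical `table1_id` column) and the sample rows are made up for illustration, and each "microservice" is simulated by its own in-memory database instead of a real networked service. Both functions return the same rows, but the microservice version needs three separate reads plus hand-written merge logic that a relational database would otherwise do in a single JOIN.

```python
import sqlite3

# Hypothetical schema and data: table2 and table3 reference table1 via table1_id.
ROWS1 = [(1, "alice"), (2, "bob")]
ROWS2 = [(1, "order-a"), (2, "order-b")]
ROWS3 = [(1, "invoice-a"), (2, "invoice-b")]
COLS1 = "id INTEGER PRIMARY KEY, name TEXT"
COLS_CHILD = "table1_id INTEGER, info TEXT"


def make_db(tables: dict[str, tuple[str, list[tuple]]]) -> sqlite3.Connection:
    """Create an in-memory database holding the given tables (trusted names only)."""
    db = sqlite3.connect(":memory:")
    for table, (columns, rows) in tables.items():
        db.execute(f"CREATE TABLE {table} ({columns})")
        db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return db


def monolith_join() -> list[tuple]:
    """One database owns all three tables: a single JOIN does the work."""
    db = make_db({"table1": (COLS1, ROWS1), "table2": (COLS_CHILD, ROWS2),
                  "table3": (COLS_CHILD, ROWS3)})
    rows = db.execute(
        "SELECT t1.id, t1.name, t2.info, t3.info FROM table1 t1"
        " INNER JOIN table2 t2 ON t2.table1_id = t1.id"
        " INNER JOIN table3 t3 ON t3.table1_id = t1.id ORDER BY t1.id"
    ).fetchall()
    db.close()
    return rows


def microservice_merge() -> list[tuple]:
    """Each service owns one table; the caller merges the results itself."""
    s1 = make_db({"table1": (COLS1, ROWS1)})
    s2 = make_db({"table2": (COLS_CHILD, ROWS2)})
    s3 = make_db({"table3": (COLS_CHILD, ROWS3)})
    # In a real deployment each of these reads would be a network call.
    # Simplification: assumes at most one row per table1_id in table2/table3.
    t2 = dict(s2.execute("SELECT table1_id, info FROM table2").fetchall())
    t3 = dict(s3.execute("SELECT table1_id, info FROM table3").fetchall())
    merged = []
    for id_, name in s1.execute("SELECT id, name FROM table1 ORDER BY id"):
        if id_ in t2 and id_ in t3:  # keep inner-join semantics
            merged.append((id_, name, t2[id_], t3[id_]))
    for db in (s1, s2, s3):
        db.close()
    return merged


if __name__ == "__main__":
    print(monolith_join())
    print(microservice_merge())
```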
In this case, we gain nothing with microservices. In fact, it could be slower than a single model because we need to coordinate the information, while every (relational) database does a superb job of joining tables.
Microservices help distribute the design and implementation of a project, but not its scalability.
In this case, we achieve a concurrency of 3 calls per second (in the worst case).
However, this model has the highest cost: it requires 3 web servers and 3 databases, and we also need a connection between the servers.
Note: Some companies have deep pockets to spend on servers but not on developers. Still, each server requires maintenance. Whether we use the cloud, serverless, microservices, Docker or similar, we still need to maintain the system, and maintenance is paid per machine: with 6 machines, the maintenance cost is multiplied by 6. It is not rare to believe that the cloud is maintenance-free. Funny.
This case is truly scalable. We determine that the database's limit is 3 calls per second and the web server's limit is 20 calls per second. Then, we add a LOAD BALANCE NODE and 2 databases.
The trick is to replicate the database, so each copy holds the same data. There are many ways to replicate a database. It is not trivial, but it is possible.
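One simple way to picture the load balancer plus replicated databases is the toy Python model below. It is a sketch under loud assumptions: each replica is an in-memory dict, every write is copied synchronously to all replicas, and reads are distributed round-robin. `ReplicatedStore` is a hypothetical name; real replication schemes (primary/replica, multi-primary, asynchronous with lag) are considerably more involved.

```python
import itertools
from typing import Optional


class ReplicatedStore:
    """Toy model of a load balancer in front of replicated databases.

    Assumptions: replicas are plain dicts, writes are applied to every
    replica synchronously, and reads are spread round-robin.
    """

    def __init__(self, replica_count: int):
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        self.reads_served = [0] * replica_count
        self._next = itertools.cycle(range(replica_count))

    def write(self, key: str, value: str) -> None:
        # Replication cost: every write is applied on every replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> Optional[str]:
        index = next(self._next)  # round-robin choice of replica
        self.reads_served[index] += 1
        return self.replicas[index].get(key)


if __name__ == "__main__":
    store = ReplicatedStore(replica_count=2)
    store.write("customer:1", "alice")
    for _ in range(6):
        store.read("customer:1")
    print(store.reads_served)  # [3, 3]: reads are split evenly
```

The write loop is the replication cost: writes do not get cheaper as replicas are added, which is why this kind of setup scales reads best.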
In this case, we achieve a concurrency of 6 calls per second (a bit less if we consider that replication has a cost, so it is around 5.5 calls per second).
Costs: 1 web server, 1 load balancer (it could be the same web server), and 2 databases.
- But what if we need a concurrency of 20? Simple: we add more databases.
- And what if we need a concurrency of more than 100? We add load balancers for both the web servers and the databases.
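The figures above can be turned into a small capacity-planning calculation. The sketch assumes, beyond what the text states, that replication overhead scales linearly: if 2 databases give about 5.5 instead of 6 calls per second, each database keeps 5.5/6 = 11/12 of its standalone capacity. `effective_capacity` and `databases_needed` are hypothetical helpers; `Fraction` keeps the arithmetic exact.

```python
import math
from fractions import Fraction

# Assumption: 2 databases give ~5.5 instead of 6 calls per second, and the
# overhead scales linearly, so each replica keeps 11/12 of its capacity.
# Real overhead depends heavily on the ratio of writes to reads.
EFFICIENCY = Fraction(11, 12)


def effective_capacity(databases: int, per_db_cps: int = 3,
                       efficiency: Fraction = EFFICIENCY) -> Fraction:
    """Calls per second served by `databases` replicas behind a load balancer."""
    return databases * per_db_cps * efficiency


def databases_needed(target_cps: int, per_db_cps: int = 3,
                     efficiency: Fraction = EFFICIENCY) -> int:
    """Smallest number of replicas whose combined capacity reaches the target."""
    return math.ceil(Fraction(target_cps) / (per_db_cps * efficiency))


if __name__ == "__main__":
    print(float(effective_capacity(2)))  # 5.5, the figure from the text
    print(databases_needed(20))          # 8
```

Under that assumption, a concurrency of 20 needs 8 databases (8 × 3 × 11/12 = 22 calls per second, while 7 would give only 19.25). Since a single web server also tops out at 20 calls per second, going much further means load balancing the web servers too.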
Scalability has little or nothing to do with a specific language. One of the truly scalable languages/frameworks is Java Enterprise (Java EE), which has had built-in scalability features for about a decade (clustering, shared sessions, shared database connections), features that other "scalable-ready languages" lack. But these features are optional, and we could build a scalable system even in Visual Basic.
Some languages top benchmarks by using asynchronous processing, but those benchmarks are not realistic if they don't test the whole pipeline, including database access.
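A sketch of why the whole pipeline matters, using Python's `asyncio`: the handlers themselves are asynchronous and cheap, but an `asyncio.Semaphore` stands in for a database that accepts only 3 calls at a time (the limit from the earlier example, applied here as an assumption; `DB_LIMIT`, `handle_request` and `benchmark` are hypothetical names). However many requests are fired at once, the measured peak database concurrency never exceeds that limit, something a benchmark that skips the database would not reveal.

```python
import asyncio

DB_LIMIT = 3  # assumed database limit: 3 calls at once, as in the example


async def handle_request(db: asyncio.Semaphore, stats: dict[str, int]) -> None:
    """An async handler: cheap on its own, but it must wait for the database."""
    async with db:  # only DB_LIMIT requests get past this point at a time
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for a database query
        stats["active"] -= 1


async def benchmark(requests: int) -> int:
    """Fire all requests concurrently and return the peak DB concurrency seen."""
    db = asyncio.Semaphore(DB_LIMIT)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(handle_request(db, stats) for _ in range(requests)))
    return stats["peak"]


if __name__ == "__main__":
    print(asyncio.run(benchmark(100)))  # 3: the database caps concurrency
```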
So, how do we build a scalable system? We need to build a scalable database and data model rather than pick a specific language; after all, the language has never been the bottleneck.