Node.js is one of the most popular technologies nowadays for building scalable and efficient REST APIs. It is also used to build hybrid mobile applications, desktop applications and even Internet of Things devices.
I have been working with Node.js for about 6 years and I really love it. This post tries to be the ultimate guide to understanding how Node.js works.
Let's get started!!
Table of Contents
- The World Before Node.js
- The C10K Problem
- Node.js and the Event Loop
- The Problem with CPU Intensive Tasks
- Worker Threads
The World Before Node.js
Multi Threaded Server
Web applications were written in a client/server model where the client would request resources from the server and the server would respond with those resources. The server only responded when the client made a request, and it would close the connection after each response.
This pattern is efficient because every request to the server takes time and resources (memory, CPU, etc.). To handle the next request, the server must complete the previous one.
So, does the server handle one request at a time? Well, not exactly: when the server gets a new request, that request is processed by a thread.
In simple words, a thread is the time and resources the CPU gives to execute a small unit of instructions. With that said, the server handles multiple requests at once, one per thread (the so-called thread-per-request model).
To handle N requests at once, the server needs N threads. If the server gets an (N+1)th request, that request must wait until one of those N threads is available.
In the Multi Threaded Server example, the server allows up to 4 requests (threads) at once, and when it receives the next 3 requests, those requests must wait until one of those 4 threads becomes available.
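The queueing described above can be sketched as a tiny simulation. This is plain JavaScript standing in for a thread-per-request server, not real threads: it assumes every request arrives at the same instant and occupies one thread for its whole (hypothetical) duration, blocking I/O included. simulateThreadPerRequest is a hypothetical helper name.

```javascript
// Hypothetical simulation of the thread-per-request model (no real threads):
// each request keeps one thread busy for its whole duration, and all requests
// are assumed to arrive at time 0. Returns the start time (ms) of each request.
function simulateThreadPerRequest(threadCount, durations) {
  const freeAt = new Array(threadCount).fill(0); // when each thread becomes free

  return durations.map(function (duration) {
    // Pick the thread that becomes available first (lowest index on ties)
    let thread = 0;
    for (let i = 1; i < threadCount; i++) {
      if (freeAt[i] < freeAt[thread]) thread = i;
    }
    const startsAt = freeAt[thread];
    freeAt[thread] = startsAt + duration; // thread is busy until the request ends
    return startsAt;
  });
}

// 4 threads, 7 requests of a hypothetical 100 ms each
const startTimes = simulateThreadPerRequest(4, [100, 100, 100, 100, 100, 100, 100]);
console.log(startTimes.join(', ')); // 0, 0, 0, 0, 100, 100, 100
```

With 4 threads and 7 requests, the last three requests only start once the first ones finish, which matches the Multi Threaded Server example.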
One way to solve this limitation is to add more resources (memory, CPU cores, etc.) to the server, but that may not be a good idea at all...
And of course, there will be technological limitations.
Blocking I/O
The number of threads in a server isn't the only problem here. Maybe you are wondering why a single thread can't handle 2 or more requests at once. That's because of blocking Input/Output operations.
Suppose you are developing an online store and it needs a page where the user can view all your products.
The user accesses http://yourstore.com/products and the server renders an HTML file with all your products from the database. Pretty simple, right?
But what happens behind the scenes?...
When the user accesses /products, a specific method or function needs to be executed to handle the request, so a little piece of code (maybe yours or the framework's) parses the requested URL and searches for the right method or function. The thread is working. ✔️
The method or function starts executing its first lines. The thread is working. ✔️
Because you are a good developer, you save all system logs to a file, and of course, to be sure the route is executing the right method/function, you log a "Method X executing!!" string. That's a blocking I/O operation. The thread is waiting. ❌
The log is saved and the next lines are being executed. The thread is working again. ✔️
It's time to go to the database and get all products. A simple query such as
SELECT * FROM products
does the job, but guess what? That's a blocking I/O operation. The thread is waiting. ❌
You get an array or list of all products, but to be sure, you log them. The thread is waiting. ❌
With those products it's time to render a template, but before rendering it you need to read it from disk first. The thread is waiting. ❌
The template engine does its job and the response is sent to the client. The thread is working again. ✔️
The thread is free, like a bird. 🕊️
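The walkthrough above could look like this in Node.js if every step used a synchronous (blocking) API. This is a sketch under assumptions: the log and template paths are hypothetical temp files, and queryProductsSync is a hypothetical stand-in for a blocking database driver, since the article doesn't name one.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file locations in the OS temp directory
const LOG_FILE = path.join(os.tmpdir(), 'store-sketch.log');
const TEMPLATE_FILE = path.join(os.tmpdir(), 'products-sketch.html');
fs.writeFileSync(TEMPLATE_FILE, '<ul>{{items}}</ul>');

// Hypothetical stand-in for a blocking database call (not a real driver API)
function queryProductsSync(sql) {
  return ['Keyboard', 'Mouse'];
}

function handleProducts() {
  // Blocking I/O: the thread waits until the log line is written
  fs.appendFileSync(LOG_FILE, 'Method X executing!!\n');
  // Blocking I/O (simulated): the thread waits for the database
  const products = queryProductsSync('SELECT * FROM products');
  // Blocking I/O: log the products, the thread waits again
  fs.appendFileSync(LOG_FILE, JSON.stringify(products) + '\n');
  // Blocking I/O: the thread waits while the template is read from disk
  const template = fs.readFileSync(TEMPLATE_FILE, 'utf8');
  // CPU work: the thread is working again
  return template.replace('{{items}}', products.map((p) => `<li>${p}</li>`).join(''));
}

console.log(handleProducts()); // <ul><li>Keyboard</li><li>Mouse</li></ul>
```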
How slow are I/O operations? Well, it depends.
Let's check the table below:
| Operation | Number of CPU cycles (approx.) |
|---|---|
| CPU registers | 3 |
| L1 cache | 8 |
| L2 cache | 12 |
| RAM | 150 |
| Disk | 30,000,000 |
| Network | 250,000,000 |
Disk and network operations are extremely slow compared to everything else in the table. How many queries or external API calls does your system make?
In summary, I/O operations make threads wait and waste resources.
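To put the table in perspective, a bit of arithmetic: dividing each cost by the cost of one RAM access shows how many memory reads fit into a single disk or network operation. The numbers are the approximate figures from the table; ramAccessesPer is just an illustrative helper.

```javascript
// Approximate costs copied from the table above
const cost = { registers: 3, l1Cache: 8, l2Cache: 12, ram: 150, disk: 30000000, network: 250000000 };

// How many RAM accesses fit into the time of one operation of the given kind
function ramAccessesPer(operation) {
  return Math.round(cost[operation] / cost.ram);
}

console.log(ramAccessesPer('disk'));    // 200000
console.log(ramAccessesPer('network')); // 1666667
```

In other words, a thread waiting on one network call could have performed well over 1.6 million memory reads in the meantime.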
The C10K Problem
The Problem
In the early 2000s, servers and client machines were slow. The problem was concurrently handling 10,000 client connections on a single server machine.
But why can't our traditional thread-per-request model solve the problem? Well, let's do some math.
Native thread implementations allocate about 1 MB of memory per thread for its stack, so 10k threads require about 10 GB of RAM just for thread stacks, and remember, we are in the early 2000s!!
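The estimate works out as follows, taking the article's figure of about 1 MB of stack per thread and decimal units (1 GB = 1,000 MB):

$$
10\,000\ \text{threads} \times 1\ \frac{\text{MB}}{\text{thread}} = 10\,000\ \text{MB} = 10\ \text{GB}
$$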
Nowadays, servers and client machines are far better than that, and almost any programming language and/or framework solves the problem. In fact, the problem has been updated to handling 10 million client connections on a single server machine (the so-called C10M problem).
JavaScript to the rescue?
Spoiler alert 🚨🚨🚨!!
Node.js solves the C10K problem... but why?!
Server-side JavaScript wasn't new when Node.js appeared in 2009: there were a few implementations on top of the Java Virtual Machine, like RingoJS and AppEngineJS, based on the thread-per-request model.
But if those didn't solve the C10K problem, why did Node.js?! Well, it's because Node.js runs JavaScript on a single thread and never blocks that thread waiting for I/O.
Node.js and the Event Loop
Node.js
Node.js is a server-side platform built on Google Chrome's JavaScript engine (V8), which compiles JavaScript code into machine code.
Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. It's not a Framework, it's not a Library, it's a runtime environment.
Let's write a quick example:
// Importing native http module
const http = require('http');
// Creating a server instance that responds to every request
// with the message 'Hello World'
const server = http.createServer(function(request, response) {
response.write('Hello World');
response.end();
});
// Listening on port 8080
server.listen(8080);
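To try the server without a browser, the sketch below starts the same handler, sends it one request with http.get and shuts everything down. Listening on port 0 on 127.0.0.1 (the OS picks any free port) instead of 8080 is an assumption made so the example can't collide with anything already running.

```javascript
const http = require('http');

// Same handler as above
const server = http.createServer(function (request, response) {
  response.write('Hello World');
  response.end();
});

const bodyPromise = new Promise(function (resolve, reject) {
  // Port 0: let the OS choose a free port for this sketch
  server.listen(0, '127.0.0.1', function () {
    const { port } = server.address();
    // agent: false opens a one-off connection that closes after the response
    http.get({ host: '127.0.0.1', port, path: '/', agent: false }, function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        server.close(); // stop listening so the process can exit
        resolve(body);
      });
    }).on('error', reject);
  });
});

bodyPromise.then((body) => console.log(body)); // Hello World
```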
Non-blocking I/O
Node.js uses non-blocking I/O, which means:
- The main thread won't be blocked by I/O operations.
- The server will keep handling requests.
- We will be working with asynchronous code.
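A quick way to see the first point in action: fs.readFile returns immediately and its callback runs later, so the line after the call executes first. The temp file path is hypothetical and is created up front to keep the sketch self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file, created synchronously so the sketch is self-contained
const file = path.join(os.tmpdir(), 'non-blocking-sketch.txt');
fs.writeFileSync(file, 'hello');

const events = [];

// Start a non-blocking read; the callback runs later, on the event loop
const done = new Promise(function (resolve) {
  fs.readFile(file, 'utf8', function (err, content) {
    events.push(err ? 'read failed' : `file read: ${content}`);
    console.log(events.join(' -> ')); // readFile called -> file read: hello
    resolve();
  });
});

// Runs right away: readFile returned without blocking the main thread
events.push('readFile called');
```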
Let's write an example: on every /home request the server sends an HTML page; otherwise, it sends the text 'Hello World'. To send the HTML page, the server needs to read the file first.
home.html
<html>
<body>
<h1>This is home page</h1>
</body>
</html>
index.js
const http = require('http');
const fs = require('fs');
const server = http.createServer(function(request, response) {
if (request.url === '/home') {
fs.readFile(`${ __dirname }/home.html`, function (err, content) {
if (!err) {
response.setHeader('Content-Type', 'text/html');
response.write(content);
} else {
response.statusCode = 500;
response.write('An error has occurred');
}
response.end();
});
} else {
response.write('Hello World');
response.end();
}
});
server.listen(8080);
If the requested URL is /home, we read the home.html file using the fs native module.
The functions passed to http.createServer and fs.readFile are called callbacks. Those functions will execute sometime in the future (the first one when the server gets a request, and the second one when the file has been read and its content is buffered).
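Callbacks aren't the only way to consume these asynchronous APIs. As a sketch, here is the logic of the /home branch rewritten with fs.promises and async/await; renderHome and the shape of the object it returns are hypothetical, and a temp file stands in for home.html. Execution of renderHome pauses at await, but the main thread stays free for other work in the meantime.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for home.html, written to the OS temp directory
const homeFile = path.join(os.tmpdir(), 'home-sketch.html');
fs.writeFileSync(homeFile, '<h1>This is home page</h1>');

// Same logic as the /home branch, using a promise instead of a callback
async function renderHome(filePath) {
  try {
    const content = await fs.promises.readFile(filePath); // non-blocking read
    return { statusCode: 200, contentType: 'text/html', body: content.toString() };
  } catch (err) {
    return { statusCode: 500, contentType: 'text/plain', body: 'An error has occurred' };
  }
}

renderHome(homeFile).then((res) => console.log(res.statusCode, res.body));
// 200 <h1>This is home page</h1>
```

Inside http.createServer, the resolved value could then be written with response.statusCode, response.setHeader() and response.end().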
While the file is being read, Node.js can still handle other requests (even ones that read the same file again), all at once on a single thread... but how?!
The Event Loop
The Event Loop is the magic behind Node.js. In short, the Event Loop is literally a loop that keeps spinning as long as there is work to do, and it runs on a single thread.
Libuv is a C library that implements this pattern, and it ships as part of Node.js. You can read more about libuv in its official documentation.
The Event Loop has six phases; one full pass through all of them is called a tick (or loop iteration).
- timers: executes callbacks scheduled by setTimeout() and setInterval().
- pending callbacks: executes I/O callbacks deferred to the next loop iteration.
- idle, prepare: only used internally.
- poll: retrieves new I/O events and executes I/O-related callbacks (almost all of them, with the exception of close callbacks, the ones scheduled by timers, and setImmediate()); Node will block here when appropriate.
- check: setImmediate() callbacks are invoked here.
- close callbacks: some close callbacks, such as socket.on('close', ...).
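One consequence of this phase order, documented by Node.js, can be checked directly: when setTimeout(..., 0) and setImmediate() are scheduled from inside an I/O callback (which runs in the poll phase), the check phase comes next, so the immediate callback always fires before the timer. Scheduled from the main module instead, their relative order is not guaranteed. The sketch uses fs.stat('.') only to get an I/O callback.

```javascript
const fs = require('fs');

const order = [];

const done = new Promise(function (resolve) {
  // fs.stat's callback is an I/O callback, run in the poll phase
  fs.stat('.', function () {
    // From the poll phase, the loop reaches the check phase (setImmediate)
    // before it wraps around to the timers phase again
    setTimeout(function () {
      order.push('timeout');
      if (order.length === 2) resolve(order);
    }, 0);
    setImmediate(function () {
      order.push('immediate');
      if (order.length === 2) resolve(order);
    });
  });
});

done.then((result) => console.log(result.join(' then '))); // immediate then timeout
```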
Okay, so there is only one thread and that thread is the Event Loop, but then who executes the I/O operations?
Pay attention 📢📢📢!!!
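When the Event Loop needs an I/O operation that the operating system can't perform asynchronously on its own (most file system calls and DNS lookups, for example), libuv runs it on an OS thread from a pool (4 threads by default). Network sockets, on the other hand, are watched directly through the OS's non-blocking mechanisms (epoll, kqueue, IOCP). Either way, when the job is done, the callback is queued and later executed by the Event Loop, mostly in the poll phase.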
Isn't that awesome?
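The thread pool is easy to observe with a task slow enough to measure. File reads finish too quickly, so this sketch uses crypto.pbkdf2, whose asynchronous version also runs on libuv's thread pool: while the hash is computed off the main thread, a 1 ms interval timer keeps firing on the Event Loop. The password, salt and iteration count are arbitrary values chosen for the sketch.

```javascript
const crypto = require('crypto');

let mainThreadTicks = 0;
// Runs on the main thread (the Event Loop) roughly every millisecond
const ticker = setInterval(function () {
  mainThreadTicks++;
}, 1);

const start = Date.now();
// The async pbkdf2 is computed on a libuv thread-pool thread;
// only its callback comes back to the Event Loop
const done = new Promise(function (resolve, reject) {
  crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', function (err, key) {
    clearInterval(ticker);
    if (err) return reject(err);
    console.log(`hash ready after ${Date.now() - start} ms, ` +
      `main thread ran ${mainThreadTicks} timer callbacks meanwhile`);
    resolve(key);
  });
});
```

The pool size can be changed with the UV_THREADPOOL_SIZE environment variable when starting Node.js.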
The Problem with CPU Intensive Tasks
Node.js seems to be perfect; you can build whatever you want.
Let's build an API to calculate prime numbers.
A prime number is a whole number greater than 1 whose only factors are 1 and itself.
Given a number N, the API must calculate and return the first N prime numbers in a list (or array).
primes.js
function isPrime(n) {
for(let i = 2, s = Math.sqrt(n); i <= s; i++)
if(n % i === 0) return false;
return n > 1;
}
function nthPrime(n) {
let counter = n;
let iterator = 2;
let result = [];
while(counter > 0) {
isPrime(iterator) && result.push(iterator) && counter--;
iterator++;
}
return result;
}
module.exports = { isPrime, nthPrime };
index.js
const http = require('http');
const url = require('url');
const primes = require('./primes');
const server = http.createServer(function (request, response) {
const { pathname, query } = url.parse(request.url, true);
if (pathname === '/primes') {
const result = primes.nthPrime(query.n || 0);
response.setHeader('Content-Type', 'application/json');
response.write(JSON.stringify(result));
response.end();
} else {
response.statusCode = 404;
response.write('Not Found');
response.end();
}
});
server.listen(8080);
primes.js is the prime numbers implementation: isPrime checks whether a given number N is prime, and nthPrime returns the first N prime numbers (despite its name).
index.js creates a server and uses the library on every call to /primes. The number N is passed through the query string.
To get the first 20 prime numbers, we make a request to http://localhost:8080/primes?n=20.
Suppose there are 3 clients trying to access this amazing non-blocking API:
- The first one requests the first 5 prime numbers every second.
- The second one requests the first 1,000 prime numbers every second.
- The third one requests the first 10,000,000,000 prime numbers once, but...
When the third client sends its request, the main thread gets blocked, because the prime numbers library is CPU intensive. The main thread is busy executing the intensive code and can't do anything else: no other request is handled until it finishes.
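The blocking is easy to reproduce without an HTTP server. This sketch inlines the same isPrime/nthPrime algorithm and schedules a 0 ms timer before running a CPU-bound call; the timer can only run once that call returns. The input 10000 is an arbitrary size, large enough to be noticeable but still quick.

```javascript
// Same algorithm as primes.js, inlined so the sketch is self-contained
function isPrime(n) {
  for (let i = 2, s = Math.sqrt(n); i <= s; i++)
    if (n % i === 0) return false;
  return n > 1;
}

function nthPrime(n) {
  let counter = n;
  let iterator = 2;
  const result = [];
  while (counter > 0) {
    isPrime(iterator) && result.push(iterator) && counter--;
    iterator++;
  }
  return result;
}

const start = Date.now();
let timerDelay = null;

// Asks to run "as soon as possible"...
setTimeout(function () {
  timerDelay = Date.now() - start;
  console.log(`0 ms timer actually ran after ${timerDelay} ms`);
}, 0);

// ...but this CPU-bound call keeps the main thread busy, so the Event Loop
// can't reach the timers phase until it returns
const primes = nthPrime(10000);
const busyFor = Date.now() - start;
console.log(`computed ${primes.length} primes in ${busyFor} ms`);
```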
But what about libuv? If you remember, this library helps Node.js run operations on OS threads so they don't block the main thread, and you are right, that's the solution to our problem. But to use libuv's thread pool, our library would have to be written in C++ (as a native addon).
Thankfully, Node.js v10.5 introduced Worker Threads (initially as an experimental feature).
Worker Threads
As the documentation says:
Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
Fixing the code
It's time to fix our initial code:
primes-workerthreads.js
const { workerData, parentPort } = require('worker_threads');
function isPrime(n) {
for(let i = 2, s = Math.sqrt(n); i <= s; i++)
if(n % i === 0) return false;
return n > 1;
}
function nthPrime(n) {
let counter = n;
let iterator = 2;
let result = [];
while(counter > 0) {
isPrime(iterator) && result.push(iterator) && counter--;
iterator++;
}
return result;
}
parentPort.postMessage(nthPrime(workerData.n));
index-workerthreads.js
const http = require('http');
const url = require('url');
const { Worker } = require('worker_threads');
const server = http.createServer(function (request, response) {
const { pathname, query } = url.parse(request.url, true);
if (pathname === '/primes') {
const worker = new Worker('./primes-workerthreads.js', { workerData: { n: query.n || 0 } });
worker.on('error', function () {
response.statusCode = 500;
response.write('Oops there was an error...');
response.end();
});
let result;
worker.on('message', function (message) {
result = message;
});
worker.on('exit', function () {
// The 'error' handler may have already ended the response
if (response.writableEnded) return;
response.setHeader('Content-Type', 'application/json');
response.write(JSON.stringify(result));
response.end();
});
} else {
response.statusCode = 404;
response.write('Not Found');
response.end();
}
});
server.listen(8080);
index-workerthreads.js creates, on every call, a new instance of the Worker class (from the worker_threads native module) to load and execute the primes-workerthreads.js file in a worker thread. When the list of prime numbers is calculated, the message event is fired, sending the result to the main thread, and because the job is done, the exit event is also fired, letting the main thread send the data to the client.
primes-workerthreads.js changes a little bit. It imports workerData (parameters passed from the main thread) and parentPort, which is the way we send messages to the main thread.
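If you prefer promises, a single worker run can be wrapped so that it resolves with the posted result. This sketch inlines the worker code as a string using the eval: true option of the Worker constructor, only so that it fits in one file; in the article that code lives in primes-workerthreads.js. primesInWorker is a hypothetical name.

```javascript
const { Worker } = require('worker_threads');

// Worker code inlined as a string (eval: true) to keep the sketch in one file
const workerSource = `
const { workerData, parentPort } = require('worker_threads');
function isPrime(n) {
  for (let i = 2, s = Math.sqrt(n); i <= s; i++) if (n % i === 0) return false;
  return n > 1;
}
function nthPrime(n) {
  const result = [];
  for (let candidate = 2; result.length < n; candidate++) {
    if (isPrime(candidate)) result.push(candidate);
  }
  return result;
}
parentPort.postMessage(nthPrime(workerData.n));
`;

// Resolves with the posted result; rejects if the worker throws
// or exits with a non-zero code
function primesInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

primesInWorker(10).then((primes) => console.log(primes.join(', ')));
// 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
```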
Now let's do the 3 clients example again to see what happens:
The main thread doesn't block anymore 🎉🎉🎉🎉🎉!!!!!
It worked as expected, but spawning a new worker thread on every request like that isn't best practice, because creating a thread isn't cheap. Create a pool of worker threads up front and reuse them instead.
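A minimal fixed-size pool could look like the sketch below: the workers are created once, stay alive listening on parentPort, and each task is tagged with an id so replies reach the right promise. PrimesPool and its methods are hypothetical names, not a Node.js API, and error handling (replacing a crashed worker, rejecting its task) is deliberately left out.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Long-lived worker body: waits for { id, n } tasks, replies with { id, result }
const workerSource = `
const { parentPort } = require('worker_threads');
function isPrime(n) {
  for (let i = 2, s = Math.sqrt(n); i <= s; i++) if (n % i === 0) return false;
  return n > 1;
}
function nthPrime(n) {
  const result = [];
  for (let candidate = 2; result.length < n; candidate++) {
    if (isPrime(candidate)) result.push(candidate);
  }
  return result;
}
parentPort.on('message', ({ id, n }) => {
  parentPort.postMessage({ id, result: nthPrime(n) });
});
`;

// Fixed-size pool: workers are created once and reused for every task.
// Sketch only: a crashed worker is not replaced and its task never settles.
class PrimesPool {
  constructor(size = Math.max(1, os.cpus().length)) {
    this.workers = [];
    this.idle = [];           // workers with nothing to do
    this.queue = [];          // tasks waiting for an idle worker
    this.pending = new Map(); // task id -> resolve function
    this.nextId = 0;
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      worker.on('message', ({ id, result }) => {
        this.pending.get(id)(result);
        this.pending.delete(id);
        this.idle.push(worker); // the worker is free again
        this.dispatch();
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queues a task; resolves with the first n primes once a worker is done
  run(n) {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, resolve);
      this.queue.push({ id, n });
      this.dispatch();
    });
  }

  // Hands queued tasks to idle workers
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      this.idle.pop().postMessage(this.queue.shift());
    }
  }

  // Stops all workers so the process can exit
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const pool = new PrimesPool(2);
Promise.all([pool.run(5), pool.run(10), pool.run(3)]).then((results) => {
  console.log(results.map((primes) => primes.length).join(', ')); // 5, 10, 3
  return pool.close();
});
```

In index-workerthreads.js, the per-request new Worker(...) would then become a call like pool.run(n) whose resolved value is written to the response as JSON.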
Conclusion
Node.js is a powerful technology, worth learning.
My recommendation: always be curious. If you know how things work, you will make better decisions.
That's all for now, folks. I hope you learned something new about Node.js.
Thanks for reading and see you in the next post ❤️.
Top comments (37)
Thanks for this clear article on how the things are done in background using nodejs....I have some question : what is the difference between worker threads and os threads....in prime number example why os threads are not used by default by libuv when doing CPU heavy task...what is the limit for the os threads...
As said in the article, your code must be written in C++ in order to be able to use the OS threads pool.
Hence the introduction of worker threads which can be used directly in JS.
Yeah, he is right :D.
That's why DB drivers are written in C++.
Thanks for the great post!
Let's say I need a worker to do heavy CPU calculations. How does the worker works under the hood? Does it start on a separate CPU core and uses this core by 100%?
What if a CPU has only one core? Will Node worker help with this case or it'll be useless to start a worker, because there are no free resources to split between the main thread and worker thread?
Thanks for commenting.
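Node.js runs your JavaScript on a single main thread by default. A worker thread is a separate OS thread, though, so on a multi-core machine the operating system can run it on another core in parallel; on a single-core machine it shares that core with the main thread.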
If you want to deploy a Node.js application across all CPU cores you need to write some code using
cluster
native module. Thankfully, there is a library called PM2 which does the dirty work for you and deploys a Node.js application on all CPU cores with its built-in load balancer. The real problem with worker threads is spawning them like crazy, since creating a subthread (worker thread) isn't cheap (in CPU time and resources), although it's cheaper than forking the same process.
Well, if a separate worker and main thread are executed in one CPU core, then they share the same resource.
Is it correct that if we write heavy CPU code in "chunks", i.e. we will return execution context from a heavy function to the function which handles http requests with some interval, then we will emulate workers? But in this case it's not necessary for Node to allocate resources for a subthread, i.e. it's cheaper in terms of performance.
Remember that a Worker is actually an OS thread, so I don't think it would be emulated that way in the single CPU core case. Anyway, that's an interesting question, but unfortunately workers are pretty "new", so there isn't much information about them yet.
@jorge_rockr amazing article, i learned a lot about nodejs.
Question: I am building a mobile-responsive website, and my stack has Nuxt.js (vue.js framework) for the front end, talking to back-end APIs in Laravel (php framework). Nuxt.js is built on top of node (I think?) and I am using it for server-side-rendering. What are your thoughts on this architecture, versus using something purely in node.js?
Hi! Thank you so much for reading my article.
Yeah, Nuxt.js and Next.js (React) are built on top of Node.js and it's cool since you can use your Frontend knowledge adding some Backend code.
In fact, using pure Node.js could have a better performance but it could be more complex and hard to maintain.
@jorge_rockr it was a phenomenal article, thank you for writing it.
Thanks for the recommendation. Sounds like I should stick with back-end APIs in Laravel, and use Nuxt for my front end. Hopefully I should see a lot of the user-benefits one can get from using Node with server-side-rendering for SEO.
Wow! Amazing Article!
Thanks for sharing ❤️
Thanks for the great post!
Can I translate this article into Chinese so that more people can see it ?
Sure!!
Amazing! I'm surprised about how much you know about the internal process of Node.js, Can you suggest me some resources (aside of documentation) to learn advanced concepts?
Sorry for late response.
Take a look at this resources:
And check out this video to understand how Event Loop works (in browser but it's almost the same in Node.js).
Thanks Jorge, i'm checking them out right now!