Improving Nodejs Performance Through Clustering (part 1)

By CALVIN JOB PURAM ・ 7 min read

Nodejs applications are single-threaded by default, which means they don't take advantage of multi-core systems. Building on my earlier article about the event loop: whenever our event loop is performing some heavy task, we often face performance issues. In this article, I will look at two strategies to at least mitigate that performance impact: setting up node to run in cluster mode, and using worker threads.

Before I dive deep into these two topics, I will set up an express application to demonstrate both techniques. First, I created a directory called "app" containing a file called "server.js":

   mkdir app
   cd app
   touch server.js

Open this file in your favorite editor. Since we will be using express, we need to create a package.json file and install express:

  npm init -y
  npm install express

Next, I created a simple express server with a single route, listening on port 3000:

   const express = require('express');

   const app = express();

   app.get('/', (req, res) => {
     res.send('Hello Name');
   })

   app.listen(3000);

Run the express app with node server.js in the terminal, then visit localhost:3000 in the browser to see the output.

Scalability Problem With Nodejs Application

Whenever a request comes into our server, it is processed on the single thread that runs our event loop. This works out just fine for us; however, we start to run into big issues when a request takes some amount of time to be processed.

A simple Nodejs Application

If an incoming request takes a long time to execute, our node server will not be able to process other requests as effectively as it otherwise would. Let us look at an example.

I will add a function to the express app we created. Its sole purpose is to use as much CPU processing power as it possibly can for a set duration.

   const express = require('express');
   const app = express();

   const createWork = duration => {
     const start = Date.now();
     // busy-wait: keep the CPU (and the event loop) occupied for `duration` ms
     while(Date.now() - start < duration) {}
   }

   app.get('/', (req, res) => {
     createWork(5000);
     res.send('Hello Name');
   })

   app.listen(3000);

Right above res.send() I call the createWork() function with a duration of 5s (5000 ms). Note that createWork() runs inside the event loop, and for the 5s that it is running our event loop can do absolutely nothing else: it can't handle other requests, run the callbacks for database queries or file writes, and so on.

So let us see that in action. Restart your app and visit 'localhost:3000'. You will notice that the page takes 5s to load, because the createWork() call blocks the entire event loop. Now open two tabs in your browser, visit 'localhost:3000' in one tab, then quickly switch to the other tab and visit the same address. The second tab takes even longer than 5s to load, because the event loop is still busy with the first request.

As you can see, as soon as we write some javascript code that takes a while to execute, our entire server stops handling incoming requests until that code finishes. With this in mind, let's see how clustering can mitigate this problem.
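The two-tab experiment can also be reproduced without a browser. The following is a minimal sketch under a few assumptions: it uses Node's built-in http module as a stand-in for the express route, blocks for 200 ms instead of 5s, and the names BLOCK_MS, timeRequest, and timings are made up for illustration. It starts a server, sends two requests at the same time, and reports how long each one took.

```javascript
const http = require('http');

const BLOCK_MS = 200; // much shorter than the article's 5000 ms, to keep the demo quick

// Same busy-wait idea as createWork() in the article.
const createWork = (duration) => {
  const start = Date.now();
  while (Date.now() - start < duration) {}
};

// Stand-in for the express route: block the event loop, then reply.
const server = http.createServer((req, res) => {
  createWork(BLOCK_MS);
  res.end('Hello Name');
});

// Send one GET request and resolve with the elapsed time in ms.
const timeRequest = (port) =>
  new Promise((resolve, reject) => {
    const start = Date.now();
    http
      .get({ host: '127.0.0.1', port, agent: false }, (res) => {
        res.resume(); // drain the body so 'end' fires
        res.on('end', () => resolve(Date.now() - start));
      })
      .on('error', reject);
  });

// Listen on a random free port, fire two requests at once, then shut down.
const timings = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    Promise.all([timeRequest(port), timeRequest(port)])
      .then(resolve, reject)
      .finally(() => server.close());
  });
});

timings.then(([a, b]) => console.log(`request times: ${a} ms and ${b} ms`));
```

The slower of the two requests should take at least twice the blocking time, because the second handler cannot start until the first one has finished. Since the client and the server share one event loop in this sketch, both timings may come out close to that doubled value.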

Demystifying Clustering In Nodejs

Cluster mode is used to start up multiple nodejs processes, thereby having multiple instances of the event loop. When we start using cluster in a nodejs app, multiple nodejs processes are created behind the scenes, along with a parent process called the cluster manager, which is responsible for monitoring the health of the individual instances of our application.

The cluster manager is not responsible for handling incoming requests or fetching data from the database; instead, it monitors the health of each individual instance. The cluster manager can start instances, stop them, restart them, and send data to them, while the instances themselves process requests: accessing the database, handling authentication, serving static files, and so on.
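To make the "monitoring" role concrete, here is a minimal sketch of a cluster manager that restarts a worker after it crashes. It relies on the cluster module's 'exit' event and on cluster.fork(env), both part of the standard cluster API. The SHOULD_CRASH environment variable is a hypothetical flag used only to simulate one failure, and the workers exit immediately instead of serving requests.

```javascript
const cluster = require('cluster');

// cluster.isPrimary exists on Node 16+; older releases only expose isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

if (isPrimary) {
  // The manager is told whenever one of its workers dies.
  cluster.on('exit', (worker, code) => {
    console.log(`worker ${worker.id} exited with code ${code}`);
    if (code !== 0) {
      // Non-zero exit code: treat it as a crash and start a replacement.
      const replacement = cluster.fork();
      console.log(`started replacement worker ${replacement.id}`);
    }
  });

  // SHOULD_CRASH is a made-up flag that makes the first worker fail.
  cluster.fork({ SHOULD_CRASH: '1' });
} else {
  // Worker: simulate a crash (exit code 1) or a clean shutdown (exit code 0).
  process.exit(process.env.SHOULD_CRASH === '1' ? 1 : 0);
}
```

The first worker exits with code 1, the manager forks a replacement, and the replacement exits cleanly, so the whole program ends. A real application would start the server in the worker branch instead of exiting.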

By default, when we type node followed by the name of the file we want to run in the terminal, Nodejs takes the content of the file, executes it, and then starts up the event loop, so we get a single instance. This is a linear process, but it changes when we start using clustering.

This time, when we run node server.js with clustering, Nodejs will still execute our file and launch a node instance, but the first instance that gets launched is the cluster manager. The cluster manager is then responsible for starting the worker instances, which process incoming requests. To create these worker instances, we require the cluster module from the Nodejs standard library, const cluster = require('cluster'), and call a method that this module exposes, cluster.fork(). Each call creates a worker instance, which is a separate Nodejs process rather than a thread.

When we call cluster.fork(), Nodejs goes back to our server.js and executes it a second time in a slightly different mode, which starts up a worker instance. In other words, our server.js file gets executed multiple times by Nodejs: the first run produces the cluster manager, and every run after that produces a worker instance. I know this is hard to follow with words and diagrams alone, so let's look at some examples.
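Before bringing express into it, the "same file runs several times" behaviour can be seen with a small sketch that only prints which mode each run is in. The helper describeProcess is a made-up name for illustration, and the workers exit right away instead of starting a server.

```javascript
const cluster = require('cluster');

// cluster.isPrimary exists on Node 16+; older releases only expose isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

// Describe the mode this particular run of the file is in.
const describeProcess = () =>
  isPrimary
    ? `cluster manager (pid ${process.pid})`
    : `worker ${cluster.worker.id} (pid ${process.pid})`;

// This line runs once per process, so it prints once for the manager
// and once for every forked worker.
console.log(`executing file as ${describeProcess()}`);

if (isPrimary) {
  // Each fork() starts a new Nodejs process that runs this same file again.
  cluster.fork();
  cluster.fork();
} else {
  // Nothing else to do in this demo, so the worker shuts down.
  process.exit(0);
}
```

Running it should print one line from the cluster manager and one line from each of the two workers, each with a different pid.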

First, I will require cluster from the Nodejs standard library:

   const cluster = require('cluster');
   const express = require('express');

   const app = express();

   const createWork = duration => {
     const start = Date.now();
     while(Date.now() - start < duration) {}
   }

   app.get('/', (req, res) => {
     createWork(5000);
     res.send('Hello Name');
   })

   app.listen(3000);

The cluster module has some important properties that we are going to test. First, in our server.js file, let us log the isMaster property:

   const cluster = require('cluster');
   const express = require('express');
   const app = express();

   // true in the cluster manager, false in a worker instance
   console.log(cluster.isMaster);

   const createWork = duration => {
     const start = Date.now();
     while(Date.now() - start < duration) {}
   }

   app.get('/', (req, res) => {
     createWork(5000);
     res.send('Hello Name');
   })

   app.listen(3000);

This will log true to the console. Remember what I said before: when we first execute our Nodejs program, Nodejs executes the content of the file and starts up a node instance called the cluster manager. The cluster manager has the isMaster property set to true, and as soon as we start forking worker instances, isMaster will be false inside those worker instances. (Since Nodejs 16, isMaster is a deprecated alias of cluster.isPrimary.)

So we first check whether the current process is the master; if it is not, then it is a child:

   const cluster = require('cluster');
   const express = require('express');
   const app = express();

   if(cluster.isMaster) {
     // cluster manager: start a single worker instance
     cluster.fork();
   } else {
     // worker instance: behaves like a normal express server
     const createWork = duration => {
       const start = Date.now();
       while(Date.now() - start < duration) {}
     }

     app.get('/', (req, res) => {
       createWork(5000);
       res.send('Hello Name');
     })

     app.listen(3000);
   }

When we execute this file, we ask the following question:

  • Is the file being executed in master mode? If it is, we create a new instance using cluster.fork().
  • If the file is not being executed in master mode, then it is a child and will act as a server, so the child instances behave like a normal express server.

So far we have one child instance, which is not doing a lot for us: we still have only one event loop handling requests, so there is no performance benefit. To improve the performance of our application, we can create more child instances by calling cluster.fork() multiple times.

   const cluster = require('cluster');
   const express = require('express');
   const app = express();

   if(cluster.isMaster) {
     // cluster manager: start four worker instances
     cluster.fork();
     cluster.fork();
     cluster.fork();
     cluster.fork();
   } else {
     // worker instance: behaves like a normal express server
     const createWork = duration => {
       const start = Date.now();
       while(Date.now() - start < duration) {}
     }

     app.get('/', (req, res) => {
       createWork(5000);
       res.send('Hello Name');
     })

     app.get('/fast', (req, res) => {
       res.send('this is fast');
     })

     app.listen(3000);
   }

Here I created four worker instances and a new route, /fast, to test the performance impact of these worker instances on our little application. Remember that earlier we loaded the get('/') route, which calls createWork(5000), in two browser tabs at almost the same time. The first tab took approximately 5s while the second took approximately 8s, because the event loop was busy handling the first request and the second request had to wait.

Now, with these four instances, load the get('/') route and the get('/fast') route at almost the same time in separate browser tabs. The get('/fast') route loads quickly, as expected, while the get('/') route takes approximately 5s. This is because the two requests are handled by separate worker instances, each with its own event loop.

Next, test these routes using only one worker instance (remove three of the cluster.fork() calls). Following the same procedure, load the get('/') route first and immediately load the get('/fast') route. You will notice that the get('/fast') route takes much longer to load, because the single event loop is busy with the first request, which takes 5s to complete. During that time our server can do absolutely nothing else, even though there is an incoming request.
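The browser experiment can be approximated in a single script. This is only a sketch under several assumptions: it uses the built-in http module instead of express, two workers instead of four, a 200 ms block instead of 5s, and a random free port instead of 3000. The names WORKERS, BLOCK_MS, timeRequest, stopWorkers, and timings are invented for illustration. The cluster manager waits until every worker is listening, sends two requests at once, prints the timings, and then stops the workers.

```javascript
const cluster = require('cluster');
const http = require('http');

// cluster.isPrimary exists on Node 16+; older releases only expose isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

const WORKERS = 2;    // hypothetical worker count for this demo
const BLOCK_MS = 200; // much shorter than the article's 5000 ms

const createWork = (duration) => {
  const start = Date.now();
  while (Date.now() - start < duration) {}
};

// Send one GET request and resolve with the elapsed time in ms.
const timeRequest = (port) =>
  new Promise((resolve, reject) => {
    const start = Date.now();
    http
      .get({ host: '127.0.0.1', port, agent: false }, (res) => {
        res.resume(); // drain the body so 'end' fires
        res.on('end', () => resolve(Date.now() - start));
      })
      .on('error', reject);
  });

// Stop every worker so the whole demo can exit.
const stopWorkers = () =>
  Object.values(cluster.workers).forEach((worker) => worker.kill());

let timings = Promise.resolve([]); // only filled in by the cluster manager

if (isPrimary) {
  timings = new Promise((resolve, reject) => {
    let ready = 0;
    // Fired in the manager each time a worker starts listening.
    cluster.on('listening', (worker, address) => {
      ready += 1;
      if (ready < WORKERS) return; // wait until every worker accepts connections
      Promise.all([timeRequest(address.port), timeRequest(address.port)])
        .then(resolve, reject)
        .finally(stopWorkers);
    });
  });

  for (let i = 0; i < WORKERS; i += 1) cluster.fork();
  timings.then(([a, b]) => console.log(`request times: ${a} ms and ${b} ms`));
} else {
  // Worker: same blocking handler as before, on a port shared by all workers.
  http
    .createServer((req, res) => {
      createWork(BLOCK_MS);
      res.end(`handled by worker ${cluster.worker.id}`);
    })
    .listen(0, '127.0.0.1');
}
```

When the two connections land on different workers, both requests should finish in roughly the blocking time instead of one waiting for the other. How connections are distributed depends on the platform's scheduling policy, so the exact numbers will vary.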

This is a practical example of where clustering can be a huge benefit to your application. If you have many routes and some of them take time to complete, you can use clustering to start up multiple instances of your server that share the incoming requests and give you more predictable response times.
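Rather than writing cluster.fork() once per worker, the calls can go in a loop. The sketch below assumes that one worker per logical CPU core, read from os.cpus(), is a reasonable starting point. That choice is an assumption for illustration, not something measured in this article, and workerCount is a made-up name.

```javascript
const cluster = require('cluster');
const os = require('os');

// cluster.isPrimary exists on Node 16+; older releases only expose isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

// Assumption: one worker per logical CPU core, and always at least one.
const workerCount = Math.max(1, os.cpus().length);

if (isPrimary) {
  for (let i = 0; i < workerCount; i += 1) {
    cluster.fork();
  }
  console.log(`cluster manager started ${workerCount} workers`);
} else {
  // In server.js, the express routes and app.listen(3000) would go here.
  // This sketch exits immediately so it can run without express installed.
  process.exit(0);
}
```

To use this in the server.js from this article, replace the worker branch with the express routes and the app.listen(3000) call.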
