DEV Community

tawseef nabi

Clustering in Node.js

An instance of Node.js runs in a single thread which means that on a multi-core system (which most computers are these days), not all cores will be utilized by the app. To take advantage of the other available cores, you can launch a cluster of Node.js processes and distribute the load between them.

Having multiple processes to handle requests improves the throughput (requests/second) of your server, as several clients can be served concurrently. We'll see how to create child processes with the Node.js cluster module, and later we'll take a look at how to manage clustering with the PM2 process manager.
With multiple processes, if one process is busy with a relatively CPU-intensive operation, other processes can take up the other requests coming in, utilizing the other CPUs/cores available. This is the power of the cluster module, where workers share the load and the app does not grind to a halt under high load.

The master process can distribute incoming connections to the workers in two ways. The first (and the default on most platforms) is round-robin: the master listens on the port, accepts new connections, and distributes them across the workers in turn. The second is that the master creates the listen socket and sends it to interested workers, which then accept incoming connections directly.

Building a simple Express server without clustering

We will start by creating a simple Express server. This server will do a relatively heavy computational task which will deliberately block the event loop. Our first example will be without any clustering.

To get Express set up in a new project we can run the following on the CLI:

mkdir nodejs-cluster-module
cd nodejs-cluster-module/
npm init -y
npm install --save express

Then, we will create a file called no-cluster.js at the root of the project:

[Screenshot: project structure with no-cluster.js at the project root]

The contents of the no-cluster.js file will be as follows:

const express = require("express");
const app = express();
const port = 3000;
console.log(`Worker ${process.pid} started`);
app.get("/", (req, res) => {
  res.send("Hello World!");
});

app.get("/api/:n", function (req, res) {
  console.time('no-cluster')
  let n = parseInt(req.params.n);
  let count = 0;

  if (n > 5000000000) n = 5000000000;

  for (let i = 0; i <= n; i++) {
    count += i;
  }
  console.timeEnd('no-cluster')
  console.log("Final count is ", count)
  res.send(`Final count is ${count}`);
});

app.listen(port, () => {
  console.log(`App listening on port ${port}`);
});

The app contains two routes:

  • a root route that returns the string "Hello World"

  • another route that takes a route parameter n and adds the numbers up to n to a variable count before returning a string containing the final count.
    The operation is an O(n) operation, so it offers us an easy way to simulate long-running operations on the server, provided we feed it a large enough value for n. We cap n at 5,000,000,000 to spare our computer from running even more iterations.
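As a side note, the loop computes the sum 0 + 1 + … + n, which has the closed form n(n + 1)/2, handy for sanity-checking the endpoint's responses:

```javascript
// Same O(n) work the /api/:n route performs.
function slowSum(n) {
  let count = 0;
  for (let i = 0; i <= n; i++) count += i;
  return count;
}

// Gauss's formula gives the same answer in O(1) -- the loop exists
// purely to burn CPU time and block the event loop.
const fastSum = (n) => (n * (n + 1)) / 2;

console.log(slowSum(50)); // 1275
console.log(fastSum(50)); // 1275
```

(For very large n, the sum exceeds Number.MAX_SAFE_INTEGER, so the reported count loses precision; that doesn't matter here, since the route exists only to keep the CPU busy.)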

If you run the app with node no-cluster.js and pass it a decently small value for n (e.g. http://localhost:3000/api/50), it will execute quickly and return a response almost immediately. The root route (http://localhost:3000) also returns a response quickly.
We can see the response time below:

[Screenshot: console.time output for n = 50]

If we increase the value of n, the API response takes longer, and the problem with a single thread becomes clearly visible. For example, if n = 5,000,000,000, the app will take a few seconds to complete the response.

[Screenshot: console.time output for a large value of n]

As seen above, the API took 5.179s to finish for n = 5,000,000,000, according to the profiling we added with the console.time and console.timeEnd calls.

Adding Node.js clustering to an Express server

Now, let's use the cluster module in the app to spawn some child processes and see how that improves things.

const express = require("express");
const port = 3000;
const cluster = require("cluster");
const totalCPUs = require("os").cpus().length;

if (cluster.isMaster) {
  console.log(`Number of CPUs is ${totalCPUs}`);
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < totalCPUs; i++) {
    cluster.fork();
  }

  cluster.on("exit", (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
    console.log("Let's fork another worker!");
    cluster.fork();
  });
} else {
  const app = express();
  console.log(`Worker ${process.pid} started`);

  app.get("/", (req, res) => {
    res.send("Hello World!");
  });

  app.get("/api/:n", function (req, res) {
    console.time("cluster")
    let n = parseInt(req.params.n);
    let count = 0;

    if (n > 5000000000) n = 5000000000;

    for (let i = 0; i <= n; i++) {
      count += i;
    }
    console.timeEnd("cluster")
    console.log("Final count is ", count)
    res.send(`Final count is ${count}`);
  });

  app.listen(port, () => {
    console.log(`App listening on port ${port}`);
  });
}

The app does the same thing as before, but this time we are spawning several child processes that will all share port 3000 and will be able to handle requests sent to that port. The worker processes are spawned with cluster.fork(), which uses child_process.fork() under the hood. The method returns a ChildProcess object with a built-in communication channel that allows messages to be passed back and forth between the child and its parent.

We get the number of CPUs available with require("os").cpus().length. If the process is not the master, it is a child process, and there we run the same Express server code as in the previous example without clustering.

We create as many child processes as there are CPU cores on the machine the app is running on. It is recommended not to create more workers than there are logical cores on the computer, as that adds scheduling overhead: the system has to schedule all the created processes so that each gets a turn on the few cores.

The workers are created and managed by the master process. When the app first runs, we check whether it is the master process with isMaster (aliased as isPrimary in newer Node.js versions). This is determined by the process.env.NODE_UNIQUE_ID variable: if process.env.NODE_UNIQUE_ID is undefined, isMaster is true.

If the process is the master, we call cluster.fork() to spawn several worker processes, logging the master and worker process IDs. When a child process dies, we fork a new one to keep utilizing the available CPU cores. Below, you can see the output from running the app on an eight-core system.

[Screenshot: master and worker startup logs]

As we can see, all eight CPUs have workers running, ready to take up any requests coming in. If we hit http://localhost:3000/api/:n, we will see output identical to that of the previous non-clustered server.

Load testing servers with and without clustering

To load test our Node.js servers with and without clustering, we will use the loadtest tool. Other options include Vegeta and the Apache Bench (ab) tool.
The loadtest package allows you to simulate a large number of concurrent connections to your API so that you can measure its performance.
To use loadtest, first install it globally:

npm install -g loadtest

Then run the app that you want to test, e.g. node no-cluster.js. We'll start by testing the version that doesn't use clustering.

With the app running, open another Terminal and run the following load test:

loadtest http://localhost:3000/api/500000 -n 1000 -c 100

The above command will send 1000 requests to the given URL, of which 100 are concurrent. The following is the output from running the above command:

[Screenshot: loadtest results for the no-cluster app, n = 500000]

We see that with this request (n = 500000), the server handled 786 requests per second with a mean latency of 121 milliseconds (the average time it took to complete a single request).

Let's try it again, this time with a larger value of n (and still with no clusters):

loadtest http://localhost:3000/api/5000000 -n 1000 -c 100

[Screenshot: loadtest results for the no-cluster app, n = 5000000]

With a request where n = 5000000 the server was able to handle 183 requests per second with a mean latency of 517.1 milliseconds.

Let's compare this result with that of the app that uses clusters.

Below are the results for testing for http://localhost:3000/api/500000:

[Screenshot: loadtest results for the clustered app, n = 500000]

Tested with the same request (n = 500000), the app that uses clustering was able to handle 1051 requests per second, a significant increase compared to the 786 requests per second of the app with no clusters. The mean latency of the clustered app is 91.2 milliseconds, compared to 121 milliseconds for the app with no clusters. You can clearly see the improvement that clustering added to the app.

We'll run two more tests for each of our apps. We'll test requests that aren't CPU-intensive and that run fairly quickly without overloading the Event Loop.

With the no-cluster app running, execute the following test:

loadtest http://localhost:3000/api/50 -n 1000 -c 100

[Screenshot: loadtest results for the no-cluster app, n = 50]

With the same no-cluster app still running, execute the following test:


loadtest http://localhost:3000/api/5000 -n 1000 -c 100

Here are the summarized results:

[Screenshot: loadtest results for the no-cluster app, n = 5000]

With the cluster app running, execute the following test:

loadtest http://localhost:3000/api/50 -n 1000 -c 100

The summarized results:

[Screenshot: loadtest results for the clustered app, n = 50]

The clustered app ran 1482 requests per second compared to 1481 of the no-cluster one and had a mean latency of 64.2 milliseconds compared to 64.3 of the no-cluster one.

Let's run the other test. With the same cluster app still running, execute the test below:

loadtest http://localhost:3000/api/5000 -n 1000 -c 100

The summarized results:

[Screenshot: loadtest results for the clustered app, n = 5000]

Here, the clustered app ran 1475 requests per second compared to 1465 of the no-cluster one and had a mean latency of 65.2 milliseconds compared to 64.6 of the no-cluster one.

Based on those tests, you can see that clustering didn't offer much improvement to the app's performance. In fact, the clustered app performed a bit worse compared to the one that doesn't use clusters. How come?

In the tests above, we call our API with a fairly small value for n, which means that the number of times the loop in our code will run is considerably small. The operation won't be that CPU-intensive. Clustering shines when it comes to CPU-intensive tasks. When your app is likely to run such tasks, then clustering will offer an advantage in terms of the number of such tasks it can run at a time.

However, if your app isn't running many CPU-intensive tasks, the overhead of spawning so many workers may not be worth it. Remember, each process you create has its own memory and V8 instance. Because of the additional resource allocation, spawning a large number of child Node.js processes is not always recommended.
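You can see the per-process cost directly with process.memoryUsage(); every worker you fork pays roughly this baseline again:

```javascript
// Each worker is a full Node.js process with its own V8 heap, so this
// baseline memory cost is multiplied by the number of workers.
const { rss, heapTotal } = process.memoryUsage();
console.log(`rss:       ${(rss / 1024 / 1024).toFixed(1)} MB`);
console.log(`heapTotal: ${(heapTotal / 1024 / 1024).toFixed(1)} MB`);
```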

In our example, the clustered app performs a bit worse than the no-cluster app because we are paying the overhead of creating several child processes that don't offer much advantage. In a real-world situation, you can use this approach to determine which apps in your microservice architecture could benefit from clustering: run tests to check whether the benefits are worth the extra complexity.
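Finally, the intro mentioned the PM2 process manager. As a quick sketch, PM2 can give you the same cluster behaviour without writing any cluster-module code, by running our plain no-cluster.js in cluster mode:

```shell
npm install -g pm2

# -i 0 (or "max") forks one worker per CPU core; PM2's built-in
# load balancer distributes incoming connections between them.
pm2 start no-cluster.js -i 0

pm2 ls                # list the running worker processes
pm2 stop no-cluster   # stop all workers for this app
```

PM2 also restarts dead workers for you, replacing the cluster.on("exit") handler we wrote by hand.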
