CPU-bound tasks can grind your Node.js applications to a standstill, frustrating users and driving them away. You must master the art of profiling your application to pinpoint bottlenecks, and implement effective strategies to resolve any issues.
In this guide, we'll explore various techniques to help you identify and fix the root causes of your Node.js performance issues.
Let's get started!
Prerequisites
Before proceeding with the article, ensure that you have the most recent LTS release of Node.js and npm installed on your computer.
We'll use autocannon to run load tests on various server endpoints, so install it globally on your machine:
npm install -g autocannon
To follow along with the code samples, clone this repository and install the dependencies accordingly:
git clone https://github.com/damilolaolatunji/nodejs-profiling && cd nodejs-profiling
npm install
Node.js Architecture: A Quick Overview
Node.js operates on an event-driven architecture, handling multiple client requests concurrently without creating separate threads for each connection. It achieves this by being non-blocking, allowing I/O operations like file operations and database queries to be processed asynchronously. When events occur, registered callbacks are executed, ensuring efficient handling of various tasks.
However, Node.js isn't suitable for CPU-bound tasks like heavy calculations or image processing. These tasks rely on the speed of the computer's CPU, and Node.js, with its single-threaded event loop, can only execute one piece of JavaScript at a time. As a result, CPU-bound tasks can block the main thread, causing significant performance problems, as we'll explore further in the next section.
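You can see this blocking behavior directly with a few lines of code: a timer scheduled before a synchronous busy-wait cannot fire until the busy-wait finishes. Here's a minimal sketch (not part of the sample repository):

```javascript
// A timer scheduled before a CPU-bound loop cannot fire until the loop ends.
const scheduled = Date.now();
let firedAfter = null;

setTimeout(() => {
  // With a free event loop, this would run after roughly 10 ms.
  firedAfter = Date.now() - scheduled;
  console.log(`timer fired after ${firedAfter} ms`);
}, 10);

// Busy-wait for ~200 ms to simulate a CPU-bound task on the main thread.
const start = Date.now();
while (Date.now() - start < 200) {}
```

Despite the 10 ms timeout, the callback only runs once the busy-wait releases the event loop, roughly 200 ms later.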
Demonstrating a CPU-bound Task in Node.js
To illustrate a CPU-bound task, let's use the classic example of calculating a Fibonacci number from a specific position in the Fibonacci sequence.
This sequence starts with 0 and 1, where each subsequent number is the sum of the two preceding ones, following the relation F(n) = F(n−1) + F(n−2) with seed values F(0) = 0 and F(1) = 1. Here are the first 20 Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181
Calculating the nth Fibonacci number using a recursive algorithm is straightforward:
// src/fibonacci.js
function calcFiboRecursive(n) {
  if (n <= 1) return n;
  return calcFiboRecursive(n - 1) + calcFiboRecursive(n - 2);
}
However, this algorithm's time complexity is O(2^n), meaning the work grows exponentially with the input size n: as n increases, the computation becomes dramatically slower.
Bear in mind that there are much more efficient algorithms for calculating the Fibonacci number. This example only serves to illustrate the problem with CPU-bound tasks in Node.js.
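For comparison, here's what one of those more efficient approaches could look like: an iterative O(n) version that computes the same values without the exponential blowup (a sketch for illustration; it isn't part of the sample repository):

```javascript
// Iterative O(n) Fibonacci: keeps only the last two values instead of
// recomputing overlapping subproblems like the recursive version does.
function calcFiboIterative(n) {
  let prev = 0; // F(0)
  let curr = 1; // F(1)
  for (let i = 0; i < n; i++) {
    [prev, curr] = [curr, prev + curr];
  }
  return prev;
}

console.log(calcFiboIterative(30)); // 832040, same as the recursive version
```

Even calcFiboIterative(45) returns instantly, whereas the recursive version takes seconds.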
Testing the CPU-bound Task
In the src/server.js file, there's a /fibonacci-recursive/:n route for calculating the Fibonacci number at position n:
// src/server.js
. . .
fastify.get('/fibonacci-recursive/:n', (request, reply) => {
  const n = parseInt(request.params.n, 10);
  const result = calcFiboRecursive(n);
  reply.send({ result });
});
. . .
You can test it by starting the server on port 3000:
npm start
Once the server has started, send a request to find the 30th Fibonacci number. You can prepend the time command to find out approximately how long the request takes:
time curl http://localhost:3000/fibonacci-recursive/30
You should observe the result fairly quickly (on my laptop, it took 10 milliseconds):
{"result":832040}
Executed in 10.34 millis fish external
usr time 1.22 millis 235.00 micros 0.99 millis
sys time 2.03 millis 52.00 micros 1.98 millis
As the input increases, even slightly, the response time grows by orders of magnitude. On my machine, finding the 40th and 45th Fibonacci numbers took 810ms and 9s, respectively.
Impact on the Event Loop
Aside from the fact that the results take progressively longer to compute, each request also completely blocks the event loop, preventing any other task from executing until it completes.
To establish baseline behavior, use autocannon to send some load to the / route. Ensure that the server isn't processing any other requests before running the command below:
autocannon --renderStatusCodes http://localhost:3000
On my machine, I observed a throughput of almost 49k requests per second:
Running 10s test @ http://localhost:3000
10 connections
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms │ 0 ms │ 0.01 ms │ 0.06 ms │ 11 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
│ Req/Sec │ 41951 │ 41951 │ 49151 │ 50207 │ 48594.91 │ 2182.35 │ 41931 │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
│ Bytes/Sec │ 8.09 MB │ 8.09 MB │ 9.49 MB │ 9.69 MB │ 9.38 MB │ 422 kB │ 8.09 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┴─────────┘
┌──────┬────────┐
│ Code │ Count │
├──────┼────────┤
│ 200 │ 534515 │
└──────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 10
535k requests in 10.01s, 103 MB read
Now, block the event loop by calculating the 50th Fibonacci number as follows. This request will take a long time to complete (about 123 seconds on my machine):
curl http://localhost:3000/fibonacci-recursive/50
While the result is computing (and the event loop is blocked), use autocannon in a different terminal to send load to the / route once again:
autocannon --renderStatusCodes http://localhost:3000
You will observe that the throughput drops dramatically, to nearly zero requests per second, due to the ongoing Fibonacci computation. This clearly demonstrates how devastating blocking the event loop can be to Node.js performance.
. . .
┌──────┬───────┐
│ Code │ Count │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
20 requests in 10.02s, 0 B read
10 errors (10 timeouts)
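Blocking like this can also be measured programmatically. Node's built-in perf_hooks module exposes monitorEventLoopDelay() for exactly this purpose; here is a minimal sketch (not part of the sample repository):

```javascript
// Sketch: quantifying how long the event loop was blocked, using perf_hooks.
import { monitorEventLoopDelay } from "node:perf_hooks";

const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

// Simulate a CPU-bound task that holds the loop for ~100 ms.
const start = Date.now();
while (Date.now() - start < 100) {}

// Yield briefly so the monitor can record a sample after the block.
await new Promise((resolve) => setTimeout(resolve, 50));

histogram.disable();
// Histogram values are reported in nanoseconds.
console.log(`max event-loop delay: ${(histogram.max / 1e6).toFixed(1)} ms`);
```

In production, you could sample this histogram periodically and alert when the maximum delay crosses a threshold.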
Profiling your Node.js Application for Bottlenecks
When your application's performance starts to lag, pinpointing the exact cause of slowdowns can be a challenge. To effectively troubleshoot such issues, the first step is to profile your code and identify bottlenecks.
In this section, we'll explore three main approaches for profiling Node.js applications:
- Using the built-in Node.js profiler
- Using Chrome DevTools
- Using the Inspector module for on-demand profiling in production
1. Using the Built-in Node.js Profiler
The first approach we'll consider requires no external tools or libraries. It uses the sample-based profiler built into Node.js, which is enabled through the --prof command-line option. Once the profiler is turned on, it records the stacks of both JavaScript and C/C++ code into a file in the current working directory.
First, enable the profiler by passing the --prof command-line option when starting your Node.js server:
node --prof src/server.js
Once the server is running, simulate traffic to the /fibonacci-recursive/:n route through autocannon as shown below:
autocannon --renderStatusCodes http://localhost:3000/fibonacci-recursive/30
On my machine, the server processed an average of 138.7 requests per second:
. . .
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1387 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
1k requests in 10.01s, 261 kB read
The profiler records its data into a file in your project directory; you will notice a new tick log file named isolate-0x<number>-v8.log. This file is not meant to be read directly, but must be processed further to generate human-readable data.
To process this log data, use the built-in Node.js tick processor:
node --prof-process isolate-0x<number>-v8.log > processed.txt
This creates a new processed.txt file with human-readable profiling data in your project root. You can view its contents by opening it in your text editor. The file is broken up into several sections, each with its own table.
Let's delve into the processed data to identify bottlenecks, starting with the [Summary] section:
// processed.txt
[Summary]:
ticks total nonlib name
8603 84.0% 84.0% JavaScript
191 1.9% 1.9% C++
28 0.3% 0.3% GC
7 0.1% Shared libraries
1442 14.1% Unaccounted
This summary indicates that approximately 84% of the samples occurred in JavaScript code, directing your attention to the JavaScript functions for investigation.
With this in mind, navigate to the [JavaScript] section of the file to see which JavaScript functions are taking up the most time:
// processed.txt
[JavaScript]:
ticks total nonlib name
8542 83.4% 83.5% JS: *calcFiboRecursive file:///home/ayo/dev/demo/nodejs-profiling/src/fibonacci.js:1:27
9 0.1% 0.1% JS: *resolve node:path:1095:10
4 0.0% 0.0% JS: *readPackageScope node:internal/modules/cjs/loader:377:26
3 0.0% 0.0% JS: *clearIncoming node:_http_server:948:23
2 0.0% 0.0% JS: ^defaultGenReqId /home/ayo/dev/demo/nodejs-profiling/node_modules/fastify/lib/reqIdGenFactory.js:23:28
. . .
In a real-world application, you might do a lot of processing in several functions to handle user requests. But here, the calcFiboRecursive() function is (predictably) where most of the time is spent, making it the clear target for optimization.
2. Using Chrome DevTools
A second approach to profiling your Node.js applications is to use Chrome DevTools to collect performance data. To get started, launch your server with the --inspect flag:
node --inspect src/server.js
Once the server is running, open Google Chrome or any other Chromium-based browser, and type chrome://inspect in the address bar.
Find the inspect link for your Node.js script and click on it to bring up a new DevTools window for debugging your Node.js app. Head over to the Performance tab and click on the record button to start profiling your code.
Head back to your terminal, and run autocannon against the /fibonacci-recursive/:n route once more:
autocannon --renderStatusCodes http://localhost:3000/fibonacci-recursive/30
Once the command finishes, return to your DevTools window and click the Stop button to finish profiling. You will immediately see a detailed performance profile of your program.
The Main section displays a flame chart representing main-thread activity in your program. The horizontal axis represents time, while the vertical axis represents the call stack. You can zoom in to get a more detailed view.
In this view, the events on top cause the events below them, and the width of each event indicates its duration (the wider it is, the longer the event runs). The frames under parserOnHeadersComplete encompass routing logic and parameter processing, finally leading to the request handler ((anonymous)). This handler calls the calcFiboRecursive() function multiple times due to its recursive nature, which is clearly visible in the graph.
Clicking the (anonymous) frame and switching to the Bottom-Up tab below lists the resulting function calls, ordered by duration.
3. Using the Inspector Module for On-Demand Profiling in Production
The two profiling approaches above are useful ways to profile your code in a development environment. However, since performance issues are often first observed in production, it's crucial to set up on-demand profiling in a way that won't affect runtime performance.
The approach we'll discuss here combines the strengths of both methods, using built-in Node.js APIs to capture profiling data and then analyzing the data using Chrome DevTools.
The Inspector API provides the ability to interact with the V8 inspector programmatically. You can use it to collect profiling data from your application like this:
// src/inspector.js
import * as inspector from "node:inspector/promises";
import fs from "node:fs";

const session = new inspector.Session();

async function startCPUProfile() {
  try {
    session.connect();
    await session.post("Profiler.enable");
    await session.post("Profiler.start");
  } catch (err) {
    console.error(err);
  }
}

async function stopCPUProfile() {
  try {
    const { profile } = await session.post("Profiler.stop");
    fs.writeFileSync("./profile.cpuprofile", JSON.stringify(profile));
    session.disconnect();
  } catch (err) {
    console.error(err);
  }
}

process.on("SIGUSR1", startCPUProfile);
process.on("SIGUSR2", stopCPUProfile);
This file sets up an on-demand CPU profiling mechanism for the Node.js application through the following two functions:
- startCPUProfile() connects to the inspector session and enables the CPU profiler by sending the appropriate commands over the inspector protocol.
- stopCPUProfile() stops the CPU profiler, retrieves the profiling data, and writes it to a file named profile.cpuprofile.
The program is also set up to listen for the SIGUSR1 and SIGUSR2 signals. The former triggers the collection of the CPU profile, while the latter stops CPU profiling and saves the profiling data to a file for analysis.
You can restart your server with npm start, then trigger CPU profiling by sending the SIGUSR1 signal to the process through the kill command. Ensure you replace <process_id> with the pid value found in the server startup log:
kill -SIGUSR1 <process_id>
If the above command exits successfully, it indicates that profiling has started. You can now use autocannon to send some load to the /fibonacci-recursive/:n route as before. Once that's done, send the SIGUSR2 signal to stop the profiling and write the data to a file:
kill -SIGUSR2 <process_id>
You should see a new profile.cpuprofile file in your project directory containing the collected profiling information. To analyze the data, open Chrome DevTools and click the Load profile button in the Performance tab.
Once the file is loaded, you will see the flame graph as before. Analyze it to identify performance bottlenecks, just as described in the previous section.
With this approach, you can trigger CPU profiling on demand, in real-time, without interrupting server operations. In a production environment, consider setting up a different mechanism (e.g., an HTTP endpoint) to trigger profiling and automatically upload the profile data to a secure location for convenient retrieval and analysis.
Wrapping Up
In this article, we explored the challenges associated with managing CPU-bound tasks in Node.js and their substantial influence on runtime efficiency. You also learned several methods to profile your code and identify performance bottlenecks.
In the upcoming and concluding second part, our focus will shift to specific techniques you can employ to enhance the performance of CPU-bound tasks in Node.js.
Until next time, thanks for reading!
P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.
P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.