Muly Gottlieb for Amplication

Posted on Jun 6, 2023 • Originally published at amplication.com

Using Parallel Processing in Node.js and its Limitations

#node #backend #programming #webdev

As most modern developers know, Node.js is a JavaScript runtime built to develop applications that efficiently handle I/O-intensive operations. Many applications that require near real-time communication have been engineered using Node.js due to its non-blocking, event-driven, and single-threaded nature.

But why would you need parallel processing?

Well, the single-threaded nature is ultimately its biggest weakness. Node.js runs your JavaScript on a single thread, driven by an event loop built on libuv that picks up pending callbacks and executes them one at a time.

Figure: The event loop

The event loop (as shown above) delegates I/O-related operations to the operating system or to a pool of threads that perform them in parallel. Meanwhile, the event loop continues to execute all non-I/O operations synchronously, one after another. This means that while the event loop executes a CPU-intensive task, the application is blocked from doing anything else.

Thus, if you're building a CPU-intensive application, you will run into some performance issues.
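To see the blocking effect concretely, here is a minimal sketch. The helper name busyWait and the 200 ms duration are illustrative assumptions, not part of the original example. A timer scheduled for 0 ms cannot fire until the synchronous loop has finished.

```javascript
// Illustrative helper: spin the CPU synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // Intentionally empty: this loop monopolizes the event loop.
  }
  return Date.now() - start;
}

const scheduledAt = Date.now();

// Ask for a callback "as soon as possible".
setTimeout(() => {
  console.log(`Timer fired after ${Date.now() - scheduledAt} ms`);
}, 0);

// The timer above cannot run until this synchronous work completes.
const elapsed = busyWait(200);
console.log(`Blocked the event loop for ${elapsed} ms`);
```

When run, the "Blocked" line prints first, and the timer reports a delay of at least the busy-wait duration even though it was scheduled for 0 ms.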

Bringing worker threads to the rescue

Although Node.js is optimized for I/O operations, it also offers a way to run CPU-demanding tasks in parallel: "Worker Threads."

Node.js introduced "Worker Threads" in v10.5 as an experimental feature (stable since v12) with the sole purpose of offloading CPU-intensive operations from the event loop so developers can execute code in parallel in a non-blocking manner.

If you're new to worker threads, think of each one as an isolated Node.js context: a complete Node.js runtime with its own event loop and queue, running in its own V8 instance.

Worker threads can then execute CPU-intensive operations in their execution environment and communicate the status and outputs to the parent thread using a messaging channel. The parent thread continues to perform its functions as usual (without being blocked).
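The messaging model described above can be sketched in a single file. This is a minimal illustration, assuming an inline worker body (passed with the `eval: true` option) and a hypothetical helper named sumInWorker; neither comes from the original article.

```javascript
const { Worker } = require("worker_threads");

// Worker body kept inline (via `eval: true`) so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (let i = 0; i < workerData.n; i++) sum += i;
  parentPort.postMessage(sum); // report the result to the parent thread
`;

// Hypothetical helper: runs the summation in a worker and resolves with its message.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

sumInWorker(1000).then((sum) => console.log(`Sum computed off-thread: ${sum}`));
console.log("Parent thread keeps running while the worker computes.");
```

The parent's own log line appears before the worker's result, which shows that the primary thread is not blocked while the worker runs.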

This model frees up the primary event loop and ensures that your application remains functional and usable while it performs CPU-intensive tasks. You can also use a pool of worker threads to divide heavy CPU-intensive operations and execute them in parallel, which can significantly improve the performance of your Node.js application.

Seeing worker threads in action

Integrating worker threads into your Node.js applications is relatively straightforward. Let's see how we can adopt a worker thread into a real-world production application.

The scenario at hand

Imagine you have to compute all the prime numbers below a given limit. For example, you have to print all prime numbers below 100,000,000. This task is a CPU-intensive operation and will block the primary thread if executed there. For such use cases, you can use worker threads to offload the computation to separate threads while keeping the primary thread block-free.

The pre-requisites

First, install the latest stable version of Node.js to ensure worker threads are available to your application. You can install Node.js via a package manager such as Homebrew (Mac) or Chocolatey (Windows) with one of the commands below.

brew install node

choco install nodejs

Building the worker thread

Next, create a new project using npm init and add a file, prime.js, containing the code needed to generate primes over a given range.

function generatePrimes(start, end) {
  const primes = [];

  // Function to check if a number is prime
  function isPrime(num) {
    if (num < 2) return false;
    for (let i = 2; i <= Math.sqrt(num); i++) {
      if (num % i === 0) {
        return false;
      }
    }
    return true;
  }

  // Generate primes within the range
  for (let i = start; i <= end; i++) {
    if (isPrime(i)) {
      primes.push(i);
    }
  }
  return primes;
}

module.exports = {
  generatePrimes,
};

The snippet above generates the prime numbers over a given range. This function has a time complexity of O(n*sqrt(n)), so the time to generate the list of primes grows faster than linearly as the range increases. Hence, you should use worker threads to perform this heavy computation and keep the primary thread block-free.

To build the worker thread, use the built-in worker_threads module. Head to your entry file, index.js, and include the snippet below.

const {
  Worker,
  isMainThread,
  parentPort,
  workerData,
} = require("worker_threads");

const { generatePrimes } = require("./prime");

const threads = new Set();
const number = 999999;

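// Split 0..number into at most threadCount disjoint ranges.
// Both ends are inclusive (matching generatePrimes), so ranges must not overlap.
const breakIntoParts = (number, threadCount = 1) => {
  const parts = [];
  const chunkSize = Math.ceil((number + 1) / threadCount);

  for (let i = 0; i <= number; i += chunkSize) {
    const end = Math.min(i + chunkSize - 1, number);
    parts.push({ start: i, end });
  }

  return parts;
};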

if (isMainThread) {
  const parts = breakIntoParts(number, 5);
  parts.forEach((part) => {
    threads.add(
      new Worker(__filename, {
        workerData: {
          start: part.start,
          end: part.end,
        },
      })
    );
  });

  threads.forEach((thread) => {
    thread.on("error", (err) => {
      throw err;
    });
    thread.on("exit", () => {
      threads.delete(thread);
      console.log(`Thread exiting, ${threads.size} running...`);
    });
    thread.on("message", (msg) => {
      console.log(msg);
    });
  });
} else {
  const primes = generatePrimes(workerData.start, workerData.end);
  parentPort.postMessage(
    `Primes from - ${workerData.start} to ${workerData.end}: ${primes}`
  );
}

The snippet above uses the worker_threads module to achieve parallelism in a Node.js environment. The function breakIntoParts accepts a number (the upper bound of the search) and the expected thread count (worker count) and splits the interval from 0 to that number into ranges.

Each range is passed to its own worker thread, which computes the prime numbers in that range and then posts a message with the result to the parent.
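Because generatePrimes treats both ends of a range as inclusive, the ranges handed to the workers should not share boundary values, or a prime sitting on a boundary would be reported twice. The sketch below is one way to produce disjoint ranges; the function name splitRange and the sample values are illustrative assumptions.

```javascript
// Variant of breakIntoParts that yields disjoint, inclusive ranges covering 0..number.
function splitRange(number, threadCount = 1) {
  const parts = [];
  const chunkSize = Math.ceil((number + 1) / threadCount);
  for (let start = 0; start <= number; start += chunkSize) {
    parts.push({ start, end: Math.min(start + chunkSize - 1, number) });
  }
  return parts;
}

console.log(JSON.stringify(splitRange(99, 4)));
// [{"start":0,"end":24},{"start":25,"end":49},{"start":50,"end":74},{"start":75,"end":99}]
```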

This splits the single operation into several subtasks (one per worker thread), thus achieving parallelism. For example, once you run the code above using node index.js, you will notice that the computation finishes faster than it would on a single thread.
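One way to check this yourself is to collect every worker's result with Promise.all and time the whole run. The sketch below is an assumption-laden variant of the example, not the article's code: it counts primes instead of printing them, keeps the worker body inline with `eval: true`, and uses the hypothetical helper names countInWorker and countPrimesParallel.

```javascript
const { Worker } = require("worker_threads");

// Inline worker body: counts primes in the inclusive range [start, end].
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const isPrime = (num) => {
    if (num < 2) return false;
    for (let i = 2; i * i <= num; i++) {
      if (num % i === 0) return false;
    }
    return true;
  };
  let count = 0;
  for (let n = workerData.start; n <= workerData.end; n++) {
    if (isPrime(n)) count++;
  }
  parentPort.postMessage(count);
`;

// Run one worker for a single range and resolve with its count.
function countInWorker(range) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: range });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

// Split [0, limit] into disjoint ranges, one per worker, and add up the counts.
async function countPrimesParallel(limit, threadCount) {
  const chunkSize = Math.ceil((limit + 1) / threadCount);
  const jobs = [];
  for (let start = 0; start <= limit; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, limit);
    jobs.push(countInWorker({ start, end }));
  }
  const counts = await Promise.all(jobs);
  return counts.reduce((total, c) => total + c, 0);
}

const startedAt = Date.now();
countPrimesParallel(999999, 5).then((total) => {
  console.log(`${total} primes found in ${Date.now() - startedAt} ms`);
});
```

Running it with a thread count of 1 and then 5 gives a rough comparison; the actual speedup depends on the number of CPU cores available.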

The GIF below shows the app's performance when multiple worker threads compute the prime numbers between 0 and 9999999.

Figure: Computing prime numbers from 0 to 9999999

This sounds too good to be true

Like anything we use in software development, worker threads have limitations and drawbacks that can significantly impact your application if you do not adopt them correctly.

The cost of threading

When using worker threads in Node.js, it is crucial to understand that each worker thread operates with its dedicated instance of the V8 JavaScript engine. And this comes with a cost, as creating worker threads requires significant resource allocation. Consequently, it is best to utilize worker threads for CPU-intensive tasks that can take advantage of parallel processing rather than lightweight operations that may not justify the overhead.

One practical approach to optimizing worker threads' efficiency is to implement a thread pooling mechanism. The need for frequent creation and destruction of threads is mitigated by reusing a pool of worker threads. This strategy helps minimize the cost of spawning new threads and reduces resource consumption.
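A thread pool along these lines can be sketched as follows. This is a deliberately small illustration, not a production implementation: the WorkerPool class, its method names, and the squaring task are all hypothetical, and a crashed worker is not replaced.

```javascript
const { Worker } = require("worker_threads");

// Hypothetical worker body: squares each number it receives and replies.
const workerSource = `
  const { parentPort } = require("worker_threads");
  parentPort.on("message", (n) => parentPort.postMessage(n * n));
`;

class WorkerPool {
  constructor(size) {
    this.workers = [];
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a task; the promise resolves with the worker's reply.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { input, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off("error", onError);
        this.idle.push(worker); // reuse the thread instead of spawning a new one
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        worker.off("message", onMessage);
        reject(err); // sketch only: a crashed worker is not replaced
      };
      worker.once("message", onMessage);
      worker.once("error", onError);
      worker.postMessage(input);
    }
  }

  // Stop all threads so the process can exit.
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

async function demo() {
  const pool = new WorkerPool(2);
  const results = await Promise.all([1, 2, 3, 4].map((n) => pool.run(n)));
  await pool.close();
  return results;
}

demo().then((results) => console.log(results)); // [ 1, 4, 9, 16 ]
```

Here four tasks share two threads, so only two V8 instances are ever created no matter how many tasks are queued.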

Maintainability

As shown above, in the index.js file, the code needed to maintain a set of worker threads can often get bulky. It might result in spaghetti code that can significantly impact your code readability and overall maintainability.

No access to the DOM

Worker threads run in isolation from the primary thread. Node.js itself has no DOM, so this limitation matters mainly for the browser's equivalent, Web Workers: a worker cannot touch the DOM, which makes it a poor fit for updating a UI component directly. Instead, the worker has to use the postMessage API to send its result to the primary thread, which can then update the DOM based on the response.

Increased complexity

Managing communication, synchronization, and coordination between threads requires careful design and implementation, which takes time and can introduce subtle bugs. Additionally, debugging multi-threaded applications is more challenging than debugging single-threaded ones, which makes problems harder to troubleshoot.
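As a small illustration of the coordination involved, the sketch below has several workers update one shared counter. It assumes a SharedArrayBuffer passed through workerData and uses Atomics.add so that concurrent increments are not lost; the helper name incrementInParallel and the counts are hypothetical.

```javascript
const { Worker } = require("worker_threads");

// Inline worker body: bumps a shared counter with an atomic add.
const workerSource = `
  const { workerData } = require("worker_threads");
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // safe even when several threads write at once
  }
`;

// Start `threadCount` workers that each add `increments` to the same counter.
function incrementInParallel(threadCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const finished = [];
  for (let i = 0; i < threadCount; i++) {
    finished.push(
      new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, {
          eval: true,
          workerData: { shared, increments },
        });
        worker.once("error", reject);
        worker.once("exit", resolve);
      })
    );
  }
  // Read the final value only after every worker has exited.
  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

incrementInParallel(4, 1000).then((total) => console.log(total)); // 4000
```

Replacing Atomics.add with a plain `counter[0]++` would make the updates racy, which is exactly the kind of bug that is hard to reproduce and debug.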

Restricted APIs

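Worker threads cannot use every API that is available to the primary thread. In the browser, Web Workers have no access to the DOM, window, or UI-related APIs. In Node.js, most core modules work inside a worker, but some process-level operations behave differently: for example, process.chdir() is not available, and process.exit() stops only the worker rather than the whole program.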
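In Node.js, one concrete example of a restricted API is process.chdir(), which works on the primary thread but is not supported inside a worker. The sketch below probes this from an inline worker; the helper name tryChdirInWorker is hypothetical.

```javascript
const { Worker } = require("worker_threads");

// Inline worker body: attempts an operation that is reserved for the main thread.
const workerSource = `
  const { parentPort } = require("worker_threads");
  try {
    process.chdir("."); // not supported inside a worker
    parentPort.postMessage({ threw: false });
  } catch (err) {
    parentPort.postMessage({ threw: true, message: err.message });
  }
`;

// Resolve with the worker's report on whether the call was rejected.
function tryChdirInWorker() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

tryChdirInWorker().then((report) => console.log(report));
```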

Are worker threads the future?

You might wonder - "Are worker threads the future?"

Well, the answer is yes. Worker threads have shed some light on parallel processing in Node.js for non-I/O operations. In addition, their ability to spawn an isolated runtime helps you build applications that perform significantly faster on CPU-heavy work.

Modern applications require better performance; in 2023, better performance means better development. So one tip that I want to leave you with is to utilize worker threads to improve the performance of your microservices.

Use tools like Amplication to bootstrap your Node.js apps in just a few seconds, and incorporate worker threads to rapidly enhance your parallelism game in Node.js!
