
Pooya Parsa


Node.js fork is not what you think!

Today I realized that in Node.js, neither cluster.fork nor child_process.fork acts like what you would expect from a C environment. It is actually briefly mentioned in the docs:

Unlike the fork(2) POSIX system call, child_process.fork() does not clone the current process.

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.
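For example, here is a minimal sketch of that communication channel (the file names parent.js and child.js are just for illustration, they are not from the docs):

// parent.js
const { fork } = require('child_process')

// fork() starts a brand new Node.js process that runs child.js
const child = fork('child.js')

// messages arrive over the built-in IPC channel
child.on('message', (msg) => {
  console.log('parent received:', msg)
})

child.send('hello from parent')

// child.js
process.on('message', (msg) => {
  console.log('child received:', msg)
  process.send('hello from child')
})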

So what does this mean?

Take a simple C program that forks 5 processes:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
   printf("Shared\n");

   for (int i = 0; i < 5; i++)
   {
      /* fork() returns 0 in the child and the child's PID in the parent */
      int pid = fork();
      if (!pid)
      {
         return printf("Worker %d\n", i + 1);
      }
   }

   return 0;
}

Compiling and running this code gives us a result like this:

Shared
Worker 1
Worker 2
Worker 3
Worker 4
Worker 5

What the operating system does under the hood is: when we call fork(), it copies the entire process state into a new process with a new PID. The return value of fork() in the worker (child) process is always 0, while the master (parent) receives the child's PID, so we have a way to find out whether the rest of the code is running in the forked process or the master. (Thanks to @littlefox comment 🧡)

The important point is that the forked process continues from where fork() was called, not from the beginning, so Shared is printed only once.

Running similar code in Node.js:


const { fork, isMaster } = require('cluster')

console.log('Shared')

if (isMaster) {
  // Fork workers.
  for (let i = 0; i < 5; i++) {
    fork();
  }
} else {
  // Worker
  console.log(`Worker`);
}

The output is surprisingly different:

Shared
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker

The point is that each time a worker is forked, it starts with a fresh V8 instance. This is not the behavior the name suggests. Fork in Node.js actually does an exec/spawn, which causes the shared code to run every time.

OK. So let's move console.log('Shared') into if (isMaster) :P

Well. Yes. You are right. That's the solution. But just for this example case!

In a real-world application that needs a cluster, we don't immediately fork workers. We may want to set up our web framework, parse CLI args, and require a couple of libs and files. All of these steps have to be repeated in each worker, which may introduce a lot of unnecessary overhead.
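To make that overhead visible, here is a rough sketch (the timing label and the imaginary heavy setup are made up for illustration):

const cluster = require('cluster')

console.time('setup')
// imagine heavy requires, CLI parsing and framework bootstrapping here
console.timeEnd('setup') // printed by the master AND again by every worker

if (cluster.isMaster) {
  for (let i = 0; i < 5; i++) {
    cluster.fork()
  }
} else {
  // worker logic
}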

Final Solution

Now that we know what exactly cluster.fork does under the hood, we can split our worker logic into a separate worker.js file and change the default value of exec, which is process.argv[1], to worker.js :) This is possible by calling cluster.setupMaster() in the master process.
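A minimal sketch of that setup could look like this (the file name worker.js and the worker count are arbitrary choices for this example):

// master.js
const cluster = require('cluster')

// workers will execute worker.js instead of this file (process.argv[1])
cluster.setupMaster({ exec: 'worker.js' })

// master-only setup (web framework, CLI args, ...) can stay here
// without being repeated in every worker

for (let i = 0; i < 5; i++) {
  cluster.fork()
}

// worker.js
console.log(`Worker ${process.pid}`)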

Top comments (4)

Mara Sophie Grosch (LittleFox) • Edited

"Return value in the master process is always 0"

You mixed that up. Return value in the original process ("master") is the PID of the new process, the new process gets 0 as return value.

Ref manpages.ubuntu.com/manpages/bioni...

Pooya Parsa

Yeah. You are right. I mixed up the example code and blindly continued the description based on it :D Thanks. Post updated.

Athaariq Ardhiansyah

What operating system does under the hoods, is when we call fork(), it copies entire process state to a new one with a new PID.

The copy-on-write behavior is still there, especially when you're running NodeJS on a Linux-based machine. I already posted the technical explanation on StackOverflow.

Brett

So the take-home message is that Node spawns new processes rather than copying the current process? I'm using fork from child_process, and I expect it to create a new process with the file path that I pass into it, which it does. I guess that seems more straightforward than what you're describing with cluster.fork, which seems more specific to cluster-based computing. I think it's quite common to use the if(master) check when you're doing cluster computing, because you're loading the same program on every node.