
Pooya Parsa


Node.js fork is not what you think!

Today I realized that in Node.js, neither cluster.fork nor child_process.fork acts like what you would expect from a C environment. It is actually briefly mentioned in the docs:

Unlike the fork(2) POSIX system call, child_process.fork() does not clone the current process.

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.
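For example, here is a minimal sketch of that communication channel (the file names parent.js and child.js are just for illustration, they are not from the docs):

// parent.js
const { fork } = require('child_process')

// fork() starts a brand new Node.js process that runs child.js
const child = fork('child.js')

// messages arrive over the built-in IPC channel
child.on('message', (msg) => {
  console.log('parent received:', msg)
})

child.send('hello from parent')

// child.js
process.on('message', (msg) => {
  console.log('child received:', msg)
  process.send('hello from child')
})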

So what does this mean?

Take a simple C program that forks 5 processes:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
   printf("Shared\n");

   for (int i = 0; i < 5; i++)
   {
      /* fork() returns 0 in the child and the child's PID in the parent */
      int pid = fork();
      if (!pid)
      {
         return printf("Worker %d\n", i + 1);
      }
   }

   return 0;
}

Compiling and running this code gives us a result like this:

Shared
Worker 1
Worker 2
Worker 3
Worker 4
Worker 5

What the operating system does under the hood is: when we call fork(), it copies the entire process state into a new process with a new PID. The return value of fork() in the worker (child) process is always 0, while the master (parent) receives the child's PID, so we have a way to find out whether the rest of the code is running in the forked process or the master. (Thanks to @littlefox comment 🧡)

The important point is that the forked process continues from where fork() was called, not from the beginning, so Shared is printed only once.

Running similar code in Node.js:


const { fork, isMaster } = require('cluster')

console.log('Shared')

if (isMaster) {
  // Fork workers.
  for (let i = 0; i < 5; i++) {
    fork();
  }
} else {
  // Worker
  console.log(`Worker`);
}

The output is surprisingly different:

Shared
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker

The point is that each time a worker is forked, it starts with a fresh V8 instance. This is not the behavior the name suggests. Fork in Node.js actually does an exec/spawn, which causes the shared code to run every time.

OK. So let's move console.log('Shared') into if (isMaster) :P

Well. Yes. You are right. That's the solution. But just for this example case!

In a real-world application that needs a cluster, we don't immediately fork workers. We may want to set up our web framework, parse CLI args, and require a couple of libs and files. All of these steps have to be repeated in each worker, which may introduce a lot of unnecessary overhead.
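To make that overhead visible, here is a rough sketch (the timing label and the imaginary heavy setup are made up for illustration):

const cluster = require('cluster')

console.time('setup')
// imagine heavy requires, CLI parsing and framework bootstrapping here
console.timeEnd('setup') // printed by the master AND again by every worker

if (cluster.isMaster) {
  for (let i = 0; i < 5; i++) {
    cluster.fork()
  }
} else {
  // worker logic
}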

Final Solution

Now that we know what exactly cluster.fork does under the hood, we can split our worker logic into a separate worker.js file and change the default value of exec, which is process.argv[1], to worker.js :) This is possible by calling cluster.setupMaster() in the master process.
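A minimal sketch of that setup could look like this (the file name worker.js and the worker count are arbitrary choices for this example):

// master.js
const cluster = require('cluster')

// workers will execute worker.js instead of this file (process.argv[1])
cluster.setupMaster({ exec: 'worker.js' })

// master-only setup (web framework, CLI args, ...) can stay here
// without being repeated in every worker

for (let i = 0; i < 5; i++) {
  cluster.fork()
}

// worker.js
console.log(`Worker ${process.pid}`)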

Top comments (4)

Mara Sophie Grosch (LittleFox) • Edited

"Return value in the master process is always 0"

You mixed that up. Return value in the original process ("master") is the PID of the new process, the new process gets 0 as return value.

Ref manpages.ubuntu.com/manpages/bioni...

Pooya Parsa

Yeah. You are right. I mixed up the example code and blindly continued the description based on it :D Thanks. Post updated.

Athaariq Ardhiansyah

What operating system does under the hoods, is when we call fork(), it copies entire process state to a new one with a new PID.

The copy-on-write behavior is still there, especially when you're running NodeJS on a Linux-based machine. I already posted the technical explanation on StackOverflow.

Brett

So the take-home message is that Node spawns new processes rather than copying the current process? I'm using fork from child_process, and I expect it to create a new process with the file path that I pass into it, which it does. I guess that seems more straightforward than what you're describing with cluster.fork, which seems more specific to cluster-based computing. I think it's quite common to use the if(master) check when you're doing cluster computing, because you're loading the same program on every node.