Today I realized that in Node.js, neither cluster.fork nor child_process.fork behaves the way you would expect coming from a C environment. Actually, this is briefly mentioned in the docs:
Unlike the fork(2) POSIX system call, child_process.fork() does not clone the current process. The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.
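As a quick illustration of that channel, here is a minimal sketch of a parent and a forked child exchanging messages (the file name child.js is my own choice):

// parent.js
const { fork } = require('child_process');

const child = fork('./child.js'); // spawns a fresh Node.js process running child.js
child.on('message', (msg) => {
  console.log('parent received:', msg);
});
child.send('ping'); // goes over the built-in IPC channel

// child.js
process.on('message', (msg) => {
  process.send('pong'); // reply to the parent over the same channel
});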
So what does that mean?
Take a simple C program that forks 5 processes:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
    printf("Shared\n");
    for (int i = 0; i < 5; i++)
    {
        pid_t pid = fork();
        if (!pid)
        {
            /* fork() returns 0 in the child, so only workers reach this block */
            printf("Worker %d\n", i + 1);
            return 0;
        }
    }
    return 0;
}
Compiling and running this code gives us a result like this:
Shared
Worker 1
Worker 2
Worker 3
Worker 4
Worker 5
What the operating system does under the hood is this: when we call fork(), it copies the entire process state into a new process with a new PID. The return value in the worker (child) process is always 0, while the parent receives the child's PID, so we have a way to find out whether the rest of the code is running in the forked process or in the master. (Thanks to @littlefox's comment 🧡)
The important point is that the forked process continues from where fork() was called, not from the beginning, so Shared is printed only once.
Now let's run similar code in Node.js:
const { fork, isMaster } = require('cluster')
console.log('Shared')
if (isMaster) {
// Fork workers.
for (let i = 0; i < 5; i++) {
fork();
}
} else {
// Worker
console.log(`Worker`);
}
The output is surprisingly different:
Shared
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
The point is that each time a worker is forked, it starts with a fresh V8 instance. This is not the behavior its name suggests: fork in Node.js actually does an exec/spawn, which causes the shared code to run every time.
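You can actually observe this: every process, master or worker, starts executing the same entry file from the top. A quick sketch (just for demonstration) that logs which process is running the file:

const cluster = require('cluster');

// This line runs in the master and again in every forked worker
console.log(cluster.isMaster ? 'master' : 'worker', 'is executing', process.argv[1]);

if (cluster.isMaster) {
  cluster.fork();
}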
OK. So let's just move console.log('Shared') into the if (isMaster) block :P
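That version would look something like this:

const { fork, isMaster } = require('cluster')

if (isMaster) {
  console.log('Shared') // now runs only once, in the master
  for (let i = 0; i < 5; i++) {
    fork();
  }
} else {
  console.log('Worker');
}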
Well, yes, you are right. That is a solution, but only for this example case!
In a real-world application that needs a cluster, we don't immediately fork workers. We may want to set up our web framework, parse CLI arguments, and require a couple of libs and files. All of these steps would have to be repeated in each worker, which may introduce a lot of unnecessary overhead.
Final Solution
Now that we know exactly what cluster.fork does under the hood, we can split our worker logic into a separate worker.js file and change the default value of exec, which is process.argv[1], to worker.js :) This is possible by calling cluster.setupMaster() in the master process.
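Here is a minimal sketch of that setup, assuming two files side by side (the names master.js and worker.js are my own):

// master.js
const cluster = require('cluster');
const path = require('path');

// Heavy master-only setup (requiring libs, parsing CLI args, ...) runs once here

// Exec worker.js in new workers instead of the default process.argv[1]
cluster.setupMaster({ exec: path.join(__dirname, 'worker.js') });

for (let i = 0; i < 5; i++) {
  cluster.fork();
}

// worker.js
// Only the worker logic lives here, so none of the master-side setup is repeated
console.log(`Worker ${process.pid}`);

With this split, each worker boots straight into worker.js and the expensive setup in master.js is never re-executed.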
Top comments (4)
"Return value in the master process is always 0"
You mixed that up. The return value in the original process ("master") is the PID of the new process; the new process gets 0 as the return value.
Ref manpages.ubuntu.com/manpages/bioni...
Yeah, you are right. I mixed up the example code and blindly continued the description based on it :D Thanks. Post updated.
The copy-on-write behavior is still there, especially when you're running Node.js on a Linux-based machine. I already posted the technical explanation on StackOverflow.
So the take-home message is that Node spawns new processes rather than copying the current process? I'm using fork from child_process, and I expect it to create a new process with the file path that I pass into it, which it does. I guess that seems more straightforward than what you're describing with cluster.fork, which seems more specific to cluster-based computing. I think it's quite common to use the if (isMaster) check when you're doing cluster computing, because you're loading the same program on every node.