NodeJS Non-Blocking Processing
The Problem
Let's say we have a queue and we need to constantly pull items from it. What is the best approach?
In many cases, we can write a piece of code that contains a while loop, and on each iteration we pull one or more items from the queue.
function pullFromQueue(queue) {
  while (true) {
    const item = queue.pull();
    doSomething(item);
  }
}
But what if there is something else running in our application? Or what if we want to do another task too? That piece of code will block everything not related to the doSomething() call. Every other task will be frozen, just waiting for that while loop to end. It will end, right?
Unless we get rid of that eternal loop, there is no way to accomplish such a pulling process without blocking everything.
We can change doSomething() to be an async function, but that alone does not help. The while loop itself never yields, so the event loop stays blocked, and every iteration fires a new doSomething() call without waiting for the previous one to finish. The pending calls keep accumulating until the process runs out of memory.
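To make that concrete, here is a minimal sketch that uses a bounded loop as a stand-in for while (true). The runUnawaitedLoop() name and the timer-based doSomething() are assumptions made for this illustration, not part of the original code.

```javascript
function runUnawaitedLoop(iterations) {
  let pending = 0;
  let timerFired = false;

  // Hypothetical async worker: each call only finishes on a later timer tick.
  async function doSomething(item) {
    pending += 1;
    await new Promise((resolve) => setTimeout(resolve, 0));
    pending -= 1;
  }

  // Stands in for "something else in our application" that wants to run.
  setTimeout(() => { timerFired = true; }, 0);

  // Bounded stand-in for while (true): calls are started but never awaited.
  for (let i = 0; i < iterations; i += 1) {
    doSomething(i);
  }

  // Nothing could finish and no timer could fire while the loop was running.
  return { timerFiredDuringLoop: timerFired, pendingAfterLoop: pending };
}

console.log(runUnawaitedLoop(10000));
// → { timerFiredDuringLoop: false, pendingAfterLoop: 10000 }
```

Every call is still pending when the loop ends, and the unrelated timer never got a chance to fire. With a real while (true), that pile of pending calls would simply keep growing.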
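Or we can change it to a recursive function, but it will still block the event loop, as it is almost the same as the while loop. On top of that, each call nests inside the previous one, so the call stack keeps growing until Node throws "RangeError: Maximum call stack size exceeded".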
function pullFromQueue(queue) {
  const item = queue.pull();
  doSomething(item);
  /* call itself */
  pullFromQueue(queue);
}
The problem is: once we call pullFromQueue(), it never hands control back to the event loop. The result is a function flood where nothing but that function, or something called from it, gets executed. If we called something before, it will be frozen. And if we schedule something after, it won't even start.
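The synchronous recursion has one more failure mode that is easy to reproduce. The sketch below assumes a hypothetical in-memory queue whose pull() returns undefined when it is empty, and adds a base case (which the version above does not have) so that the small run can finish.

```javascript
// Hypothetical queue that hands out `size` items, then undefined.
function createQueue(size) {
  let next = 0;
  return {
    pull: () => {
      if (next >= size) return undefined;
      next += 1;
      return next;
    },
  };
}

// Same shape as the recursive pullFromQueue(), plus a base case for the demo.
function drainRecursively(queue) {
  const item = queue.pull();
  if (item === undefined) return;
  drainRecursively(queue); // nests one more frame on the call stack per item
}

function tryDrain(size) {
  try {
    drainRecursively(createQueue(size));
    return 'completed';
  } catch (err) {
    if (err instanceof RangeError) return 'stack overflow';
    throw err;
  }
}

console.log(tryDrain(100));     // → completed
console.log(tryDrain(1000000)); // → stack overflow
```

A short queue drains fine, but a long one exhausts the call stack long before it is empty, and nothing else gets to run in the meantime.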
The Solution
We aren't letting the event loop breathe, and to remedy that we can use the setImmediate(our_call) function. It schedules our call to run on a later turn of the event loop, once the callbacks that are already waiting have had their chance. So, every frozen operation will have the opportunity to be executed in between.
function pullFromQueue(queue) {
  const item = queue.pull();
  doSomething(item);
  /* call itself */
  setImmediate(() => pullFromQueue(queue));
}
Even though it takes some space in the event loop, the other calls will still be executed. Let's try to visualize what is happening:
1. pullFromQueue() called;
2. queue.pull() called;
3. doSomething() called;
4. setImmediate scheduled pullFromQueue();
5. other random call();
6. another random call();
7. pullFromQueue() called;
8. ...
Now we see that other operations are executed between the pullFromQueue() calls, unlike what was happening before.
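The sequence above can be reproduced with a small runnable sketch. The in-memory queue, the runDemo() wrapper, and the otherTask() function are assumptions made for this illustration. The puller also stops when the queue is empty, only so that the demo can end.

```javascript
// Hypothetical in-memory queue: pull() returns the next item, or undefined when empty.
function createQueue(items) {
  const pending = [...items];
  return { pull: () => pending.shift() };
}

// Runs the setImmediate-based puller next to an unrelated recurring task and
// resolves with the order in which everything actually executed.
function runDemo(items, otherTaskCount) {
  return new Promise((resolve) => {
    const log = [];
    const queue = createQueue(items);
    let otherRuns = 0;
    let stillRunning = 2; // the puller and the other task

    const finish = () => {
      stillRunning -= 1;
      if (stillRunning === 0) resolve(log);
    };

    function pullFromQueue() {
      const item = queue.pull();
      if (item === undefined) return finish(); // stop when empty so the demo ends
      log.push(`pull:${item}`);
      setImmediate(pullFromQueue);
    }

    // Stands in for the "other random call()" steps in the list above.
    function otherTask() {
      if (otherRuns === otherTaskCount) return finish();
      otherRuns += 1;
      log.push(`other:${otherRuns}`);
      setImmediate(otherTask);
    }

    pullFromQueue();
    otherTask();
  });
}

runDemo(['a', 'b', 'c'], 3).then((log) => console.log(log.join(' -> ')));
// → pull:a -> other:1 -> pull:b -> other:2 -> pull:c -> other:3
```

Each pull is followed by the other task because a setImmediate() callback queued from inside a running callback waits for the next turn of the event loop.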
We can still optimize it, since this approach can consume a lot of event loop time. When an item is pulled from the queue, we can check whether it exists. If it does not (meaning the queue is empty, busy, etc.), we can reschedule the function with a small delay.
function pullFromQueue(queue) {
  const item = queue.pull();
  /* if we can't get an item, re-schedule this function without calling doSomething() */
  if (!item) {
    /* setTimeout fires once; setInterval would start a new repeating timer on every empty pull */
    return setTimeout(() => pullFromQueue(queue), 250);
  }
  doSomething(item);
  /* call itself */
  setImmediate(() => pullFromQueue(queue));
}
This can drastically reduce the event loop consumption when there is no work to do (queue empty, timeout, busy, etc.).
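Here is a small sketch of the idle delay in action, with a queue that starts empty and receives items a little later. The createQueue() and startPulling() helpers, the configurable delay, and the stop() function are assumptions added for this demo. The empty check also uses undefined rather than any falsy value, which is a choice made here.

```javascript
// Hypothetical queue: push() adds an item, pull() returns undefined when empty.
function createQueue() {
  const pending = [];
  return {
    push: (item) => pending.push(item),
    pull: () => pending.shift(),
  };
}

// Starts pulling and returns a stop() function so the demo can end.
function startPulling(queue, handle, idleDelayMs = 250) {
  let stopped = false;
  let timer = null;
  let immediate = null;

  function tick() {
    if (stopped) return;
    const item = queue.pull();
    if (item === undefined) {
      timer = setTimeout(tick, idleDelayMs); // nothing to do: retry once, later
      return;
    }
    handle(item);
    immediate = setImmediate(tick); // more work may be waiting: come back soon
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
    clearImmediate(immediate);
  };
}

function demo() {
  return new Promise((resolve) => {
    const queue = createQueue();
    const handled = [];
    const stop = startPulling(queue, (item) => handled.push(item), 10);
    setTimeout(() => { queue.push('a'); queue.push('b'); }, 25);
    setTimeout(() => { stop(); resolve(handled); }, 100);
  });
}

demo().then((handled) => console.log(handled)); // → [ 'a', 'b' ]
```

While the queue is empty the function only wakes up once per delay, and as soon as items arrive it goes back to the faster setImmediate() rhythm.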
Now that we have a non-blocking processing function, we can use it for whatever pulling process we want, without worrying about the other functionality in the same NodeJS process.
We can apply this to anything that needs recursive fetching, such as HTTP requests, Redis job queue pulling, MongoDB pagination queries, etc.
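As a rough sketch of the pagination case, the same pattern could look like the code below. RECORDS, fetchPage(), and fetchAllPages() are hypothetical names standing in for a real data source. fetchPage() resolves from memory here, so its then callback runs without giving timers or I/O a turn, and the setImmediate() between pages is what lets other callbacks in.

```javascript
// Hypothetical data source standing in for an HTTP API or a database collection.
const RECORDS = Array.from({ length: 10 }, (_, i) => i + 1);

// Hypothetical helper: resolves with one page of records, or an empty array past the end.
function fetchPage(page, pageSize) {
  const start = page * pageSize;
  return Promise.resolve(RECORDS.slice(start, start + pageSize));
}

// Fetches page after page, yielding to the event loop between pages.
function fetchAllPages(pageSize) {
  return new Promise((resolve, reject) => {
    const pages = [];

    function next(page) {
      fetchPage(page, pageSize)
        .then((records) => {
          if (records.length === 0) {
            resolve(pages); // an empty page means we are done
            return;
          }
          pages.push(records);
          setImmediate(() => next(page + 1));
        })
        .catch(reject);
    }

    next(0);
  });
}

fetchAllPages(4).then((pages) => console.log(JSON.stringify(pages)));
// → [[1,2,3,4],[5,6,7,8],[9,10]]
```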
Top comments (4)
With rescheduling when empty, consider:
Couldn't help but wonder as to why you made the proc function async? I ran this bit of code with and without it, but spotted no real difference. Just something eating away at my curiosity.
My "bad", it's a carryover from my other, simpler, comment.
There, awaiting proc was the single yield point in the loop, so it did make all the difference. Here there's a second await from the pull inside the loop iteration, so processing can be made sync.
However, and yet again my bad, it seems awaiting a sync value still yields to the event loop as well. So in both examples it is unnecessary.
I'll now edit both comments accordingly.
Any await point yields to the event loop, much like using setImmediate.
Try the following:
You will get started printed before most of the processing lines.
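One nuance is worth checking here. An await on a value that is already available resumes from the microtask queue, which Node drains before it moves on to timers, I/O, or setImmediate() callbacks. The sketch below is not the snippet referred to above (that one is not preserved here), and its helper names are made up for illustration. It checks whether an unrelated setImmediate() callback gets to run while a loop is still awaiting.

```javascript
// Hypothetical helper: a promise that only resolves on a later event loop turn.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Reports whether an unrelated setImmediate() callback ran before the loop finished.
async function otherTaskRanDuringLoop(yieldFn, iterations) {
  let otherTaskRan = false;
  setImmediate(() => { otherTaskRan = true; });

  for (let i = 0; i < iterations; i += 1) {
    await yieldFn();
  }
  return otherTaskRan;
}

otherTaskRanDuringLoop(() => null, 1000)
  .then((ran) => console.log('await on a plain value:', ran));
// → await on a plain value: false

otherTaskRanDuringLoop(yieldToEventLoop, 1000)
  .then((ran) => console.log('await on setImmediate:', ran));
// → await on setImmediate: true
```

So the caller does continue past the first await, which is why started gets printed early, but a loop that only awaits plain values can still keep timers and I/O callbacks waiting until it is done.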