John Colagioia (he/him)

Posted on Apr 17, 2020 • Originally published at john.colagioia.net on Apr 17, 2020

Experimenting with Worker Threads

#programming #javascript #threads

As a quick note, I released this on my blog the other day, so it can get (as I tend to) a bit rambling. One big change is that the blog version has an additional section at the end with a bunch of non-color design resources that I recommend. Oh, and the original text is on GitHub (licensed CC-BY-SA), so if anything seems muddy, by all means:

Leave a comment here,
Leave a comment on the blog,
File an issue on GitHub, or
Add a pull request!

As I have started working on a prototype desktop client for the twtxt social network, one of the key technical aspects is making a large number of web requests. I’m prototyping this using Proton Native, and JavaScript is traditionally single-threaded, which presents a small problem: web requests can take a while to complete, so traditional programming techniques would lock up the user interface, and that isn’t really viable.

Fortunately, as of Node.js v10.5.0, JavaScript on the desktop (such as Proton Native) has what they call worker threads, an approach to forcing JavaScript to perform multiple tasks at (approximately) the same time.

So, these are some quick notes on getting worker threads…well, working. It was easy enough to make it work, but there are some points where it’s unclear what is supposed to happen, with “minimal example” code all having strange and unnecessary features.

Threads, in General

Originally, Sun Microsystems created what they called “light-weight processes,” a system where multiple code-paths can run in parallel within the same program or process. As other languages implemented similar approaches, the term evolved into “threads.”

If multiple threads are run under the same process, this typically gives benefits over a multi-process approach with interprocess communication, since most of the system state can be shared, saving overhead on context switches and thread creation. If you haven’t taken an operating systems course and don’t recognize those terms, they basically boil down to not needing to keep pausing and restarting programs, since everything should be running from the same package.

Generally speaking, threads have a handful of common operations:

Create sets up the new thread and assigns it a workload and initial data to work with.
Exit ends the thread from the inside, leaving the data to be harvested by the main program.
Join takes the data from the ended thread to make it available to the main program.

That’s not the entire model, of course. There are a lot of utility features allowing the programmer to set different parameters and retrieve information, but the core process is create-exit-join.

Worker Threads

Node’s worker threads…aren’t that.

In some ways, it makes sense. The standard approach to threading goes back to the early 1990s, and it’s now almost thirty years later, so maybe we’ve learned some things that make life easier. And then again…well, we’ll see.

Thread Creation

We launch a thread almost normally, but with a twist that makes me extremely suspicious about how this all works under the covers.

const { Worker } = require('worker_threads');
const worker = new Worker(
  './workercode.js',
  {
    workerData: someObjectWithInitialData,
  }
);

Typically, threads are given functions to run. Worker threads are different, though, taking a file. This is where suspicion starts to come in, as sending execution to a separate file implies that the thread is a separate program, rather than a single program sharing state.
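To have something that runs from a single file, here is a minimal sketch of the same creation step. It assumes the Worker constructor's eval option, which treats the first argument as source code instead of a path, purely so that no separate workercode.js is needed. The workerSource, initialData, and created names (and the example URLs) are made up for illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept in a string so the sketch fits in one file;
// the version in the text would live in './workercode.js' instead.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // workerData is the worker's own copy of the object the main thread passed in.
  parentPort.postMessage({ received: workerData.urls.length });
`;

// Hypothetical initial data standing in for someObjectWithInitialData.
const initialData = {
  urls: ['https://example.com/a.txt', 'https://example.com/b.txt'],
};

// Wrap the first message (or error) in a Promise so the result is easy to use.
const created = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, {
    eval: true, // interpret workerSource as code, not as a file name
    workerData: initialData,
  });
  worker.once('message', resolve);
  worker.once('error', reject);
});

created.then((message) => console.log(message));
```

In a real project, the file-based form from the text is probably easier to maintain, since the worker code gets normal editor and linter support.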

Thread Handlers

The worker thread has three main events we can choose to handle.

worker.on('message', this.acceptUpdate);
worker.on('error', this.reportUpdateError);
worker.on('exit', this.reportExit);

Each handler function takes a single parameter. The message can be an arbitrary object. The error is a JavaScript Error object. The exit code is an integer.

There is also an online event, fired when the thread has started executing its code. Its handler takes no parameters, if that’s useful to you.
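To see what each handler actually receives, this rough sketch records the events from one worker that finishes normally and one that throws. The watch helper and the healthy and failing names are hypothetical, and the eval option is only used to keep everything in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: starts a worker from a source string, records each
// event that fires along with its argument, and resolves once 'exit' arrives.
function watch(source) {
  return new Promise((resolve) => {
    const events = [];
    const worker = new Worker(source, { eval: true });
    worker.on('online', () => events.push(['online']));
    worker.on('message', (message) => events.push(['message', message]));
    worker.on('error', (error) => events.push(['error', error.message]));
    worker.on('exit', (code) => {
      events.push(['exit', code]);
      resolve(events);
    });
  });
}

// A worker that sends one message and then runs out of work.
const healthy = watch(`
  require('worker_threads').parentPort.postMessage({ status: 'done' });
`);

// A worker that throws, which should surface through the 'error' handler.
const failing = watch(`
  throw new Error('fetch failed');
`);

healthy.then((events) => console.log('healthy:', events));
failing.then((events) => console.log('failing:', events));
```

The healthy worker should end with an exit code of 0, while the uncaught exception in the failing worker should produce an error event followed by a non-zero exit code.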

Returning Data

Worker threads don’t really exit and join, though I suppose an exit value could be used to simulate that. Instead, the thread takes its initial state from a workerData variable (imported from the worker_threads module) and sends messages back to the main thread.

const {
  parentPort,
  workerData,
} = require('worker_threads');
parentPort.postMessage(someObjectWithResults);

The message handler (acceptUpdate(), in the example above) then receives a copy of someObjectWithResults.

This also works in the opposite direction, with the main thread sending messages to the worker.

worker.postMessage(updateForTheThread);

This is a surprising improvement over traditional threading libraries, because it allows the thread to easily send and receive updates whenever it gets them, instead of waiting until it’s out of work to return everything it has collected or messing around in shared memory. However, this still smells of running in a separate process, basically treating the thread as a peer to coordinate with across a network connection or a special kind of shared file called a “pipe” that I won’t bother to discuss here.
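A small round trip makes the “copy” part concrete. In this sketch, the main thread sends an object, the worker modifies what it received and sends it back, and the original stays untouched. The workerSource and reply names and the shape of the update object are assumptions for illustration, and eval is again only there to avoid a second file.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // 'once' removes the listener after one message, so the worker can exit.
  parentPort.once('message', (update) => {
    update.seen = true; // changes only the worker's copy
    parentPort.postMessage(update);
  });
`;

const worker = new Worker(workerSource, { eval: true });

// Hypothetical payload standing in for updateForTheThread.
const updateForTheThread = { feed: 'example', seen: false };

const reply = new Promise((resolve, reject) => {
  worker.once('message', resolve);
  worker.once('error', reject);
});

worker.postMessage(updateForTheThread);

reply.then((echoed) => {
  console.log(echoed.seen);             // true: the worker's modified copy
  console.log(updateForTheThread.seen); // false: the original is untouched
});
```

Because every message is copied, large payloads cost time and memory in both directions, which is worth keeping in mind when deciding how much to send at once.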

Join

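All that said, there is no traditional join operation where the main thread waits on the worker and then harvests its data. The closest equivalent is the exit event, which fires once the worker has stopped.

worker.on('exit', this.reportExit);

Any results need to have been sent as messages before that point. The tempting-looking worker.getHeapSnapshot() is a debugging tool rather than a join: it returns a Promise for a stream containing a V8 heap snapshot of the worker, and that Promise is rejected if the worker is no longer running, so it is no use in the exit handler (reportExit(), in the example above).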
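One way to get join-like behavior is to wrap the worker in a Promise that settles when the exit event fires, collecting any messages along the way. This is only a sketch of that idea, not something the worker_threads module provides: the runToCompletion helper and the joined name are hypothetical, and the eval option is used just to keep it in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: resolves with every message the worker sent once the
// worker exits cleanly, and rejects on an error or a non-zero exit code.
function runToCompletion(source, workerData) {
  return new Promise((resolve, reject) => {
    const results = [];
    const worker = new Worker(source, { eval: true, workerData });
    worker.on('message', (message) => results.push(message));
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code === 0) {
        resolve(results);
      } else {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

// The worker squares each number it was given and reports the results.
const joined = runToCompletion(`
  const { parentPort, workerData } = require('worker_threads');
  for (const n of workerData.numbers) {
    parentPort.postMessage(n * n);
  }
`, { numbers: [1, 2, 3] });

joined.then((results) => console.log(results));
```

The main thread never blocks here; awaiting the Promise is the nearest thing to a join that still keeps the user interface responsive.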

Going Further

So, after all that, I’m still not 100% convinced that worker threads are actually threads, but they seem to mostly do the job and that’s mostly what matters.

There’s actually a lot more available, here, too. The threads can communicate through console I/O. A thread can set up additional communications channels, which can be passed to the parent for another thread, allowing two worker threads to communicate directly. Ports (endpoints to a communications channel) can be manipulated to prevent the thread from exiting, and so forth.
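As a rough sketch of the direct worker-to-worker channel, the main thread can create a MessageChannel and hand one port to each worker by listing it in the transfer list of postMessage. The senderSource, receiverSource, and relayed names are made up for illustration, and the workers are built from strings with the eval option only to keep the example in one file.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

// Waits for a port from the parent, then writes to the other worker directly.
const senderSource = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', ({ port }) => {
    port.postMessage('hello from the sender');
  });
`;

// Waits for a port from the parent, then reports what arrives on it.
const receiverSource = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', ({ port }) => {
    port.once('message', (text) => {
      parentPort.postMessage({ relayed: text });
    });
  });
`;

const sender = new Worker(senderSource, { eval: true });
const receiver = new Worker(receiverSource, { eval: true });
const { port1, port2 } = new MessageChannel();

const relayed = new Promise((resolve, reject) => {
  receiver.once('message', resolve);
  receiver.once('error', reject);
  sender.once('error', reject);
});

// Listing a port in the transfer list moves it to the worker; the main
// thread can no longer use that port afterward.
sender.postMessage({ port: port1 }, [port1]);
receiver.postMessage({ port: port2 }, [port2]);

relayed.then((message) => console.log(message));
```

The main thread only brokers the introduction here. After the ports are handed over, the text travels from one worker to the other without passing through the parent.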

Like I said, though, we have our basic create-exit-join model plus communication back and forth, which is fairly useful for a lot of kinds of work. If they’re not “really” threads, it doesn’t matter much, as long as the code doesn’t block and they basically act like threads.

Credits: The header image is Threads by Dave Gingrich, made available under the terms of the Creative Commons Attribution Share-Alike 2.0 Generic license.
