
swyx

Originally published at swyx.io

5 TILs about Node.js Fundamentals from the Node.js Design Patterns Book

I started reading Node.js Design Patterns this week. I got the Third Edition, and have not spent any time looking into what's changed from prior editions. The first 6 chapters cover fundamental knowledge, before getting into the meaty named Design Patterns, so these notes are from that first "half" of the book.

1. libuv and the Reactor Pattern

libuv is something I've often heard about as a low level Node.js library, but now I have a glimpse of what it does for us. As the book says:

Libuv represents the low-level I/O engine of Node.js and is probably the most important component that Node.js is built on. Other than abstracting the underlying system calls, libuv also implements the reactor pattern, thus providing an API for creating event loops, managing the event queue, running asynchronous I/O operations, and queuing other types of task.

The Reactor pattern, together with demultiplexing, event queues and the event loop, is core to how this works - a tightly coordinated dance of feeding async events into a single queue, executing them as resources free up, and then popping them off the event queue to call callbacks given by user code.

[Figure: the Reactor pattern - the event demultiplexer feeding the event queue, and the event loop dispatching callbacks to application code]
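You can see the whole dance in miniature from userland with any async call (a trivial sketch - data.txt is an assumed file):

import { readFile } from 'fs'

// we hand the runtime an operation plus a handler...
readFile('data.txt', 'utf8', (err, data) => {
  // ...libuv watches the I/O, queues the completion event,
  // and the event loop invokes this callback when it's ready
  console.log(err ? err.message : data)
})

console.log('logs first: the loop keeps spinning while the read is pending')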

2. Module Design Patterns

I am superficially familiar with the differences between CommonJS modules and ES Modules. But I liked the explicit elaboration of 5 module definition patterns in CommonJS:

  • Named exports: exports.foo = () => {}
  • Exporting a function: module.exports = () => {}
  • Exporting a class: module.exports = class Foo {}
  • Exporting an instance: module.exports = new Foo(), which acts like a singleton - except when it doesn't, because npm can install multiple versions of the same module, each with its own instance.
  • Monkey patching other modules (useful for nock - sketched below)
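A minimal sketch of that last pattern, with a hypothetical logger module (nock pulls the same trick on Node's http module):

// patcher.js - mutates another module's exports as a side effect
require('./logger').mockedLog = () => console.log('monkey-patched!')

// main.js
require('./patcher') // the patch now applies to every consumer of logger
const logger = require('./logger')
logger.mockedLog() // prints "monkey-patched!"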

In ES Modules, I enjoyed the explanation of "read-only live bindings", which will look weird to anyone who has never seen it and has always treated modules as stateless chunks of code:

// counter.js
export let count = 0
export function increment () {
   count++ 
}

// main.js
import { count, increment } from './counter.js'
console.log(count) // prints 0
increment()
console.log(count) // prints 1
count++ // TypeError: Assignment to constant variable!

This mutable module internal state pattern is endemic in Svelte and Rich Harris' work and I enjoy how simple it makes code look. I don't know if there are scalability issues with this pattern but so far it seems to work fine for ES Modules people.

The last important topic I enjoyed was ESM and CJS interop issues. ESM doesn't offer require, __filename or __dirname, so you have to reconstruct them if needed:

import { fileURLToPath } from 'url'
import { dirname } from 'path'
const __filename = fileURLToPath(import.meta.url) 
const __dirname = dirname(__filename)

import { createRequire } from 'module'
const require = createRequire(import.meta.url)

ESM also cannot natively import JSON as of the time of writing, whereas CJS can. You can work around this with the require function from above:

import { createRequire } from 'module'
const require = createRequire(import.meta.url) 
const data = require('./data.json') 
console.log(data)

Did you know that? I didn't!

In semi-related news - Node v14.13 will allow named imports from CJS modules, probably the last step in ESM "just working" in Node.js.
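Something like this should work once that lands (./legacy.cjs is a hypothetical module):

// before - CJS modules could only be default-imported from ESM
import legacy from './legacy.cjs'
const { namedThing } = legacy

// from Node v14.13 - static analysis of the CJS exports makes this work
import { namedThing } from './legacy.cjs'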

3. Unleashing Zalgo

APIs are usually either sync or async in Node.js, but TIL you can design APIs that are both:

function createFileReader (filename) { 
  const listeners = [] 
  inconsistentRead(filename, value => {
    listeners.forEach(listener => listener(value)) 
  })
  return {
    onDataReady: listener => listeners.push(listener) 
  }
}
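Here inconsistentRead is the book's deliberately unpredictable helper: it caches file contents, invoking the callback synchronously on a cache hit and asynchronously on a miss. Roughly:

import { readFile } from 'fs'

const cache = new Map()
function inconsistentRead (filename, cb) {
  if (cache.has(filename)) {
    cb(cache.get(filename)) // cache hit: callback fires synchronously
  } else {
    readFile(filename, 'utf8', (err, data) => {
      cache.set(filename, data)
      cb(data) // cache miss: callback fires asynchronously
    })
  }
}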

This looks innocent enough, except when the first use is async and the second is sync:

const reader1 = createFileReader('data.txt')  // async
reader1.onDataReady(data => {
   console.log(`First call: ${data}`)
   const reader2 = createFileReader('data.txt')  // sync
   reader2.onDataReady(data => {
     console.log(`Second call: ${data}`) 
   })
})
// only outputs First call - never outputs Second call

This is because the file cache inside inconsistentRead makes the first call async and the second call sync: by the time reader2 is created the file is cached, so its callback fires synchronously inside createFileReader, before onDataReady has registered any listener. izs famously called this "unleashing Zalgo" in a blogpost.

You can keep Zalgo caged up by:

  • using direct style functions for synchronous APIs (instead of Continuation-Passing Style)
  • making I/O purely async: use only async APIs, use CPS, and defer synchronous memory reads with process.nextTick() (see the sketch below)
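Applied to the reader above, the second option looks roughly like this (the book's fix, give or take naming):

import { readFile } from 'fs'

const cache = new Map()
function consistentReadAsync (filename, callback) {
  if (cache.has(filename)) {
    // defer the cached result so the callback is ALWAYS asynchronous
    process.nextTick(() => callback(cache.get(filename)))
  } else {
    readFile(filename, 'utf8', (err, data) => {
      cache.set(filename, data)
      callback(data)
    })
  }
}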

The same line of thinking applies to EventEmitter observers as it does to callbacks: don't emit the same event synchronously in some cases and asynchronously in others.

Callbacks should be used when a result must be returned in an asynchronous way, while events should be used when there is a need to communicate that something has happened.

You can combine both the Observer and Callback patterns, for example with the glob package, which takes both a callback for its simpler, critical functionality and a .on for advanced events.
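Something like this (assuming glob v7's callback API; the data/ directory is made up):

import glob from 'glob'

// the callback delivers the headline result...
const g = glob('data/*.json', (err, files) => {
  if (err) { return console.error(err) }
  console.log(`All matches: ${files}`)
})

// ...while the returned EventEmitter reports incremental events
g.on('match', match => console.log(`Match found: ${match}`))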

A note on ticks and microtasks:

  • process.nextTick sets up a microtask, which executes just after the current operation and before any other I/O
  • whereas setImmediate runs after ALL I/O events have been processed.
  • process.nextTick executes earlier, but runs the risk of I/O starvation if its callbacks take too long.
  • setTimeout(callback, 0) runs in yet another phase (timers); scheduled from inside an I/O callback, it fires a whole loop iteration after setImmediate (see below).
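You can see the ordering from inside an I/O callback, where it is deterministic (a sketch - it reads this script's own file just to get into the I/O phase):

import { readFile } from 'fs'

readFile(new URL(import.meta.url), () => {
  setTimeout(() => console.log('setTimeout 0'), 0) // timers phase, next iteration
  setImmediate(() => console.log('setImmediate')) // check phase, this iteration
  process.nextTick(() => console.log('nextTick')) // microtask, right after this callback
})
// prints: nextTick, setImmediate, setTimeout 0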

4. Managing Async and Limiting Concurrency with async

With Node.js, it's easy to spawn race conditions or accidentally launch unlimited parallel execution that brings down the server. The Async library gives battle-tested utilities for defining and executing async workflows, in particular queues that offer limited concurrency.
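For example, a queue with a concurrency limit looks roughly like this (assuming async v3; the tasks are made up):

import async from 'async'

// a queue whose worker processes at most 2 tasks at a time
const q = async.queue((task, done) => {
  console.log(`processing ${task.name}`)
  setTimeout(done, 100) // stand-in for real async work
}, 2)

q.drain(() => console.log('all tasks processed'))

for (let i = 1; i <= 5; i++) {
  q.push({ name: `task ${i}` })
}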

The book steps you through 4 versions of a simple web spider program to develop the motivation for managing async processes and to describe the subtle issues that present themselves at scale. I honestly can't do it justice, and I didn't want to just copy out all the versions and discussions of the web spider project, as that is a significant chunk of the book - you're just gonna have to read through those chapters yourself.

5. Streams

I've often commented that Streams are the best worst-kept secret of Node.js. Time to learn them. Not only are streams more memory- and CPU-efficient than buffering everything up front, they are also more composable.

Each stream is an instance of EventEmitter, streaming either binary chunks or discrete objects. Node offers 4 base abstract stream classes, plus PassThrough:

  • Readable: you can read in flowing (push) or paused (pull) mode
  • Writable: you're probably familiar with res.write() from Node's http module
  • Duplex: both readable and writable
  • Transform: a special duplex stream with two extra methods, _transform and _flush, for data transformation (a sketch follows the code below)
  • PassThrough: a Transform stream that doesn't do any transformation - useful for observability or to implement late piping and lazy stream patterns:
import { PassThrough } from 'stream'
import { createReadStream, createWriteStream } from 'fs'
import { createGzip } from 'zlib'

let bytesWritten = 0
const monitor = new PassThrough()
monitor.on('data', (chunk) => {
  bytesWritten += chunk.length
})
monitor.on('finish', () => {
  console.log(`${bytesWritten} bytes written`)
})
monitor.write('Hello!')
monitor.end() // logs "6 bytes written"

// usage - pipe through a monitor like the one above
const filename = 'data.txt' // any input file
createReadStream(filename)
  .pipe(createGzip())
  .pipe(monitor) // passthrough stream!
  .pipe(createWriteStream(`${filename}.gz`))
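And since Transform streams came up, here's a minimal sketch of one using the simplified constructor (the book subclasses Transform instead); it just uppercases whatever flows through:

import { Transform } from 'stream'

const uppercasify = new Transform({
  transform (chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase()) // the actual transformation
    callback()
  },
  flush (callback) {
    callback() // nothing buffered to flush in this trivial example
  }
})

process.stdin.pipe(uppercasify).pipe(process.stdout)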

izs recommends minipass, which implements a PassThrough stream with some better features, along with other useful stream utils. The authors, though, recommend organizing piping and error handling with the native stream.pipeline function:
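Which looks roughly like this (file names assumed):

import { createReadStream, createWriteStream } from 'fs'
import { createGzip } from 'zlib'
import { pipeline } from 'stream'

// pipeline wires the streams together and funnels errors from ANY stage
// into a single callback - no per-stream error listeners needed
pipeline(
  createReadStream('data.txt'),
  createGzip(),
  createWriteStream('data.txt.gz'),
  (err) => {
    if (err) {
      console.error('Pipeline failed:', err)
    } else {
      console.log('Pipeline succeeded')
    }
  }
)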
