Simon Ström

Up and running with streams for performance and fun

One concept you should probably familiarize yourself with is streams: streams of data that you read, write, both, or transform. They are a really powerful tool in several ways, and I really enjoy using them in my Node code. But keep in mind that although we will be using Node, this is not in any way specific to Node. Data streams are equally important in any programming language.

This will not be a super in-depth dive into advanced topics, but rather a high-level introduction to get you started with streams. Shall we go?

What is a stream?

Although I am not referring to streaming services like YouTube or Netflix, these services actually do use streams and are the perfect example for a beginner to start to understand what a stream is.

Take Netflix for example: when you click the button to play, the movie starts almost immediately. You don't need to sit and wait for the entire movie to be downloaded before it starts, like anyone had to do back before streaming services were a thing and people downloaded movies (so I heard they did, at least).

This concept can, and when applicable should, be taken to your code. Any time you can process data before it is fully loaded, you should consider it. This can be: file parsing, converting, zipping, cryptography, or HTTP requests/responses.

Different types of streams

There are four different types of streams:

  • Readable: Well, they read data.
  • Writable: And yes, they write data.
  • Duplex: They both read and write data (like web sockets, if you are familiar with those).
  • Transform: They are used to transform a stream of data before sending it forward. (They are actually duplex streams.)

We will explore readable, writable, and transform streams using a text file as our input data.

Readable and writable

Let us start with a readable and a writable stream. The text file contains tab-separated data, and we simply want to transform it into a comma-separated file.

We can start with a stream that reads the data from the file:

import { createReadStream } from "fs"
const readableStream = createReadStream("./my-input-file.txt", "UTF-8")
readableStream.on("data", chunk => {
  /* TODO: Process input data */
})

It is actually straightforward: create a stream and attach an event listener to capture the data. The data will be delivered in small bits and pieces, usually called chunks, and we can write our own function to process it.

One important technical detail about readable streams is that they have two modes: paused and flowing.

When we add a "data" event listener, the stream enters flowing mode, meaning the data is delivered to our code automatically, as fast as the readable stream can produce it. In paused mode you instead have to explicitly request data from the stream using its read method.

You can also move between these modes, but let us not get too in-depth.
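
If you are curious, a minimal sketch of paused mode could look something like this, pulling chunks manually with the "readable" event and the read method (the variable name pausedStream is just for illustration):

import { createReadStream } from "fs"

const pausedStream = createReadStream("./my-input-file.txt", "UTF-8")

// In paused mode we pull chunks ourselves with read()
// instead of letting them flow to a "data" listener
pausedStream.on("readable", () => {
  let chunk
  while ((chunk = pausedStream.read()) !== null) {
    /* TODO: Process input data */
  }
})

pausedStream.on("end", () => console.log("No more data"))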

Let us continue with a write stream so we can transform our data and output it to disk:

import { createWriteStream } from "fs"
const writableStream = createWriteStream("./my-output-file.txt", "UTF-8")

It is pretty much the same procedure to instantiate it. Let's use the writableStream to write the transformed data:

import { createWriteStream, createReadStream } from "fs"
const readableStream = createReadStream("./my-input-file.txt", "UTF-8")
const writableStream = createWriteStream("./my-output-file.txt", "UTF-8")
readableStream.on("data", chunk => {
  writableStream.write(chunk.replaceAll("\t", ","))
})

That is pretty much it to get started with readable and writable streams.

Transforms and pipes

But hey! In the list of available stream types, there is a transform that should be used to transform the input stream and send it to another stream. Yep, that is correct. Let's have a look at that.

A transform can be far more complex than this, we will implement the least code needed for our use case:

import { Transform } from "stream"

const tabToCommaTransform = new Transform({
  decodeStrings: false,
  transform(chunk, encoding, callback) {
    this.push(chunk.replaceAll("\t", ","))
    callback()
  }
})

We create a new Transform object, and the actual transformation is implemented in the transform function property of that object. The input parameters are the chunk of data, its encoding, and a callback to invoke when you are done. To pass data forward you use the push method of the transform, this.push(data), with the data as the parameter.

The decodeStrings option ensures that the data is kept as a string and not converted into a buffer.

But how do we use it? We now have three streams that will do the work: a readable, a transform, and a writable. Enter pipes. With pipes you can chain several streams together to produce your output, like this:

import { Transform } from "stream"
import { createWriteStream, createReadStream } from "fs"

const readableStream = createReadStream("./my-input-file.txt", "UTF-8")
const writableStream = createWriteStream("./my-output-file.txt", "UTF-8")
const tabToCommaTransform = new Transform({/**/})

readableStream.pipe(tabToCommaTransform).pipe(writableStream)

Now the data will automatically flow from the readable stream, through our transform, and out through the writable stream. Great! There is actually a slightly nicer way to compose streams, using the pipeline utility:

import { Transform, pipeline } from "stream"
import { createWriteStream, createReadStream } from "fs"
/* same as above */
pipeline(
  readableStream,
  tabToCommaTransform,
  writableStream,
  (err) => {
    if(err) {
      console.error(err)
      return
    }

    console.log("Conversion pipeline finished)
  }
)

And as of Node 15, there is a promise version:


import { Transform } from "stream"
import { pipeline } from "stream/promises"
import { createWriteStream, createReadStream } from "fs"
/* same as above*/

async function run() {
  await pipeline(
    readableStream,
    tabToCommaTransform,
    writableStream
  )
  console.log("Conversion pipeline finished")
}

run().catch(console.error)

HOLD YOUR HORSES! That code with transforms looks way more complicated than the first one. And yes, that might be true. But what transform streams and pipes make possible is a whole other level of composability. And we will soon talk more about that...

The benefits

First and foremost: PERFORMANCE. In several ways, but most importantly, your application will be more memory efficient. Take this code, which solves the same problem without streams:

import { readFile, writeFile } from "fs"
import { promisify } from "util"

const _readFile = promisify(readFile)
const _writeFile = promisify(writeFile)

async function convertData() {
  const data = await _readFile("./my-input-file.txt", "UTF-8")
  await _writeFile("./my-output-file.txt", data.replaceAll("\t", ","), "UTF-8")

  console.log("Conversion succesful")
}

convertData().catch(console.error)

How will this behave differently from our previous code? Well, for this code to work we have to read the entire file into memory before we can process the data, and then we replace the tabs in that entire string. So this code will consume a lot more memory. With streams, as stated before, we transform the file in chunks, piece by piece. That also means we can transform files bigger than our available memory, since we never need to keep the entire content in memory at the same time.

Another thing is the responsiveness of our application. If we want to run this code as a response to a web request then, besides the memory consumption, the user will have to wait for us to load the whole file before we can send anything. With streams, we can start the transfer as soon as we start reading the file.
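
As a rough sketch of that idea (the HTTP server, the route, and the port here are just made up for illustration, they are not part of the original example), serving the converted file over HTTP could look something like this:

import { createServer } from "http"
import { createReadStream } from "fs"
import { Transform, pipeline } from "stream"

const server = createServer((req, res) => {
  // Same read stream and transform as before
  const readableStream = createReadStream("./my-input-file.txt", "UTF-8")
  const tabToCommaTransform = new Transform({
    decodeStrings: false,
    transform(chunk, encoding, callback) {
      this.push(chunk.replaceAll("\t", ","))
      callback()
    }
  })

  res.setHeader("Content-Type", "text/csv")

  // The response is itself a writable stream, so the converted data
  // starts reaching the user before we have read the whole file
  pipeline(readableStream, tabToCommaTransform, res, (err) => {
    if (err) console.error(err)
  })
})

server.listen(3000)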

The other benefit, which I already mentioned, is the way streams, and especially transforms, make our code composable. We can change the implementation and add features easily. Let's look at some built-in features that play well with our file stream, and how easily we can add file compression and encryption to this example.

To pipe in some file compression, we just need to add one more stream to our pipeline:

import { createBrotliCompress } from "zlib"
/* same as above  */
async function run() {
  const compress = createBrotliCompress()
  await pipeline(
    readableStream,
    tabToCommaTransform,
    compress,
    writableStream
  )
  console.log("Conversion pipeline finished")
}

run().catch(console.error)

You could also use the createGzip function exported from zlib to create a Gzip compression.
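
For example, here is a sketch of swapping Brotli for Gzip; everything else stays the same as above:

import { createGzip } from "zlib"
/* same as above */
async function run() {
  const compress = createGzip()
  await pipeline(
    readableStream,
    tabToCommaTransform,
    compress,
    writableStream
  )
  console.log("Conversion pipeline finished")
}

run().catch(console.error)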

The encryption part is a bit more complicated, since creating a crypto stream requires a few parameters. I will just use an example from the Node docs and promisify it a bit so we get the idea:

import { createCipheriv, randomFill, scrypt } from "crypto";
import { promisify } from "util";

const password = "secret"; //should be better
const salt = "salt"; // should probably be random and better
const _scrypt = promisify(scrypt);
const _randomFill = promisify(randomFill);

async function createEncryptionStream() {
  const key = await _scrypt(password, salt, 24);
  const initializationVector = await _randomFill(new Uint8Array(16));
  return createCipheriv("aes-192-cbc", key, initializationVector);
}

And then we can just pipe that into our existing pipeline:

async function run() {
  const compress = createBrotliCompress()
  const encrypt = await createEncryptionStream()
  await pipeline(
    readableStream,
    tabToCommaTransform,
    compress,
    encrypt,
    writableStream
  )
  console.log("Conversion pipeline finished")
}

run().catch(console.error)

Well, I think you get the idea now of how streams make everything composable. Look at the pipeline and you will immediately get a high-level overview of what is happening here. And we can make changes, and add and remove features, without editing other pieces of code.

As with any abstraction in code, you should of course consider when to use a transform. They do add a bit of extra complexity, and for some one-off scripts you might not need them.

Summary

So, streams are efficient and composable. Two words I really enjoy hearing when it comes to code. That is why I think streams are so fun and important to use.

Actually, I would not have gotten my job if I had not known about streams. My work test was to build something that parses and sorts more data than the available memory. At the time I wrote most of my code in C#, but I must say streams in Node are really more my cup of tea.

Please share if you have any thoughts on this or other resources with streams you find interesting!

Photo by Pietro Jeng on Unsplash
