Thomas Pegler
Node.JS File Streams

So you want to handle large files. Perhaps you have a web app that accepts file uploads, and you'd like to let users upload several gigabytes without choking the server. What's to be done?

Streams to the rescue!

You might well be aware of Node's file streaming capabilities; I was not, until I had to accept uploads of unknown size. I wanted to use the fantastic HTTP library got to stream a file from a remote server as specified by an API call. What was the best way to handle this? Well, if you're using Node and Express, it's a combination of node:fs's createWriteStream and Multer. But for general purposes, a write stream will do just fine.
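For a concrete taste, here's a minimal sketch: pipeline from node:stream/promises moves any Readable into a write stream with backpressure handled for you. The in-memory source and temp-file name below are stand-ins for the example; a got response stream is also a Readable and would slot in the same way.

```javascript
import { createWriteStream, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Stand-in for a remote source; got.stream( url ) also returns a
// Readable, so it could be swapped in for `source` unchanged.
const source = Readable.from( [ 'chunk one, ', 'chunk two' ] );
const dest = join( tmpdir(), 'streamed-download.txt' );

// pipeline() applies backpressure and cleans up both streams on error
await pipeline( source, createWriteStream( dest ) );
console.log( readFileSync( dest, 'utf8' ) ); // → chunk one, chunk two
```

Nothing here ever holds the whole file in memory; only one chunk is in flight at a time.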

Streams in Node.js

In Node, there are a few types of streams. Below is a simple example of the first two, Readable and Writable.

import { createWriteStream, createReadStream } from 'node:fs';
import { pipeline } from 'node:stream';

const writer = createWriteStream( 'path/to/your/new/large.file' );
const reader = createReadStream( 'path/to/your/saved/large.file' );

reader.on( 'data', ( chunk ) => {
    console.log( chunk.toString() );
} );

writer.on( 'finish', () => {
    console.log( 'Finished writing a large file!' );
} );

// Readable streams emit 'end' rather than 'finish'
reader.on( 'end', () => {
    console.log( 'Finished reading' );
} );

reader.pipe( writer );

// Or, alternatively, use pipeline, which forwards errors and
// closes both streams for you once the transfer completes
pipeline(
    reader,
    writer,
    ( err ) => {
        if ( err ) {
            console.error( err );
        }
    }
);

There are two other types of stream, Duplex and Transform. These function as both a readable and a writable stream, the difference being that Transform streams, you guessed it, transform the data passed through them. They can take in data from a file or a Buffer like a readable stream, and they can also output data like a writable stream. This can be useful for things like data manipulation (e.g. encrypting files). Here's how you'd do it first with a readStream and a writeStream individually.

import { createReadStream, createWriteStream, statSync } from 'node:fs';
import crypto from 'node:crypto';

const infile = createReadStream( 'path/to/in/file' );
const outfile = createWriteStream( 'path/to/out/file' );
const secretKey = process.env.SECRET_KEY;
const iv = crypto.randomBytes( 16 );
// createCipher is deprecated; createCipheriv takes the IV explicitly
const encrypt = crypto.createCipheriv( 'aes-256-cbc', Buffer.from( secretKey ), iv );
const size = statSync( 'path/to/in/file' ).size;

infile.on( 'data', ( data ) => {
    const percentage = infile.bytesRead / size;
    console.log( `${percentage * 100}%` );
    const encrypted = encrypt.update( data );

    if ( encrypted ) {
        console.log( encrypted );
        outfile.write( encrypted );
    }
} );

infile.on( 'close', () => {
    outfile.write( encrypt.final() );
    outfile.close();
} );
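In fact, Node's cipher objects are themselves Transform streams, so they can sit directly in a pipe chain. Here's a quick round-trip sketch; the passphrase and scrypt key derivation are made up for the example:

```javascript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from 'node:crypto';
import { Readable } from 'node:stream';

// Assumption for the sketch: derive a 32-byte key from a passphrase
const key = scryptSync( 'a secret passphrase', 'some salt', 32 );
const iv = randomBytes( 16 );

// Cipher and Decipher extend Transform, so they pipe like any stream
const roundTrip = Readable.from( [ 'some secret contents' ] )
    .pipe( createCipheriv( 'aes-256-cbc', key, iv ) )
    .pipe( createDecipheriv( 'aes-256-cbc', key, iv ) );

const chunks = [];
for await ( const chunk of roundTrip ) {
    chunks.push( chunk );
}
const result = Buffer.concat( chunks ).toString();
console.log( result ); // → some secret contents
```

The progress reporting from the manual version is lost here, but the chunk handling and final-block flushing are done for you.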

And now to use an actual transform stream.

import { Transform, TransformCallback } from 'node:stream';

export class Base64DecodeStream extends Transform {
    extra: string;

    constructor() {
        super( { decodeStrings: false } );
        this.extra = '';
    }

    _transform( chunk: Buffer | string, encoding: BufferEncoding, cb: TransformCallback ) {
        let c = `${chunk}`;
        c = this.extra + c.replace( /(\r\n|\n|\r)/gm, '' );

        // Base64 decodes in groups of 4 characters, so carry any
        // incomplete group over to the next chunk
        const remaining = c.length % 4;

        this.extra = c.slice( c.length - remaining );
        c = c.slice( 0, c.length - remaining );

        const buf = Buffer.from( c, 'base64' );
        this.push( buf );
        cb();
    }

    _flush( cb: TransformCallback ) {
        if ( this.extra.length ) {
            this.push( Buffer.from( this.extra, 'base64' ) );
        }

        cb();
    }
}
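To sanity-check the chunk-boundary handling, here's the same logic restated as plain JavaScript, fed base64 deliberately split mid-group so the carry-over path runs:

```javascript
import { Transform } from 'node:stream';

// Plain-JS restatement of the class above, for a quick check
class Base64DecodeStream extends Transform {
    constructor() {
        super( { decodeStrings: false } );
        this.extra = '';
    }

    _transform( chunk, encoding, cb ) {
        let c = this.extra + `${chunk}`.replace( /(\r\n|\n|\r)/gm, '' );
        // Carry any incomplete 4-character group to the next chunk
        const remaining = c.length % 4;
        this.extra = c.slice( c.length - remaining );
        c = c.slice( 0, c.length - remaining );
        this.push( Buffer.from( c, 'base64' ) );
        cb();
    }

    _flush( cb ) {
        if ( this.extra.length ) {
            this.push( Buffer.from( this.extra, 'base64' ) );
        }
        cb();
    }
}

const decoder = new Base64DecodeStream();
const encoded = Buffer.from( 'hello, streams' ).toString( 'base64' );

// Split the input mid-group to exercise the `extra` carry-over
decoder.write( encoded.slice( 0, 5 ) );
decoder.write( encoded.slice( 5 ) );
decoder.end();

const out = [];
for await ( const chunk of decoder ) {
    out.push( chunk );
}
console.log( Buffer.concat( out ).toString() ); // → hello, streams
```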

And you can use the class above in the same manner as the other pipes. More specifically, since it is a Transform, it can sit between streams, such as a response from an Axios request:

response.pipe( new Base64DecodeStream() ).pipe( decipher ).pipe( stream );

Using streams in an Express app

Now for something real-world that you can use. A simple Multer middleware for saving form-data files without loading them entirely into memory.

// Node Modules
import { randomUUID } from 'crypto';
import { Request } from 'express';
import multer, { StorageEngine } from 'multer';
import { Writable } from 'stream';
import { createWriteStream, existsSync, mkdirSync } from 'fs';
import { log } from '../common/winston.js';

/**
 * Extended Multer storage class to save a file from a stream.
 *
 * The idea is that this allows for large file uploads to the API without strangling the system.
 */
export class MulterStorage implements StorageEngine {
  mediaRoot: string;

  destination: string;

  /**
   * Constructor method.
   * @param {string} mediaRoot
   */
  constructor( mediaRoot: string ) {
    this.mediaRoot = mediaRoot;
    this.destination = mediaRoot;
  }

  /**
   * Used to send the request filename to a callback function.
   * @param {Request} req
   * @param {Express.Multer.File} file
   * @param {function} cb
   */
  // eslint-disable-next-line class-methods-use-this
  filename(
    req: Request,
    file: Express.Multer.File,
    cb: ( error?: Error | string | null, filename?: string ) => void,
  ) {
    cb( null, file.originalname );
  }

  /**
   * Handle the file upload
   * @param {Request} req - The request object
   * @param {Express.Multer.File} file - The uploaded file
   * @param {function} cb
   */
  _handleFile(
    req: Request,
    file: Express.Multer.File,
    cb: ( error?: Error | string | null | unknown, info?: Partial<Express.Multer.File> ) => void,
  ) {
    const uuid = randomUUID();

    // Begin handling file
    try {
      if ( !existsSync( `${this.destination}/${uuid}` ) ) {
        mkdirSync( `${this.destination}/${uuid}`, { recursive: true } );
      }

      const intermediateFile: Partial<Express.Multer.File> = {
        path: `${this.destination}/${uuid}/${file.originalname.replace( /\s/g, '_' )}`,
      };
      const finalFile: Partial<Express.Multer.File> = { ...file, ...intermediateFile };
      const writeStream: Writable = createWriteStream( `${finalFile.path}` );
      const fileReadStream = file.stream;

      fileReadStream
        .pipe( writeStream )
        .on( 'finish', () => {
          cb( null, finalFile );
        } )
        .on( 'error', ( e ) => {
          writeStream.end();
          cb( e );
        } );

      // pipe() does not forward read errors to the destination,
      // so handle them on the incoming stream as well
      fileReadStream.on( 'error', ( e ) => {
        writeStream.end();
        cb( e );
      } );
    } catch ( e ) {
      if ( typeof e === 'string' ) {
        log( 'error', e.toUpperCase() );
      } else if ( e instanceof Error ) {
        log( 'error', e.message );
      }
      cb( e );
    }
  }

  /**
   * Method to remove a file.
   * @param {Request} req - the request object
   * @param {Express.Multer.File} file - The specified file to remove
   * @param {function} callback
   */
  // eslint-disable-next-line class-methods-use-this
  _removeFile(
    req: Request,
    file: Express.Multer.File,
    callback: ( error: ( Error | null ) ) => void,
  ): void {
    callback( null );
  }
}

export const upload = multer( { storage: new MulterStorage( '/media' ) } );

Then on your route, you can do something like this:

import express, {Request, Response, Router, NextFunction} from 'express';

import {upload} from '../../../middleware/multer-storage.js';

const api = Router();

api.post(
    '/api/:version/',
    upload.single('file'),
    // This is an example of how to upload several different files with different names
    //
    // upload.fields([{
    //   name: 'video', maxCount: 1
    // }, {
    //   name: 'subtitles', maxCount: 1
    // }])
    async (req: Request, res: Response, next: NextFunction) => {
        const file = req.file;

        // EncodeBody here is your own request-body type
        const data = req.body as EncodeBody;

        // Note the ? after file. Since the file should
        // exist if saved correctly, but might not, we should
        // check or allow nullable in the `process` function
        const resp = await process(data, file?.path);
        return next();
    },
);

const app = express();
app.use( '/', api );

The string passed to single is the name of the form field to accept the file from; the file then appears as req.file on the Request object. With single there will be just one file, but you can accept multiple fields and specify more information (like a maxCount) by using the fields method, as seen commented out above.

Conclusion

Combining all of the above, you can safely store, access, and download large files without blowing your memory limit.
