DEV Community

Wesley Miranda


Uploading multiple files at the same time using multithreading in NodeJS

NodeJS is often described as single-threaded, but that is only partly true: your JavaScript runs on a single event-loop thread, while NodeJS itself can use more threads. NodeJS gives us two modules for running work in parallel: worker_threads (multithreading) and child_process (multiprocessing).

  • worker_threads run inside the same process and can share memory (for example through a SharedArrayBuffer), which makes communication between them easier.
  • child_process spawns separate processes. It is useful when we need to interact directly with the operating system, but each process needs more memory than a thread.
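
To make the shared-memory point concrete, here is a minimal sketch (not part of the original application) in which several worker threads increment one counter stored in a SharedArrayBuffer. The function name incrementInWorkers and the inline worker source are assumptions made for illustration; the worker code is passed as a string with eval: true only to keep the example in a single file.

```javascript
const { Worker } = require('node:worker_threads')

// Hypothetical worker body, inlined as a string so the sketch fits in one file.
const counterWorkerSource = `
  const { workerData } = require('node:worker_threads')
  // View the shared buffer as 32-bit integers and add 1 atomically.
  const counter = new Int32Array(workerData.shared)
  Atomics.add(counter, 0, 1)
`

// Start `workerCount` workers that all write to the same 4-byte buffer,
// then return the final counter value once every worker has exited.
function incrementInWorkers(workerCount) {
  const shared = new SharedArrayBuffer(4)
  const counter = new Int32Array(shared)

  const jobs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(counterWorkerSource, {
        eval: true,
        workerData: { shared } // a SharedArrayBuffer is shared, not copied
      })
      worker.once('error', reject)
      worker.once('exit', (code) => {
        if (code === 0) resolve()
        else reject(new Error(`Worker stopped with exit code ${code}`))
      })
    })
  )

  return Promise.all(jobs).then(() => Atomics.load(counter, 0))
}

// Usage: four workers each add 1 to the same counter.
incrementInWorkers(4).then((total) => console.log(total)) // logs 4
```

Atomics.add is used because several threads write to the same memory location; a plain `counter[0]++` could lose updates.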

Application: The application we are going to create loads the content of a folder and uploads every file to Google Cloud Storage. The most interesting part is that we decide how many threads do this work at once, with the aim of speeding up the upload process.

Main Principles:

  • Worker threads: NodeJS module to create threads.
  • Streams: if you don't know NodeJS streams I suggest you take a look at the tutorial that I've created here.
  • File System: NodeJS provides us with a simple way to access the OS and manipulate files and folders.

Steps to reproduce:

  • Load the folder's content
  • Create the threads
  • Create the upload worker
  • Assign the threads to upload worker

Requirements:

  • We are going to use NodeJS version 16.16

  • You need to have a Google account to access Google Cloud Services.


Cloud Storage Service

We need to install the Cloud Storage library to work with the Google Cloud service.

npm install @google-cloud/storage

If you need help configuring the Cloud Storage service on your Google account, many good tutorials cover it; that setup is outside the scope of this tutorial.

Let's create our first file, cloudStorageFileService.js, to work with our storage.

cloudStorageFileService.js

const { Storage } = require('@google-cloud/storage')
const path = require('path')
const serviceKey = path.join(__dirname, '../gkeys.json')


class CloudStorageFileService {

// (1)
    constructor() {
        this.storage = new Storage({
            projectId: 'my-project-id',
            keyFilename: serviceKey
        })
    }

// (2)
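    // createWriteStream() is synchronous and returns a Writable stream,
    // so no async/await is needed here (callers may still await the result).
    uploadFile(bucketName, destFileName) {
        return this.storage
            .bucket(bucketName)
            .file(destFileName)
            .createWriteStream()
    }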
}

module.exports = CloudStorageFileService

From the code sections above:

  1. Basic configuration for the Cloud Storage service: the project ID and the path to your Google Cloud credentials file.

  2. Google Cloud Storage provides a Writable Stream for uploading files.
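
If you have not worked with Writable streams before, the sketch below shows the general pattern of piping a source into a destination stream. It does not touch Google Cloud: createMemoryDestination is a hypothetical in-memory stand-in for the stream that createWriteStream() returns, and copyToDestination is an illustrative helper name.

```javascript
const { Readable, Writable } = require('node:stream')
const { pipeline } = require('node:stream/promises')

// Hypothetical stand-in for an upload stream: it collects the chunks
// in memory instead of sending them to a bucket.
function createMemoryDestination() {
  const chunks = []
  const stream = new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk) // string chunks arrive here already converted to Buffers
      callback()
    }
  })
  stream.contents = () => Buffer.concat(chunks).toString('utf8')
  return stream
}

// pipeline() connects the streams, handles backpressure, and resolves
// once the destination has finished (or rejects on the first error).
async function copyToDestination(source, destination) {
  await pipeline(source, destination)
  return destination.contents()
}

// Usage: the source could just as well be fs.createReadStream(somePath).
copyToDestination(Readable.from(['hello ', 'world']), createMemoryDestination())
  .then((text) => console.log(text)) // logs "hello world"
```

Swapping the in-memory destination for a real upload stream leaves the pipeline call unchanged, which is the main appeal of the stream interface.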


Thread Controller

The thread controller handles the thread distribution: we want to give each file its own thread and upload the files separately.

threadController.js

const {
    Worker
} = require('node:worker_threads');
const { readdir } = require('fs/promises')
const path = require('path')


class ThreadController {

// (1)
    constructor(threadsNumber) {
        this.files = []
        this.threadsNumber = threadsNumber
        this.count = 0
    }

// (2)
    async loadFiles() {
        this.files = await readdir(path.join(__dirname, '/content'))
    }

// (3)
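    async uploadThread(filePath) {
        return new Promise((resolve, reject) => {
            // Resolve the worker script relative to this file, not the current working directory
            const worker = new Worker(path.join(__dirname, 'fileUploadWorker.js'), {
                workerData: {
                    file: filePath
                }
            });
            worker.once('error', reject);
            worker.once('exit', (code) => {
                // A non-zero exit code means the worker did not finish cleanly
                if (code === 0) resolve(filePath)
                else reject(new Error(`Worker for ${filePath} stopped with exit code ${code}`))
            });
        })
    }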

// (4)
    async execute() {
        const init = performance.now()

        await this.loadFiles()

        let promises = []

        while (this.count < this.files.length) {

            for (let i = this.count; i < this.count + this.threadsNumber; i++) {
                if (this.files[i]) {
                    promises.push(this.uploadThread(this.files[i]))
                }
            }

            const result = await Promise.all(promises)

            promises = []
            this.count += this.threadsNumber

            console.log(result)
        }

        const end = performance.now()
        console.log(end - init)
    }
}

module.exports = ThreadController


From the code sections above:

  1. Initializing our three main fields: the number of threads that we want, the files we want to upload, and a counter that tracks how many files have already been assigned to threads.

  2. Load the names of all the entries in the folder we want to process.

  3. Here we pass the file path to the worker thread through workerData and wait, using a Worker object, until the thread finishes its work.

  4. Running everything together. We give each file a thread, in batches, until there are no files left to process. For example, if there are 5 files and we pass 3 threads, the first round processes the first 3 files and the second round processes the remaining 2. I also added a timer to compare the behavior with different numbers of threads.
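
The batching described in step 4 can be isolated into two small helpers. This is a sketch of the same idea, not the controller's actual code: toBatches and runInBatches are hypothetical names, and the task passed in the usage example only simulates work.

```javascript
// Split a list into consecutive batches of at most `size` items.
function toBatches(items, size) {
  const batches = []
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size))
  }
  return batches
}

// Run `task` on every item, one batch at a time. All tasks of a batch
// run concurrently, and the next batch starts only when they are all done.
async function runInBatches(items, size, task) {
  const results = []
  for (const batch of toBatches(items, size)) {
    results.push(await Promise.all(batch.map((item) => task(item))))
  }
  return results
}

// Usage: 5 files with a batch size of 3 gives a batch of 3 and a batch of 2.
runInBatches(['a', 'b', 'c', 'd', 'e'], 3, async (file) => file.toUpperCase())
  .then((results) => console.log(results)) // logs [ [ 'A', 'B', 'C' ], [ 'D', 'E' ] ]
```

One consequence of this design is that a single slow file holds back its whole batch; a worker pool that starts a new task as soon as any thread is free would avoid that, at the cost of more code.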


File Upload Worker

The upload worker is the thread expressed as code: here we put everything we want the thread to do.

fileUploadWorker.js

const {
  isMainThread, parentPort, workerData
} = require('node:worker_threads');
const path = require('path')
const { pipeline } = require('stream/promises')
const { createReadStream } = require('fs')
const CloudStorageFileService = require('./cloudStorageFileService');

class FileUploadWorker {

// (1)
  constructor() {
    this.storage = new CloudStorageFileService()
    this.filePath = path.join(__dirname, '/content/', workerData.file)
    this.fileName = workerData.file
  }

// (2)
  async upload() {
    if (!isMainThread) {
      await pipeline(createReadStream(this.filePath), await this.storage.uploadFile('myfileuploads', this.fileName))
    }
  }
}

// (3)
;
(async () => {
  const fileUploader = new FileUploadWorker()
  await fileUploader.upload()
})()



From the code sections above:

  1. In the constructor we initialize the Storage service (we could also receive it as a parameter). We also get the file name from the parent thread through workerData and build the file path from it.

  2. Here we check whether we are in a thread created by us or in the NodeJS main thread. If we are not in the main thread, we create a Readable Stream from the file and pipe it into the upload stream.

  3. This immediately invoked async function runs the upload as soon as the worker thread starts.
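
The worker above receives its input through workerData but never reports back to the parent. If you want a result from the thread, parentPort.postMessage can send one. The sketch below shows that round trip under some assumptions: runWorker is a hypothetical helper, the worker source is inlined with eval: true to keep everything in one file, and the 'uploaded' status is simulated (nothing is actually uploaded).

```javascript
const { Worker } = require('node:worker_threads')

// Hypothetical worker body: read the input from workerData and send a
// report back to the parent thread. No real upload happens here.
const reportingWorkerSource = `
  const { parentPort, workerData } = require('node:worker_threads')
  parentPort.postMessage({ file: workerData.file, status: 'uploaded' })
`

// Start one worker for `file` and resolve with the report it sends back.
function runWorker(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(reportingWorkerSource, {
      eval: true,
      workerData: { file }
    })
    worker.once('message', resolve)
    worker.once('error', reject)
    // If the worker exits without reporting, fail instead of waiting forever.
    // After a successful 'message' this reject is a no-op.
    worker.once('exit', (code) => {
      reject(new Error(`Worker exited with code ${code} before reporting`))
    })
  })
}

// Usage
runWorker('photo.png').then((report) => console.log(report))
```

Messages are copied between threads with the structured clone algorithm, so plain objects, arrays, and strings can be sent as they are.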


Executing Everything

To test our application I will use 9 threads, one for each file in my folder. You can experiment with other values to measure the performance.

index.js (entry file; the name is an assumption)

const ThreadController = require('./threadController');

const controller = new ThreadController(9)
    ;
(async () => {
    await controller.execute()
})()


Takeaways

  • NodeJS is not limited to a single thread.
  • Threads are handy when you need to run a heavy job and don't want to block the NodeJS main thread.
  • We can also use multithreading for batch jobs.
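
As an illustration of the "heavy job" case, the sketch below moves a CPU-bound loop (counting primes by trial division) into a worker while a timer keeps firing on the main thread. It is an assumption-laden example rather than part of the upload application: countPrimesInWorker and the inline worker source are hypothetical, and the tick counter exists only to show that the event loop stays free.

```javascript
const { Worker } = require('node:worker_threads')

// Hypothetical CPU-bound worker: count the primes below workerData.limit.
const primeWorkerSource = `
  const { parentPort, workerData } = require('node:worker_threads')
  let count = 0
  for (let n = 2; n < workerData.limit; n++) {
    let isPrime = true
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break }
    }
    if (isPrime) count++
  }
  parentPort.postMessage(count)
`

// Resolve with the prime count and the number of timer ticks the main
// thread managed to run while the worker was busy.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    let ticks = 0
    const timer = setInterval(() => { ticks++ }, 10)
    const finish = (settle, value) => {
      clearInterval(timer)
      settle(value)
    }
    const worker = new Worker(primeWorkerSource, { eval: true, workerData: { limit } })
    worker.once('message', (primes) => finish(resolve, { primes, ticks }))
    worker.once('error', (err) => finish(reject, err))
    worker.once('exit', (code) =>
      finish(reject, new Error(`Worker exited with code ${code} before reporting`)))
  })
}

// Usage: the tick count depends on how long the worker takes on your machine.
countPrimesInWorker(200000).then((result) => console.log(result))
```

Running the same loop directly on the main thread would keep the timer from firing until the loop finished, which is the blocking that worker threads are meant to avoid.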

You can take a look at the entire code here

Top comments (13)

Mike Talbot ⭐

You don't need multiple threads to upload multiple files in parallel in Node. The single-threaded part is only the instruction pipeline; you could easily issue 100 uploads and have them run in parallel using Promise.all(). The Async library also has lots of useful calls to batch things up if you don't want to start everything at once.

Multiple threads and processes are very handy if you are actually doing processing in your Javascript code, where other operations would be blocked.

Wesley Miranda

Makes sense! thanks for your comment!

Thomas Bnt

Hello! Don't hesitate to add colors to your code blocks, like in this example, to make your code easier to understand 😎

console.log('Hello world!');

Example of how to add colors and syntax in codeblocks

Wesley Miranda

Thanks a lot! I didn't know that.

vignesh

You don't need multithreading to upload files concurrently in NodeJS.

NodeJS is not C++ where I/O operations are synchronous by default.

All I/O operations are always async in NodeJS. There are some exceptions though, like synchronous I/O APIs but they are very bad for performance so don't use them.

NodeJS uses libuv to issue non-blocking I/O syscalls and to get I/O event notifications.

Create Workers in NodeJS, only if you are doing CPU bound operations like computing hash of a large ArrayBuffer, or a very simple example would be finding a prime number. These operations will put heavy load on the CPU thread and prevent other tasks from executing.

It is always better to do I/O-based operations in an async event loop, as OS threads/processes are very expensive. When doing I/O, our application spends most of its time waiting for a disk or a network device.

If you know Rust see Tokio and Rayon.

See here to learn how Nginx delivers high performance with Non Blocking I/O.

Wesley Miranda

Makes sense! thanks for your comment!

I used this approach to show how to create worker threads; perhaps it is not the best one!

JoelBonetR 🥇

Awesome insights!

Thanks a lot for sharing such concepts on a perfectly reasonable use-case, 10 out of 10 😁

Wesley Miranda

Thanks a lot for your comment!

I am planning to write more content to show what happens under the hood.

JoelBonetR 🥇

That would be amazing! I'm following you to read more about these topics as soon as you publish them 😄

Ricardo Borges

Nice article. About the conceptual part in the introduction: wouldn't child_process be an approach for multiprocessing instead of multithreading?

Wesley Miranda

Good point! Parallel processing should be the best word for that!

thanks for your comment!

Tobias Nickel

I like the 'piscina' library for using multiple threads in Node.js; however, shared memory would still be awesome.

Wesley Miranda

Nice! I know it. It was created by a Brazilian guy, I guess.
