Speeding up MongoDB migrations with cursors and bulkWrite

Sibelius Seraphini for Woovi

Data Migrations at Scale

As Woovi grows, we have to deal with more and more data in our database collections.
We are constantly evolving our product to support new use cases for our customers, and our collections need to evolve with these changes.
Data migrations let us bring older data up to date so we don't have to handle two data formats in our codebase.
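
For example, the kind of migration used later in this post moves a single email field into an emails array. The document shapes below are illustrative:

// Old shape: one email per user (values are made up for illustration).
const oldUser = { _id: 1, email: 'ada@example.com' };

// New shape after the migration: a list of emails per user.
const newUser = { _id: 1, emails: ['ada@example.com'] };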

Reading data in batches with a cursor

The naive way of reading data looks like this:

const users = await User.find();

This will read all users from the database into memory.
This is fast and works well if you have only a few users.
However, if you have millions of users, it will be slow and consume a lot of memory.

You can use a cursor to fetch one item at a time, like this:

const cursor = User.find();

for await (const doc of cursor) {
  await migrateItem(doc);
}

This approach is better, but you make one network request for each read. If you have 1 million users, you will make 1 million requests to the database.

You can improve this using the batchSize option on the cursor:

const batchSize = 10000;
const cursor = User.find().cursor({ batchSize });

const batched = batchCursor(cursor, batchSize);

while (true) {
  const { value, done } = await batched.next();

  for (const doc of value) {
    await migrateItem(doc);
  }

  if (done) {
    break;
  }
}

Here is the batchCursor helper definition. It is an async generator that yields arrays of up to n documents. Note that it returns (rather than yields) the final partial batch, which is why the loop above consumes value before checking done instead of using for await...of:

// Async generator that wraps a cursor and yields arrays of up to n documents.
export async function* batchCursor(c, n) {
  const cursor = c;

  while (true) {
    const ret = [];
    let i = 0;

    while (i < n) {
      const val = await cursor.next();

      if (val) {
        ret.push(val);
      } else {
        // The cursor is exhausted: return the final (possibly partial) batch.
        return ret;
      }

      i++;
    }

    // A full batch of n documents is ready.
    yield ret;
  }
}

With a batch size of 10,000, this reduces 1 million requests to the database to only 100.

Writing in batches

Even if you reduce the number of database reads, your data migrations will still be slow if you don't reduce the number of database writes.

MongoDB provides a bulkWrite API that enables you to batch writes in a single database request.

// getWriteOperation maps a document to a single bulk write operation (see below).
const writes = docs.map((doc) => getWriteOperation(doc));

await User.bulkWrite(writes);

Instead of each document issuing its own write to the database, each one produces a write operation, and the operations are joined together and sent to the database in a single bulk request.

Below is an example of a bulkWrite operation using updateOne. It updates the User whose _id matches user._id, setting the emails field to an array containing user.email.

{
  updateOne: {
    filter: { _id: user._id },
    update: {
      $set: {
        emails: [user.email],
      },
    },
  },
}
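
Putting it all together, here is a minimal sketch of a complete migration that combines the batched cursor with bulkWrite. The getWriteOperation helper simply returns the updateOne operation shown above, and the import paths are illustrative:

import User from './UserModel';
import { batchCursor } from './batchCursor';

// Illustrative helper: maps a user document to its bulk write operation.
const getWriteOperation = (user) => ({
  updateOne: {
    filter: { _id: user._id },
    update: {
      $set: {
        emails: [user.email],
      },
    },
  },
});

const migrateUsers = async () => {
  const batchSize = 10000;
  const cursor = User.find().cursor({ batchSize });
  const batched = batchCursor(cursor, batchSize);

  while (true) {
    const { value, done } = await batched.next();

    if (value.length > 0) {
      // One bulk request per batch instead of one write per document.
      const writes = value.map((user) => getWriteOperation(user));
      await User.bulkWrite(writes);
    }

    if (done) {
      break;
    }
  }
};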

To Sum Up

These improvements not only make data migrations run faster but also reduce the workload on your database.

They also made our migration code cleaner and more intuitive.


Woovi
Woovi is a startup that enables shoppers to pay as they like. To make this possible, Woovi provides instant payment solutions for merchants to accept orders.

If you want to work with us, we are hiring!


Photo by Sigmund on Unsplash
