DEV Community

Cover image for The right way to make advanced and efficient MongoDB pagination
Maxim Vynohradov
Maxim Vynohradov

Posted on

The right way to make advanced and efficient MongoDB pagination

Onсe upon a time, we had a complex project enough (ride-sharing and taxi application) with stack Node.js and MongoDB. We have chosen this stack because it was preferable by the customer, good known by our team, and at the same time looks like good a suite for project tasks.

Everything was great, the number of users became more than twelve thousand, the number of active drivers was close to three hundred drivers. In one year, the number of rides becomes more than two million.

But once we need to create an admin panel to control and monitor all processes (from the business point of view) in the main application. The huge percent of requirements was to have advanced lists of a variety of entities, with bind statistics over them.

Because we use mongoose, as ODM, first of all, we take a look at its plugins. The most popular of them, that related to pagination are
mongoose-paginate
*Pagination plugin for Mongoose Note: This plugin will only work with Node.js >= 4.0 and Mongoose >= 4.0. Add plugin to…*www.npmjs.com

mongoose-paginate-v2
*A cursor based custom pagination library for Mongoose with customizable labels.*www.npmjs.com

mongoose-aggregate-paginate
*mongoose-aggregate-paginate is a Mongoose plugin easy to add pagination for aggregates. This plugin can be used in…*www.npmjs.com

mongoose-aggregate-paginate-v2
*A cursor based custom aggregate pagination library for Mongoose with customizable labels. If you are looking for basic…*www.npmjs.com

Another requirement was to have a possibility to choose a specific page on-demand, so the approach to use “previous-next”-like pagination, that based on cursors was prohibited immediately — its mongoose-paginate-v2 and *mongoose-aggregate-paginate-v2 *libraries.

The oldest, and probably the simplest in usage is mongoose-paginate — it uses simple search queries, limit, sort, and the skip operations. I guess it’s a good variant for simple pagination — just install a plugin, add few lines of code to your endpoint, and that’s all — work is done. It even can use “populate” of mongoose, — something that emulates joins from SQL world. Technically it just makes additional queries to the database, that probably not the way you want. Even more, when you just have a little bit more complicated query, with any data transformation, it will be totally unusable. I know just one way to normally use it in such cases — first create MongoDB View — technically its pre-saved queries, that MongoDB represents as read-only collections. And just then run pagination using mongoose-paginate over this view. Not bad — you will hide complicated queries under view, but we have a better idea of how to solve this problem.

MongoDB Aggregation Framework is here!

You know, I guess, it was really day for the MongoDB community when Aggregation Framework was released. Probably it allows for most of the queries that you can imagine. So, we think about taking mongoose-aggregate-paginate into use*.*

But the next two things that disappointed us:

**What does this plugin need? **I mean — what task does it help to solve, that cannot be solved without this plugin, with the same effort. Looks like it just one more additional dependency in your project, because it doesn’t bring any profit, even don’t save your time…

Internal codebase, and general approach to making queries. This library makes TWO calls into a database and waits for responses via Promise.all. First — to fetch query result and second — to calculate the count of total records that query returns, without $filter and $limit stages. It needs this to calculate the number of total pages.

How we can avoid additional queries into the database? The worst thing here that we need to run all aggregation pipeline twice, that can be costly enough in terms of memory and CPU usage. Even more, if collection huge, and documents tend to be few megabytes, it can impact Disc I/O usage, that also a big problem.

The good news — Aggregation Framework has a specific stage in its arsenal, that can solve this problem. It’s $facet:

Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its field in the output document where its results are stored as an array of documents.

MongoDB documentation about $facet stage .

Aggregation Pipeline for pagination will have the next shape:

{ 
  $facet: {
      <outputField1>: [ <stage1>, <stage2>, ... ],
      <outputField2>: [ <stage1>, <stage2>, ... ],
      ...

   }
}
Enter fullscreen mode Exit fullscreen mode

Also, pipelines for pagination can be improved by customization for specific cases. Some tips are listed below:

  • Run all operations, that don’t directly affect on final pagination result, after all possible filters ($match stages). There are stages like $project or $lookup — that don’t change the number or order of result documents. Try to cut off as many documents as you can at once.

  • Try to make your models as more self-sufficient, to avoid additional $lookups. It’s normal to duplicate some data or make pre-computing fields.

  • If you have a really huge pipeline, that processes many data, your query will probably use more than 100MB. In this case, you need to use allowDiskUse flag.

  • Follow Aggregation Pipelines performance optimization guide. This advice helps you to make your queries more efficient.

  • And technically — you can make dynamic queries on your application code side — depends on conditions you can add, remove or modify specific stages. It can speed up your queries, and moreover, make your code more eloquent.

Coz NDA, I cannot show you real database schema and real queries. But let me show you a small example of such pagination.

Imagine that you have two collections — Statistic and Drivers. Drivers collection is static enough in thinking of types and amount of fields in different documents. But Statistic is polymorphic, can be changed during time, as a result of business requirements updates. Also, some drivers could have big statistic documents and history in general. So you cannot make Statistic as subdocument of Driver.

So code and MongoDB query will have the next shape:

const ridesInfoPaginationPipeline = (filter = {}, skip = 0, limit = 10, sort = {}) => [{
    $match: {
      ...filter,
      active: true,
    }
  },
  {
    $sort: {
      ...sort,
      createdAt: -1,
    }
  },
  {
    $lookup: {
      from: 'statistic',
      localField: '_id',
      foreignField: 'driverId',
      as: 'driver',
    },
  },
  {
    $unwind: {
      path: '$driver',
      preserveNullAndEmptyArrays: true,
    },
  },
  {
    $project: {
      driver: {
        $ifNull: [{
          $concat: ['$driver.firstName', ' ', '$driver.lastName']
        }, 'Technical']
      },
      entityId: 1,
      message: 1,
      meta: 1,
      createdAt: 1,
    },
  },
  {
    $facet: {
      total: [{
        $count: 'createdAt'
      }],
      data: [{
        $addFields: {
          _id: '$_id'
        }
      }],
    },
  },
  {
    $unwind: '$total'
  },
  {
    $project: {
      data: {
        $slice: ['$data', skip, {
          $ifNull: [limit, '$total.createdAt']
        }]
      },
      meta: {
        total: '$total.createdAt',
        limit: {
          $literal: limit
        },
        page: {
          $literal: ((skip / limit) + 1)
        },
        pages: {
          $ceil: {
            $divide: ['$total.createdAt', limit]
          }
        },
      },
    },
  },
];



const executePagination = async () => {
    return Statistic.aggregate(ridesInfoPaginationPipeline());
}
Enter fullscreen mode Exit fullscreen mode

As you see, using Aggregation Framework and $facet stage we can:

  • make data transformation and complex queries;

  • fetch data from multiple collections;

  • get metadata of pagination (total, page, pages)in the one query without additional query execution.

Regarding the main drawbacks of such an approach, I guess just one is major — higher complicity of development and debug process, along with higher entry threshold. It includes performance troubleshooting, knowledge of a variety of stages, and data modeling approaches.


So, pagination, that based on MongoDB Aggregation Framework, is not pretending to be a silver bullet. But after many attempts and pitfalls — we found that this solution is covered all our cases, with no effects and no high coupling to a specific library.

Top comments (5)

Collapse
 
dhensen profile image
Dino Hensen

Oh I get the $unwind: '$total': it is because the field count that gets created has an array as value with a single integer element inside of it. Unwind creates a document foreach element in that array, but because there is only one it effectively pulls the single integer out of the array and uses that. Without this unwind the $divide errors about one value being an array.

Collapse
 
dhensen profile image
Dino Hensen

Can you please explain why you named the count field $count: 'createdAt'? It seems to me it might as well could be named foobar and still yield the same result or am I missing something?

I also do not understand that last $unwind: '$total'

This article has helped me a lot as somebody with minimum mongo experience in building a facet based pagination instead of skip-limit or keyset pagination.

Collapse
 
malinda1986 profile image
malinda1986 • Edited

I'm wondering if you change the sort key to another attribute does that work ? I think sort order breaks at

{ records: { '$slice': [ '$data', -10, { '$ifNull': [ 10, '$total.source' ] } ] }, pagination: { total: '$total.source', limit: { '$literal': 10 }, page: { '$literal': 1 }, pages: { '$ceil': { '$divide': [ '$total.source', 10 ] } } } }

upto above stage sort order preserved . any suggestions to fix that ?

Collapse
 
malinda1986 profile image
malinda1986

great article, best example I've seen for pagination with Aggregation Framework

Collapse
 
dimer191996 profile image
Dimer Bwimba

great article, link of my question