DEV Community

Medam Mahesh
Medam Mahesh

Posted on • Edited on

Optimizing Pagination: Range-Based Queries with Cursors and Happy Databases ๐Ÿš€

๐Ÿ‘‹ Introduction:

Pagination is used when you have a lot of records in your database collection. Since you can't just dump all the records on the screen, we give a bunch of the records and tell the client how to get the next bunch !!๐Ÿ”„

For the longest time, I had thought offset-based pagination was the way to do this (Haven't we all seen the page number section before ?).

Yahoo Search

Using Yahoo Search here cause Google disabled it

It comes naturally that when each page has 10 records and you click on Page 3, you have to skip the 30 records and return (31-40). If you translate the above to a DB query, it is as below.

db.article.aggregate([
    { "$skip": 3*10 },
    { "$limit": 10 }
]);
Enter fullscreen mode Exit fullscreen mode

Turns out MongoDB is not optimized for that sort of query. As the page number keeps increasing (say 2,000), we will keep skipping more records (skipping 2,000*10 = 20,000 records). For each page, MongoDB scans the result set based on your filter and starts skipping the records (after skipping 20,000 records) to get to the records in your final result. The time taken for this process increases with the number of records skipped.๐Ÿ“ˆ

On a MongoDB collection containing 100,000 records, the time taken by offset pagination looks as below:
Latency with the increase in offsets

Time taken (in ยตs) vs Records skipped [Code]

๐Ÿ” Range-based Pagination:

If you look at what all the big-wigs are doing, they seem to be using cursors, and they seem to be not facing this performance issue at all. If you look deeply, they are more or less using Range-based pagination.
Slack API Twitter API Facebook API Disqus API

๐Ÿค” Cursor, what's that?

A cursor is nothing but an identifier to fetch your next set of results. The client should not care what it means. In offset-based pagination, the offset(or page number) can be called your cursor. Don't confuse it with MongoDB cursors.

An ideal cursor for Range-based pagination has to be:

  • Unique and
  • You are OK with sorting on it

Keep in mind that you also have to create an appropriate index on this cursor field for the magic to happen.๐Ÿ˜œ
If you do not have such a field in your collection schema, you can still use range-based pagination (detailed at the end).

Range-based pagination is not some out-of-the-world algorithm that you need to rack your brains out. It is just a small twist on your normal thought process to take advantage of the database indexes. Let's take a deeper look.

๐Ÿ‘จ๐Ÿปโ€๐Ÿ’ป Querying Pattern:

Let's take the cursor as id which is an actual field on your database. Send this to the client so that they can query for the next set of results. You can choose to hide what your cursor means by having an encryption layer before sending it to the client.

Querying first time:

db.client.aggregate([{
  "$limit": 10,
},
{
  "$sort": {
      "id": 1,
  }
}])
Enter fullscreen mode Exit fullscreen mode

Query Nth time
Use the value of id from the last document in the previous result.

db.client.aggregate([
  {
    "$match": {
       "id":{
          "$gt": "<Last value of id>",
       }
    },
  },
  {
    "$sort": {
       "id": 1,
    }
  },
  {
    "$limit": 10,
  }
])
Enter fullscreen mode Exit fullscreen mode

What makes range-based pagination more scalable is that the query pattern can be indexed, thus reducing the result set and skipping the latency-introducing $skip step.

๐Ÿ™ˆ What if you don't have the ideal cursor? Make the not-so-ideal one, I guess.

You almost always have the ideal cursor (i.e. a primary key) on your collection. For some reason, you can't use it. Maybe because you want to provide options to your client on what they can sort by.
In that case, you can make your cursor a compound field by using the ideal cursor field (say id field) as the secondary sort and primary sort on the client's requested sort (title field).
The query pattern becomes like below:

db.client.aggregate([
  {
     "$match": {
       "$or": [
         {
           "id": { "$gt": "<Last value of id>"},
           "title": { "$eq": "<Last value of title>"}
         },
         {
           "title": { "$gt": "<Last value of title>"}
         }
       ]
     }
  },
  {
     "$sort": {
       "title":1,
       "id":1,
     }
  }
])
Enter fullscreen mode Exit fullscreen mode

Note: The promise of latency reduction here may change on a case-to-case basis when you are making cursor from multiple fields.


๐Ÿ“Š Comparison with Offset-based pagination

Offset works best for the first few pages but time-taken increases as the offset gets bigger whereas range-based pagination with indexes keeps the time taken almost constant.
Offset vs. Range-based Pagination

Time taken as we iterate through database [Code]

TLDR: Range-based pagination takes advantage of database indexes to keep the performance from deteriorating even when iterating through the complete collection ๐Ÿ†.

It may not be for you if:

  • You need offset-based random access: With range-based pagination, we do not have a concept of pages and total count as we are always fetching next to the cursor. If random access is a strong design requirement for you, then range-based pagination may not be for you.
  • You have complex sorting and uniqueness requirements: You have a lot of fields you want to sort on and you do not wish to create an index on all of them (or if the indexing does not work as expected ๐Ÿ˜Ÿ).
  • You do not need far-away records: Your client audience does not really require you to get far-away records. Maybe you provide a search as a feature and thus only the first few pages are enough. Like when have we visited the 2nd page of Google Search Result? While this is no reason to not implement cursor-based pagination, you can say I do not really have to ๐Ÿซฃ.

Top comments (0)