๐ Introduction:
Pagination is used when you have a lot of records in your database collection. Since you can't just dump all the records on the screen, we give a bunch of the records and tell the client how to get the next bunch !!๐
For the longest time, I had thought offset-based pagination was the way to do this (Haven't we all seen the page number section before ?).
It comes naturally that when each page has 10 records and you click on Page 3, you have to skip the 30 records and return (31-40). If you translate the above to a DB query, it is as below.
db.article.aggregate([
{ "$skip": 3*10 },
{ "$limit": 10 }
]);
Turns out MongoDB is not optimized for that sort of query. As the page number keeps increasing (say 2,000), we will keep skipping more records (skipping 2,000*10 = 20,000 records). For each page, MongoDB scans the result set based on your filter and starts skipping the records (after skipping 20,000 records) to get to the records in your final result. The time taken for this process increases with the number of records skipped.๐
๐ Range-based Pagination:
If you look at what all the big-wigs are doing, they seem to be using cursors, and they seem to be not facing this performance issue at all. If you look deeply, they are more or less using Range-based pagination.
Slack API Twitter API Facebook API Disqus API
๐ค Cursor, what's that?
A cursor is nothing but an identifier to fetch your next set of results. The client should not care what it means. In offset-based pagination, the offset(or page number) can be called your cursor. Don't confuse it with MongoDB cursors.
An ideal cursor for Range-based pagination has to be:
- Unique and
- You are OK with sorting on it
Keep in mind that you also have to create an appropriate index on this cursor field for the magic to happen.๐
If you do not have such a field in your collection schema, you can still use range-based pagination (detailed at the end).
Range-based pagination is not some out-of-the-world algorithm that you need to rack your brains out. It is just a small twist on your normal thought process to take advantage of the database indexes. Let's take a deeper look.
๐จ๐ปโ๐ป Querying Pattern:
Let's take the cursor as id
which is an actual field on your database. Send this to the client so that they can query for the next set of results. You can choose to hide what your cursor means by having an encryption layer before sending it to the client.
Querying first time:
db.client.aggregate([{
"$limit": 10,
},
{
"$sort": {
"id": 1,
}
}])
Query Nth time
Use the value of id
from the last document in the previous result.
db.client.aggregate([
{
"$match": {
"id":{
"$gt": "<Last value of id>",
}
},
},
{
"$sort": {
"id": 1,
}
},
{
"$limit": 10,
}
])
What makes range-based pagination more scalable is that the query pattern can be indexed, thus reducing the result set and skipping the latency-introducing $skip
step.
๐ What if you don't have the ideal cursor? Make the not-so-ideal one, I guess.
You almost always have the ideal cursor (i.e. a primary key) on your collection. For some reason, you can't use it. Maybe because you want to provide options to your client on what they can sort by.
In that case, you can make your cursor a compound field by using the ideal cursor field (say id
field) as the secondary sort and primary sort on the client's requested sort (title
field).
The query pattern becomes like below:
db.client.aggregate([
{
"$match": {
"$or": [
{
"id": { "$gt": "<Last value of id>"},
"title": { "$eq": "<Last value of title>"}
},
{
"title": { "$gt": "<Last value of title>"}
}
]
}
},
{
"$sort": {
"title":1,
"id":1,
}
}
])
Note: The promise of latency reduction here may change on a case-to-case basis when you are making cursor from multiple fields.
๐ Comparison with Offset-based pagination
TLDR: Range-based pagination takes advantage of database indexes to keep the performance from deteriorating even when iterating through the complete collection ๐.
It may not be for you if:
- You need offset-based random access: With range-based pagination, we do not have a concept of pages and total count as we are always fetching next to the cursor. If random access is a strong design requirement for you, then range-based pagination may not be for you.
- You have complex sorting and uniqueness requirements: You have a lot of fields you want to sort on and you do not wish to create an index on all of them (or if the indexing does not work as expected ๐).
- You do not need far-away records: Your client audience does not really require you to get far-away records. Maybe you provide a search as a feature and thus only the first few pages are enough. Like when have we visited the 2nd page of Google Search Result? While this is no reason to not implement cursor-based pagination, you can say I do not really have to ๐ซฃ.
Top comments (0)