
GraphQL Pagination

El Housseine Jaafari ・5 min read

Hey there! Glad you are here, and welcome to the GraphQL Pagination tutorial. The goal of this tutorial is to guide you through setting up pagination in your GraphQL server.

In GraphQL there are two major approaches: offset-based pagination and cursor-based pagination.

As a Ninja once said "行きましょ" 🙈.

Offset Pagination

Now that you have your blog up and running, you notice that you are querying all your posts at once. This is not efficient, especially once you go over 100 blog posts. Your SQL query will take longer and longer to execute as your data grows.

To solve the problem, as someone who knows SQL well, you will probably think of pagination. The way you do it is by adding two clauses to your SQL query: OFFSET(1) and LIMIT(2). The question now is how to achieve this in your GraphQL server.

Allow me to show you an example of fetching 10 blog posts from all your posts starting from the 11th one. Your query will look like this:

query {
  posts(limit: 10, offset: 10) {
    title
    content
  }
}
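To make the offset/limit semantics concrete, here is a plain-JavaScript stand-in for what the database does with that query. The `allPosts` array and the `paginateByOffset` helper are hypothetical, used only for illustration:

```javascript
// Plain-JS stand-in for the SQL behind the query above.
// `allPosts` and `paginateByOffset` are hypothetical, for illustration only.
const allPosts = Array.from({ length: 30 }, (_, i) => ({
  id: i + 1,
  title: `Post ${i + 1}`,
}));

function paginateByOffset(posts, { limit, offset }) {
  // OFFSET skips `offset` rows, then LIMIT caps how many are returned
  return posts.slice(offset, offset + limit);
}

const page = paginateByOffset(allPosts, { limit: 10, offset: 10 });
console.log(page[0].id, page.length); // 11 10
```

The first post returned is the 11th one, matching the query above.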

So far everything seems to work. This type of pagination is great if your data is static or small, and it gives the user a quick way to fetch the next page.

But this approach has downsides that need to be addressed.

Performance

As mentioned before, on small datasets the offset approach works well and will not hurt the user experience.

But as your dataset grows and you try to fetch the farthest results 😩, the query will take longer than you can imagine (you may 😢).

But how? (you ask). Let's break it down with this query as an example:

query {
  posts(limit: 10, offset: 10000) {
    title
    content
  }
}

The query fetches 10 posts with an offset of 10K. At first glance you might think your database engine will jump directly to that offset and start reading from there.

Well, that won't happen: because of the way a relational database engine works, it still has to compute all the skipped rows inside the server, so a large OFFSET can be very inefficient.

You see? It's bad, right? It's like wanting to start reading a book from a certain page, but still having to go through every preceding page and count them yourself instead of using the index.

Don't worry, we got your back with a better solution 😎

Cursor pagination

This approach sets a pointer to a specific record in your table. For it to work at its best, the cursor must be unique and sequential. You then always fetch the rows after a given record, instead of relying on the position of rows inside your table.

Let's have an example to understand this more:

query {
  posts(limit: 10, publisherId: 1, cursor: 1) {
    title
    content
  }
}

The query skips every row up to (and including) the one whose id matches the cursor value, and returns the next 10 results for that publisher. The cursor here refers to the id of our posts.

Looking at your resolver, it will look like this:

P.S.: We are using Sequelize as our ORM.

// `Op` comes from Sequelize: const { Op } = require('sequelize');
{
  async posts(parent, { limit, publisherId, cursor }) {
    return Posts.findAll({
      where: {
        publisherId,
        // Only rows *after* the cursor
        id: {
          [Op.gt]: cursor
        }
      },
      // A stable order is required for cursor pagination to be correct
      order: [['id', 'ASC']],
      limit
    })
  }
}

Cursor-based pagination is more performant than offsets because we can leverage the database indexes on the columns used as the cursor in the WHERE clause.

There is more to this solution, and many libraries and frameworks out there present their own cursor-style pagination. We can't go over each of them, so we will stick to the one most used with React.

Relay style pagination

Relay is a framework for building data-driven React apps. It provides a standard mechanism to slice data and paginate the results.

query {
  posts(first: 10, after: "cursor") {
    edges {
      cursor
      node {
        id
        title
        content
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}

Here, slicing is done using the first argument, similar to the limit we applied in the earlier queries. Pagination is done using the after argument, which passes a cursor and expects results after that cursor. The query also asks for hasNextPage, which tells the client whether there are more results, since there is no concept of a total number of pages in cursor pagination.
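To make these mechanics concrete, here is a rough in-memory sketch of building such a connection. The base64 cursor encoding and the helper names are assumptions for illustration, not Relay's actual API:

```javascript
// Sketch of a Relay-style connection over an in-memory list.
// The base64 cursor encoding and helper names are assumptions, not Relay's API.
const posts = Array.from({ length: 25 }, (_, i) => ({ id: i + 1, title: `Post ${i + 1}` }));

const encodeCursor = (id) => Buffer.from(String(id)).toString('base64');
const decodeCursor = (c) => Number(Buffer.from(c, 'base64').toString('utf8'));

function connection(data, { first, after }) {
  // Start right after the row the cursor points at (or from the beginning)
  const start = after ? data.findIndex((p) => p.id === decodeCursor(after)) + 1 : 0;
  // Fetch one extra row to detect whether a next page exists
  const slice = data.slice(start, start + first + 1);
  const hasNextPage = slice.length > first;
  const nodes = slice.slice(0, first);
  return {
    edges: nodes.map((node) => ({ cursor: encodeCursor(node.id), node })),
    pageInfo: { hasNextPage },
  };
}

const page = connection(posts, { first: 10, after: encodeCursor(10) });
console.log(page.edges[0].node.id, page.pageInfo.hasNextPage); // 11 true
```

Fetching `first + 1` rows is a common trick to compute hasNextPage without a separate count query.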

Recap

We learned two different pagination styles and walked through both of them. To recap our tutorial, let's list again what each approach offers.

Offset

Pros

  • It gives you the ability to see the total number of pages and their progress through that total.
  • It gives you the ability to jump to a specific page within the set.
  • It’s easy to implement as long as there is an explicit ordering of the results from a query.

Cons

  • Using LIMIT OFFSET doesn’t scale well for large datasets. As the offset increases the farther you go within the dataset, the database still has to read up to offset + count rows from disk, before discarding the offset and only returning count rows.
  • If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.

Cursor

Pros

  • This will scale well for large datasets. We’re using a WHERE clause to fetch rows with index values after the last index from the previous page (greater or less than it, depending on sort order). This lets us leverage the index on the column, and the database doesn’t have to read any rows that we’ve already seen. We’re also not returning the total number of pages or items in the set, so we avoid having to calculate the full result set on each request.
  • The pagination process is stabilized. Instead of calculating from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.

Cons

  • The cursor must be based on a unique, sequential column (or columns) in the source table.
  • There is no concept of the total number of pages or results in the set.
  • The client can’t jump to a specific page.

P.S.: This image shows the differences between both approaches.

[Image: cursor vs offset]

Dictionary

  • (1): OFFSET says to skip that many rows before beginning to return rows.
  • (2): LIMIT is an optional clause of the SELECT statement that returns a subset of rows returned by the query.

With that said, I hope you enjoyed reading the article and understood the differences between both approaches 😄

Brought to you with ❤️

=====================

Github repo: https://github.com/JefferyHus/graphql-pagination

Posted on Feb 18 by El Housseine Jaafari (@jefferyhus), lead software engineer 🃏 and host of the #Pentium podcast 🎙️

Discussion

Great article El Housseine!

How would the resolver's implementation be if the ids are not in sequence (ordered)?

 

Thanks for your comments and feedback :)

Your question is a good one. In the article I mention, in the Cons section, that cursor-style pagination can't be achieved if you don't have a unique & sequential key.

In most cases we have a createdAt column, which can be used as the cursor key when your primary key is a UUIDv4 or similar.

 

I see, makes sense!

The issue is that createdAt might not be unique, so we could have items from one page re-displayed on the next page.

We could probably use something like a db row_index function (the index of the items in a query result) or (if not supported by the db engine) still use the id and get the X next values after the row that has that ID (probably not a simple query though).

That would complicate things a lot. You can use a timestamp value in your createdAt, for example; I don't believe there will be more than one row inserted in the same microsecond, right? 🤔
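Another common workaround, instead of relying on microsecond precision alone, is a composite cursor that breaks timestamp ties with the id. A minimal in-memory sketch, with all names hypothetical:

```javascript
// Sketch of a composite cursor (createdAt, id) that stays stable even when
// createdAt values collide. All names here are hypothetical.
const rows = [
  { id: 1, createdAt: 100 },
  { id: 2, createdAt: 100 }, // same timestamp as id 1
  { id: 3, createdAt: 200 },
  { id: 4, createdAt: 300 },
];

function afterCursor(data, cursor, limit) {
  // Keyset condition: (createdAt, id) > (cursor.createdAt, cursor.id)
  return data
    .filter((r) => r.createdAt > cursor.createdAt ||
                   (r.createdAt === cursor.createdAt && r.id > cursor.id))
    .slice(0, limit);
}

const next = afterCursor(rows, { createdAt: 100, id: 1 }, 2);
console.log(next.map((r) => r.id)); // [2, 3] — id 2 is not skipped despite the collision
```

In SQL this maps to a tuple comparison in the WHERE clause, ordered by (createdAt, id).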

Regarding timestamps, that would be a good solution indeed:

  • if the application does not have a lot of traffic that might introduce timestamp collisions (which is the case of my crappy apps ;))
  • if you already have a timestamp in your database table/collection or have the ability to add one (think about read-only legacy databases)
  • to avoid collisions on timestamps and be able to query next elements without duplicates, timestamps would need to be created in sequence with a random suffix based on time, not sure that this is always provided OOB by db engines.

Actually, if supported, row index is pretty straightforward, you simply add it to your select and you get an index of each row in the result, that would be the cursor value I guess.
Example here for postgresql: postgresqltutorial.com/postgresql-...
Do you see any issue on using ROW_NUMBER()?

PS: Just rambling on here, I'll probably use timestamps on my apps, as you've suggested ;)

The only issue I see with ROW_NUMBER is that you always have to select all results from the table, assign a row number, and only then pull the results you want. This is exactly how LIMIT & OFFSET work, so it isn't going to help if your database is growing fast.

As I said, if you include microseconds in your timestamp value, there will mostly be no collisions.

You're right! You'd just have to have these micro-timestamps in your database and end-users tolerant to potential (even rare) collisions; otherwise I guess you'd have to fall back to less performant fetching strategies.