DEV Community

loading...

Don't ever use LIMIT/FETCH OFFSET/SKIP again!!!

golkhandani profile image Mohammad Reza Rahimian Golkhandani ・5 min read

Introduction

Working with LIMIT and OFFSET is one of the easiest way to create a paginated query. However it is not very performance friendly.
A better approach for a paginated query is using "cursor based" approach. In this article I have tried to compare both "cursor based" pagination and "offset based" to give a better understanding of them and find the best approach that fit our project.

What is the "Cursor based" pagination?

In this method instead of using OFFSET OR SKIP to scroll data to the last retrieved data a pointer to the last item is used for example user_id.
Usually this pointer is used as a condition in our query, like $gte in MongoDB or > in SQL;
In the following sections of this article, I will implement "cursor based" method in a table with 20000 row (MySQL).

To keep reading time short I will implement Mongodb version in separate article

Preparing test in MySQL

To do a comparison between different methods I will create 20000 row in a table.
Let's start wih creating database for our test.

CREATE DATABASE benchmark_pagination;
USE benchmark_pagination;

now create a table with 20000 row

CREATE TABLE users (
    user_id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID())),
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    birth_date DATE,
    phone VARCHAR(50),
    password VARCHAR(50),
    INDEX (phone)
);
INSERT INTO users (first_name, last_name, birth_date, phone, password) values ('Hatty', 'Massenhove', '1998-06-19', '268-515-7621', 'mJOGJ36lAO');
.
.
.

Doing a simple select query for this table will result:

I use SQL_NO_CACHE to disable query cache layer for testing

SELECT SQL_NO_CACHE  BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY user_id ASC
LIMIT 10 OFFSET 10000

STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 57.2 ms 98.53% 1 57.2 ms
6a48c2b8-ba91-11ea-9d0e-0242ac130004    Hatty       Massenhove
6a4ef217-ba91-11ea-9d0e-0242ac130004    Inga        Scoyles
6a536ef0-ba91-11ea-9d0e-0242ac130004    Kenyon      Della Scala
6a57c3bc-ba91-11ea-9d0e-0242ac130004    Devondra    Gosselin
6a5bf956-ba91-11ea-9d0e-0242ac130004    Corney      Perring     *
6a611955-ba91-11ea-9d0e-0242ac130004    Shina       Billo
6a65396f-ba91-11ea-9d0e-0242ac130004    Opalina     Kienle
6a6a08ad-ba91-11ea-9d0e-0242ac130004    Riobard     Godsmark
6a702fbe-ba91-11ea-9d0e-0242ac130004    Emalee      Ferrarini
6a748fba-ba91-11ea-9d0e-0242ac130004    Netta       Bellworthy

Look at above results as the base result for next two queries. First with LIMIT and OFFSET and second with Cursor Based method.

We use row with * as the last item for better understanding and comparison
First query and results:

SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY user_id ASC
LIMIT 10 OFFSET 10005

STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 55.1 ms 99.47% 1 55.1 ms
6a611955-ba91-11ea-9d0e-0242ac130004    Shina       Billo
6a65396f-ba91-11ea-9d0e-0242ac130004    Opalina     Kienle
6a6a08ad-ba91-11ea-9d0e-0242ac130004    Riobard     Godsmark
6a702fbe-ba91-11ea-9d0e-0242ac130004    Emalee      Ferrarini
6a748fba-ba91-11ea-9d0e-0242ac130004    Netta       Bellworthy
6a78f019-ba91-11ea-9d0e-0242ac130004    Cornelia    Perocci
6a7f6f72-ba91-11ea-9d0e-0242ac130004    Isabeau     Griss
6a83a96e-ba91-11ea-9d0e-0242ac130004    Hubie       Dawkes
6a87e9ea-ba91-11ea-9d0e-0242ac130004    Tabor       Royson
6a8dfb29-ba91-11ea-9d0e-0242ac130004    Izaak       Lombardo

Second query and results:

SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
WHERE user_id > UUID_TO_BIN('6a5bf956-ba91-11ea-9d0e-0242ac130004')
ORDER BY user_id ASC
LIMIT 10
STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 7.7 ms 86.66% 1 7.7 ms

Things to consider (Unique field)!

It is obvious that using cursor to find last item is significantly better in term of response time but there is a problem, what should I do if I had to ORDER BY first_name. Actually to do cursor baser pagination you should use a unique field for sorting, In our case user_id is a unique field so to do a select query with pagination with ORDER BY first_name we can do as follow:

Again a simple select query:

SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY first_name ASC, user_id ASC
LIMIT 10 OFFSET 10000
STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 59.7 ms 99.49% 1 59.7 ms
288fd856-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley
308ea632-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
5290078b-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley
57c98456-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
71daa45d-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley     *
78f70797-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004    Jan     Semechik

Just like before we use this query's results as the base result for next two queries. First with LIMIT and OFFSET and second with Cursor Based method.

First query and results:

SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY first_name ASC, user_id ASC
LIMIT 10 OFFSET 10005
STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 57 ms 99.39% 1 57 ms
78f70797-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
8048650c-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
981ed0dc-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
25e93d81-ba91-11ea-9d0e-0242ac130004    Jana    Whitlow
342acad9-ba91-11ea-9d0e-0242ac130004    Jana    Aronstein
4e2e9f89-ba91-11ea-9d0e-0242ac130004    Jana    Whitlow

Second query and results:

SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
WHERE
first_name >= 'Jammie'
    AND (
        user_id > UUID_TO_BIN('71daa45d-ba91-11ea-9d0e-0242ac130004')
        OR
        first_name > 'Jammie'
    )
ORDER BY first_name ASC, user_id ASC
LIMIT 10
STATE TOTAL_TIME TIME_PERCENT CALLS TIME
Executing 10.7 ms 97.47% 1 10.7 ms
78f70797-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004    Jammie  Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004    Jammie  Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
8048650c-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
981ed0dc-ba91-11ea-9d0e-0242ac130004    Jan     Semechik
25e93d81-ba91-11ea-9d0e-0242ac130004    Jana    Whitlow
342acad9-ba91-11ea-9d0e-0242ac130004    Jana    Aronstein
4e2e9f89-ba91-11ea-9d0e-0242ac130004    Jana    Whitlow

Note

So basically using cursor based pagination in many sceneries is much faster than LIMIT/OFFSET pagination, but it is not good for all circumstances.
Some time the number of field that you have to sort is big that makes cursor query complex and some time your table has a few row and has a small growth rate.
In these cases using simple codes are good enough.

Conclusion

This article supposed to be a notice that there are better ways to do simple things with better performance. In real project you may face with some serious data or giant tables so it's good to search before using those simple ways.
I hope this article could help people like me to learn about new topic;

Discussion (0)

pic
Editor guide