Dhawal Singh Panwar

Posted on Aug 14, 2021

How many rows would have been there without that LIMIT clause?

#sql #database #mysql #postgres

You're familiar with the concept of modularity from OOPS, right? So, my introduction is not in this post. 😆

Article No Longer Available

Introduction
Explanation
- MySQL
- Postgres
Conclusion
References
Further Reads

Introduction

Let's say, you're building this software for an event management company to keep a track of all their events digitally. So, you have the front-end dev ready with the UI and all she needs from you is two things.

A list of all events that have already happened this year (10 rows at a time for the sake of pagination)
Total count of the same

Considering that there's an events table in your SQL Database, you could just get down to coding and provide two separate endpoints to accommodate her requirements.

SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;

The above query will serve you with the requirements of Task 1 in getting the desired details in a paginated manner given you provide a limit and an offset according to however many records the user wants to see on the GUI at a time.

SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();

And, this query will get you the total number of records there exist that match the provided condition.

This will suffice her requirements and also you'll have her impressed with how quickly you can implement this.

Explanation

Let's consider the following queries you'd write.

SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;

+------+--------------+----------+----------------+
| id   | organiser_id | venue_id | scheduled_date |
+------+--------------+----------+----------------+
|    4 |            6 |        2 | 2021-02-20     |
|    5 |            4 |        6 | 2021-05-29     |
|    6 |            4 |        7 | 2021-02-14     |
|    7 |            1 |        2 | 2021-06-23     |
|    8 |            2 |       10 | 2021-06-30     |
|   10 |            1 |       13 | 2021-08-01     |
|   11 |            5 |        6 | 2021-02-28     |
|   12 |            4 |        4 | 2021-07-07     |
|   19 |            3 |        5 | 2021-07-21     |
|   26 |            4 |        1 | 2021-02-12     |
+------+--------------+----------+----------------+
10 rows in set (0.01 sec)

SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();

+-------+
| total |
+-------+
|    54 |
+-------+
1 row in set (0.01 sec)

What'll happen in the first query is that the SQL Server will gather all the rows that satisfy the WHERE clause like in a one big superset and then return you the quantity mentioned in the LIMIT clause. This is completely fine.

But, where things go not so good for the server is when it'll run the second query. It'll again do the same steps as it did in the first query, make a superset and then count it for you.

So, it'll do the same thing, evaluate your whole query just to get you a total count? Isn't that overkill?

Well, how about another way of achieving the said tasks without burdening the SQL Server?

So, some SQL solutions provide you ways out of this. Let me show you how you could do this in an efficient manner for MySQL and Postgres.

MySQL

SQL_CALC_FOUND_ROWS is a query modifier or say its just a keyword like DISTINCT which you use in conjunction with the SELECT statement.

SELECT SQL_CALC_FOUND_ROWS id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;

+------+--------------+----------+----------------+
| id   | organiser_id | venue_id | scheduled_date |
+------+--------------+----------+----------------+
|    4 |            6 |        2 | 2021-02-20     |
|    5 |            4 |        6 | 2021-05-29     |
|    6 |            4 |        7 | 2021-02-14     |
|    7 |            1 |        2 | 2021-06-23     |
|    8 |            2 |       10 | 2021-06-30     |
|   10 |            1 |       13 | 2021-08-01     |
|   11 |            5 |        6 | 2021-02-28     |
|   12 |            4 |        4 | 2021-07-07     |
|   19 |            3 |        5 | 2021-07-21     |
|   26 |            4 |        1 | 2021-02-12     |
+------+--------------+----------+----------------+
10 rows in set, 1 warning (0.01 sec)

The results look the same. But, what'll happen here is that between making a superset and returning you the LIMITed records, SQL engine will count it all and store the count in the memory somewhere. Well, how does one access it?

SELECT FOUND_ROWS() AS total;

+-------+
| total |
+-------+
|    54 |
+-------+
1 row in set, 1 warning (0.00 sec)

Running the above query just after any query which has used the SQL_CALC_FOUND_ROWS query modifier will return the count it stored while processing the previous query. Thus, its quicker than evaluating the whole query again without the LIMIT clause. 😄

Although MySQL deprecated this feature due to performance issues, you can still use it till they actually remove it.

Postgres

For this particular case the initial queries won't change.

SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;

 id | organiser_id | venue_id | scheduled_date
----+--------------+----------+----------------
  4 |            6 |        2 | 2021-02-20
  5 |            4 |        6 | 2021-05-29
  6 |            4 |        7 | 2021-02-14
  7 |            1 |        2 | 2021-06-23
  8 |            2 |       10 | 2021-06-30
 10 |            1 |       13 | 2021-08-01
 11 |            5 |        6 | 2021-02-28
 12 |            4 |        4 | 2021-07-07
 19 |            3 |        5 | 2021-07-21
 26 |            4 |        1 | 2021-02-12
(10 rows)

SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();

 total
-------
    54
(1 row)

Same thing will happen here, as well. The WHERE clause would be required to get evaluated twice.

However, Postgres offers this concept of window functions which you could use to achieve both the tasks in a single query itself.

SELECT id, organiser_id, venue_id, scheduled_date, COUNT(*) OVER () AS total_count
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;

 id | organiser_id | venue_id | scheduled_date | total_count
----+--------------+----------+----------------+-------------
  4 |            6 |        2 | 2021-02-20     |          54
  5 |            4 |        6 | 2021-05-29     |          54
  6 |            4 |        7 | 2021-02-14     |          54
  7 |            1 |        2 | 2021-06-23     |          54
  8 |            2 |       10 | 2021-06-30     |          54
 10 |            1 |       13 | 2021-08-01     |          54
 11 |            5 |        6 | 2021-02-28     |          54
 12 |            4 |        4 | 2021-07-07     |          54
 19 |            3 |        5 | 2021-07-21     |          54
 26 |            4 |        1 | 2021-02-12     |          54
(10 rows)

We have used the OVER clause over here which tells the engine to evaluate the preceding function over those rows that fall in to the same partition as the current row which we can mention in a following PARTITION BY clause just like how you'd pair up LIMIT and OFFSET clauses together.

We'll cover the PARTITION BY clause in a future post but in this case, the COUNT(*) function will be evaluated over all the rows as we have not provided any PARTITION BY clause to the ORDER clause like how LIMIT 10 OFFSET 10 returns different output than just LIMIT 10 without any OFFSET clause.

Thus, the COUNT(*) OVER () will run over all the rows that match the condition specified in the WHERE clause and be included in the superset which when further gets LIMITed you'll already have a dedicated column holding that total count for you. 😄

Conclusion

This isn't a write once use everywhere strategy as there will be cases where there are hardly any inserts between multiple paginated selects like this won't be efficient for when I keep refreshing my bank's app constantly when the month begins to see that salary credit record. If my banking institution uses this strategy, its a bad one (for now at least). 😛

For cases where there are or can be frequent inserts between multiple paginated selects like the bank refreshing my spends history, this is how you can optimize your database process and impress that front-end dev. 😉

DEV Community

How many rows would have been there without that LIMIT clause?

Article No Longer Available

Contents

Introduction

Explanation

MySQL

Postgres

Conclusion

References

Further Reads

Article No Longer Available

Top comments (0)