You're familiar with the concept of modularity from OOPS, right? So, my introduction is not in this post. 😆
Article No Longer Available
Contents
Introduction
Let's say, you're building this software for an event management company to keep a track of all their events digitally. So, you have the front-end dev ready with the UI and all she needs from you is two things.
- A list of all events that have already happened this year (10 rows at a time for the sake of pagination)
- Total count of the same
Considering that there's an events
table in your SQL Database, you could just get down to coding and provide two separate endpoints to accommodate her requirements.
SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;
The above query will serve you with the requirements of Task 1 in getting the desired details in a paginated manner given you provide a limit and an offset according to however many records the user wants to see on the GUI at a time.
SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();
And, this query will get you the total number of records there exist that match the provided condition.
This will suffice her requirements and also you'll have her impressed with how quickly you can implement this.
Explanation
Let's consider the following queries you'd write.
SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;
+------+--------------+----------+----------------+
| id | organiser_id | venue_id | scheduled_date |
+------+--------------+----------+----------------+
| 4 | 6 | 2 | 2021-02-20 |
| 5 | 4 | 6 | 2021-05-29 |
| 6 | 4 | 7 | 2021-02-14 |
| 7 | 1 | 2 | 2021-06-23 |
| 8 | 2 | 10 | 2021-06-30 |
| 10 | 1 | 13 | 2021-08-01 |
| 11 | 5 | 6 | 2021-02-28 |
| 12 | 4 | 4 | 2021-07-07 |
| 19 | 3 | 5 | 2021-07-21 |
| 26 | 4 | 1 | 2021-02-12 |
+------+--------------+----------+----------------+
10 rows in set (0.01 sec)
SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();
+-------+
| total |
+-------+
| 54 |
+-------+
1 row in set (0.01 sec)
What'll happen in the first query is that the SQL Server will gather all the rows that satisfy the WHERE
clause like in a one big superset and then return you the quantity mentioned in the LIMIT
clause. This is completely fine.
But, where things go not so good for the server is when it'll run the second query. It'll again do the same steps as it did in the first query, make a superset and then count it for you.
So, it'll do the same thing, evaluate your whole query just to get you a total count? Isn't that overkill?
Well, how about another way of achieving the said tasks without burdening the SQL Server?
So, some SQL solutions provide you ways out of this. Let me show you how you could do this in an efficient manner for MySQL and Postgres.
MySQL
SQL_CALC_FOUND_ROWS
is a query modifier or say its just a keyword like DISTINCT
which you use in conjunction with the SELECT
statement.
SELECT SQL_CALC_FOUND_ROWS id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;
+------+--------------+----------+----------------+
| id | organiser_id | venue_id | scheduled_date |
+------+--------------+----------+----------------+
| 4 | 6 | 2 | 2021-02-20 |
| 5 | 4 | 6 | 2021-05-29 |
| 6 | 4 | 7 | 2021-02-14 |
| 7 | 1 | 2 | 2021-06-23 |
| 8 | 2 | 10 | 2021-06-30 |
| 10 | 1 | 13 | 2021-08-01 |
| 11 | 5 | 6 | 2021-02-28 |
| 12 | 4 | 4 | 2021-07-07 |
| 19 | 3 | 5 | 2021-07-21 |
| 26 | 4 | 1 | 2021-02-12 |
+------+--------------+----------+----------------+
10 rows in set, 1 warning (0.01 sec)
The results look the same. But, what'll happen here is that between making a superset and returning you the LIMIT
ed records, SQL engine will count it all and store the count in the memory somewhere. Well, how does one access it?
SELECT FOUND_ROWS() AS total;
+-------+
| total |
+-------+
| 54 |
+-------+
1 row in set, 1 warning (0.00 sec)
Running the above query just after any query which has used the SQL_CALC_FOUND_ROWS
query modifier will return the count it stored while processing the previous query. Thus, its quicker than evaluating the whole query again without the LIMIT
clause. 😄
Although MySQL deprecated this feature due to performance issues, you can still use it till they actually remove it.
Postgres
For this particular case the initial queries won't change.
SELECT id, organiser_id, venue_id, scheduled_date
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;
id | organiser_id | venue_id | scheduled_date
----+--------------+----------+----------------
4 | 6 | 2 | 2021-02-20
5 | 4 | 6 | 2021-05-29
6 | 4 | 7 | 2021-02-14
7 | 1 | 2 | 2021-06-23
8 | 2 | 10 | 2021-06-30
10 | 1 | 13 | 2021-08-01
11 | 5 | 6 | 2021-02-28
12 | 4 | 4 | 2021-07-07
19 | 3 | 5 | 2021-07-21
26 | 4 | 1 | 2021-02-12
(10 rows)
SELECT COUNT(*) AS total
FROM events
WHERE scheduled_date < NOW();
total
-------
54
(1 row)
Same thing will happen here, as well. The WHERE
clause would be required to get evaluated twice.
However, Postgres offers this concept of window functions which you could use to achieve both the tasks in a single query itself.
SELECT id, organiser_id, venue_id, scheduled_date, COUNT(*) OVER () AS total_count
FROM events
WHERE scheduled_date < NOW()
LIMIT 10 OFFSET 0;
id | organiser_id | venue_id | scheduled_date | total_count
----+--------------+----------+----------------+-------------
4 | 6 | 2 | 2021-02-20 | 54
5 | 4 | 6 | 2021-05-29 | 54
6 | 4 | 7 | 2021-02-14 | 54
7 | 1 | 2 | 2021-06-23 | 54
8 | 2 | 10 | 2021-06-30 | 54
10 | 1 | 13 | 2021-08-01 | 54
11 | 5 | 6 | 2021-02-28 | 54
12 | 4 | 4 | 2021-07-07 | 54
19 | 3 | 5 | 2021-07-21 | 54
26 | 4 | 1 | 2021-02-12 | 54
(10 rows)
We have used the OVER
clause over here which tells the engine to evaluate the preceding function over those rows that fall in to the same partition as the current row which we can mention in a following PARTITION BY
clause just like how you'd pair up LIMIT
and OFFSET
clauses together.
We'll cover the PARTITION BY
clause in a future post but in this case, the COUNT(*)
function will be evaluated over all the rows as we have not provided any PARTITION BY
clause to the ORDER
clause like how LIMIT 10 OFFSET 10
returns different output than just LIMIT 10
without any OFFSET
clause.
Thus, the COUNT(*) OVER ()
will run over all the rows that match the condition specified in the WHERE
clause and be included in the superset which when further gets LIMIT
ed you'll already have a dedicated column holding that total count for you. 😄
Conclusion
This isn't a write once use everywhere strategy as there will be cases where there are hardly any inserts between multiple paginated selects like this won't be efficient for when I keep refreshing my bank's app constantly when the month begins to see that salary credit record. If my banking institution uses this strategy, its a bad one (for now at least). 😛
For cases where there are or can be frequent inserts between multiple paginated selects like the bank refreshing my spends history, this is how you can optimize your database process and impress that front-end dev. 😉
References
- MySQL - Information Functions
- Postgres - SQL_CALC_FOUND_ROWS equivalent
- Postgres - Window Functions
- MariaDB - sql_calc_found_rows
- MariaDB - found_rows()
Further Reads
- MySQL to deprecate SQL_CALC_FOUND_ROWS and FOUND_ROWS
- To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?
Top comments (0)