We've had this query at work that has been irritatingly slow for a while, taking about 5-10 seconds in the course of a pretty common HTTP request. So far, nobody's been able to speed it up much. We ran EXPLAIN ANALYZE on the query and it seemed okay.
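For context, running it is just a matter of prefixing the statement with EXPLAIN ANALYZE; Postgres then actually executes the query and reports both the plan it picked and the real timings. A minimal sketch, using the scrubbed placeholder names from this post rather than our real schema:

EXPLAIN ANALYZE
SELECT *
FROM table1
WHERE table4_id = 712581
  AND other_flag;

One caveat: because ANALYZE really executes the statement, you'll want to wrap it in a transaction and roll back if the statement writes anything.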
If you've never looked at an EXPLAIN ANALYZE result, it looks something like this:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.93..360.99 rows=50 width=1205) (actual time=42.779..42.779 rows=0 loops=1)
   ->  Hash Left Join  (cost=2.93..1879.13 rows=262 width=1205) (actual time=42.778..42.778 rows=0 loops=1)
         Hash Cond: (table1.id = table2.table1_id)
         Filter: (((table2.table3_id = 712581) AND table1.flag) OR ((table1.table4_id = 712581) AND table1.other_flag))
         Rows Removed by Filter: 24211
         ->  Seq Scan on table1  (cost=0.00..1784.11 rows=24211 width=1205) (actual time=0.023..17.043 rows=24211 loops=1)
         ->  Hash  (cost=1.86..1.86 rows=86 width=8) (actual time=0.066..0.066 rows=86 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Seq Scan on table2  (cost=0.00..1.86 rows=86 width=8) (actual time=0.012..0.029 rows=86 loops=1)
 Planning Time: 72.257 ms
 Execution Time: 43.034 ms
The important parts of this are things like Hash Left Join, Seq Scan, and Hash. Another common one you'll see is Index Scan. Each node type roughly translates to this sort of time complexity:
| Plan | Complexity | Notes |
|---|---|---|
| Hash | O(1) | Backed by an in-memory hash table |
| Index Scan | O(log n) | Backed by an on-disk B-tree |
| Seq Scan | O(n) | Just iterates over rows in the table like you would over elements of any list |
| Nested Loop | O(n * m) | Iterating over the rows of one set for each row of another |
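To make the Seq Scan vs. Index Scan difference concrete: an index is what gives the planner the option of the O(log n) lookup at all. A small sketch with the placeholder names from the plan above (the index is hypothetical, not one from our actual schema):

-- Hypothetical index on the column the query filters on.
CREATE INDEX table1_table4_id_idx ON table1 (table4_id);

-- With the index in place, the planner can choose an Index Scan here
-- instead of a Seq Scan over every row in table1 (whether it actually
-- does depends on how selective the condition is).
EXPLAIN ANALYZE
SELECT *
FROM table1
WHERE table4_id = 712581;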
The query plan above is an actual query plan from our DB (tables and columns are scrubbed). It's using a Hash Left Join and a Hash, so it should be pretty fast! It's using a Seq Scan in there, too, but it's happening to a set that's already been filtered, which is actually pretty common.
So if this query had such a fast plan, why was it so slow? Today I decided to run the same EXPLAIN ANALYZE against our production database and realized why:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..907487.63 rows=50 width=1936) (actual time=54.223..8893.723 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.29..1161584.09 rows=64 width=1936) (actual time=54.221..8893.721 rows=1 loops=1)
         Filter: (((table2.table3_id = 712581) AND table1.flag) OR ((table1.table4_id = 712581) AND table1.other_flag))
         Rows Removed by Filter: 2095442
         ->  Seq Scan on table1  (cost=0.00..477337.61 rows=2098223 width=1936) (actual time=0.021..4430.654 rows=2095443 loops=1)
               Filter: (flag OR ((table4_id = 712581) AND other_flag))
               Rows Removed by Filter: 392143
         ->  Index Scan using index_1 on table3  (cost=0.29..0.31 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=2095443)
               Index Cond: (table1.id = table1_id)
 Planning time: 4.166 ms
 Execution time: 8893.867 ms
It's using very different data structures. Not a single Hash in sight. Our LEFT JOIN, instead of being backed by a Hash, is now running as a Nested Loop. That's rough.
It's most likely not joining with the Hash method anymore because building an in-memory hash table over a table with millions of rows is infeasible: not because Postgres can't do it for a single query, but because it can't afford to do it for every query running concurrently.
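If we'd wanted to test that theory, one knob worth knowing about is work_mem, which caps how much memory a single sort or hash operation is allowed to use. A sketch of a session-level experiment (the value here is just an example, not a recommendation):

-- See what the per-operation memory budget currently is.
SHOW work_mem;

-- Raise it for this session only, then re-run EXPLAIN ANALYZE on the
-- slow query to see whether the planner is willing to build the hash
-- table again. RESET work_mem; puts it back afterwards.
SET work_mem = '256MB';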
With that in mind, it makes sense that it's using a slower query plan. It just didn't occur to us at the time that it would use a different one.
This is why, whenever you want to understand the performance of a query, you need to profile it against your production database. Your development or staging data set will not give you the insight you need.
This is not always tenable since some companies may have very strict policies on who can access the production database, but it's important to have tooling around this to allow profiling of queries against real-world data.
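One thing that can make that an easier conversation: plain EXPLAIN, without ANALYZE, only asks the planner which plan it would choose and never runs the query, so you can see whether production would pick a Nested Loop without actually paying for it. A sketch, again with placeholder names:

-- Planning only: the query itself is never executed, so this is cheap
-- to run even against the production database (you get cost estimates
-- rather than actual timings).
EXPLAIN
SELECT *
FROM table1
WHERE table4_id = 712581
  AND other_flag;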
How we fixed it
You're probably curious about our solution. What we landed on (after a few iterations) involved removing the LEFT JOIN altogether, because we couldn't get it to run as anything but a Nested Loop, which accounted for about 99% of the query runtime. Instead, because of how we were using OR, we opted to run a UNION query that looks something like this:
SELECT *
FROM table1
WHERE table1.table3_id = 123456
AND flag
UNION
SELECT table1.*
FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
WHERE table2.table3_id = 123456
AND other_flag
The actual query is slightly more involved because it's generated by an ORM, but it's pretty much equivalent to this. Even the query plan is almost identical.
The first part of the query is why we were doing the LEFT JOIN to begin with. Since related rows in the other table were only guaranteed to exist if other_flag was set, we couldn't use an INNER JOIN for the whole thing.
This brought our query time, which was anywhere from 7-10 seconds depending on database load, down to a pretty consistent 125µs, a reduction in query latency of 99.9982-99.99875%. I actually had to check to make sure we were even hitting the right database, because that number didn't look anything like what I expected. But sure enough, this does exactly what we needed it to do. It lets Postgres use 4 indexes (previously it was only using 1), and probably makes really nice use of the query cache.
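For a sense of how the UNION form gets to lean on indexes, here's roughly the shape of indexes that would let each branch avoid a full table scan. These are hypothetical definitions based on the scrubbed names above, not our actual schema:

-- Hypothetical: a partial index serving the first branch
-- (table3_id lookup restricted to rows where flag is set).
CREATE INDEX table1_table3_id_flag_idx ON table1 (table3_id) WHERE flag;

-- Hypothetical: indexes serving the second branch's filter and join.
CREATE INDEX table2_table3_id_idx ON table2 (table3_id);
CREATE INDEX table2_table1_id_idx ON table2 (table1_id);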
Not bad for a day's work!
Top comments (2)
Nice write up! Can you add the original query so I can compare it to the updated one?
Ah, right, I completely forgot to add it to the post. It was something like this: