Franck Pachot for YugabyteDB Distributed PostgreSQL Database

Posted on Jun 9, 2022 • Edited on Nov 11, 2022

Predictable plans with pg_hint_plan full hinting

#yugabytedb #postgres #sql #hints

Any SQL database provides hints to give more control on the optimizer, either to troubleshoot and understand the query planner decision, to workaround impossible cost estimation, or to fix a plan for stability.

Yes, PostgreSQL has query planner hints

With PostgreSQL, the extension to do it, pg_hint_plan is really good, but not widely used because not included in the core, not even in contrib. The consequence is that people install it only when needing it, without the time to learn how to hint correctly, and may think that "my hint is not used" and give up.

Better learn before you need it

To learn it before having the need for it, you can use this dockerfile to build a PostgreSQL image with pg_hint_plan. Or better, use the YugabyteDB image or free managed with cloud shell. As bad execution plans are worse in high performance distributed SQL databases, YugabyteDB has pg_hint_plan installed and enabled by default.

Example on YugabyteDB

I create 3 simple tables:

create table table_a( id int primary key) ;
create table table_b( id int primary key) ;
create table table_c( id int primary key) ;

I query the execution plan for a join between those two tables:

yugabyte=# explain (costs off, timing off)
           select * from table_a a join table_b b using(id);

                   QUERY PLAN
------------------------------------------------
 Nested Loop
   ->  Seq Scan on table_a
   ->  Index Scan using table_b_pkey on table_b
         Index Cond: (id = table_a.id)

The query planner has chosen a Nested Loop with table_a being the outer table and table_b the inner table. It is executed as an outer loop on table_a rows and, for each one, an inner loop on table_b matching rows. The outer table is also called 'left', and inner 'right', because we are used to read from left to right. Explain displays it from top to bottom with a join operation and two child nodes: first is the outer table, second is the inner table.

Leading the join order

Now, let's say that, for whatever reason (understanding, troubleshooting, fixing), I want the opposite order, starting to scan table_b and from it access to table_a. I'll use the Leading hint to define by which table the planner has to start, and which table to join next. Note that when I say "table" it is actually the alias because one table may have different roles in a query block.

A common error is using the Leading( b a ) hint like this:

yugabyte=# explain (costs off, timing off)
          /*+ Leading( b a ) */ 
          select * from table_a a join table_b b using(id);

                    QUERY PLAN
--------------------------------------------------
 Nested Loop
   ->  Seq Scan on table_a a
   ->  Index Scan using table_b_pkey on table_b b
         Index Cond: (id = a.id)

As you can see, nothing has changed in my plan. The reason is that the Leading hint with a simple list of tables defines in which order the query planner will consider each table, but doesn't define the join direction, which is still left to the query planner estimations. This Leading syntax just says "start by joining b and a", which is useless here because there are no other solution with two tables only. There's no error reported in the verbose log (see below how to enable it) because the hint was used, even if the intention of the user was different.

Outer to Inner join direction

From a relational point of view, the join operation is commutative. But the implementation of the join method (Nested Loop, Merge Join, Hash Join) is not.

In any query, the tables will be joined two by two, visible as two child operations on the join node displayed by the explain tree. Each one is actually a table or index scan, or the result from a previous join. For each join, there are two possible directions, defining which one is the outer (or left, on top) or the inner (or right, on second, and last, position). This choice is important because the efficiency of a join method, like Nested Loop, depends on the number of rows of the outer table, and the access path to the inner one.

Different syntax for join direction

In order to define the join direction, you need more parentheses to order each join pairs, where the left one is the outer and the right one the inner.

With those two tables, this is as simple as defining (b a) instead of b a and this is put within the Leading() arguments:

yugabyte=# explain (costs off, timing off)
          /*+ Leading( ( b a ) ) */ 
          select * from table_a a join table_b b using(id);

                    QUERY PLAN
--------------------------------------------------
 Nested Loop
   ->  Seq Scan on table_b b
   ->  Index Scan using table_a_pkey on table_a a
         Index Cond: (id = b.id)

This is the expected plan, joining table_a and table_b but with the table_b being the outer (left, first child) and table_a the inner one.

Leading hint for 3 tables

This is more visible with three tables where this (a b) pair is the outer table within the ( ( b a ) c ) pair:

yugabyte=# explain (costs off, timing off)
          /*+ Leading( ( ( b a ) c ) ) */
          select * from table_a a join table_b b using(id)
                                  join table_c c using(id);

                       QUERY PLAN
--------------------------------------------------------
 Nested Loop
   ->  Nested Loop
         ->  Seq Scan on table_b b
         ->  Index Scan using table_a_pkey on table_a a
               Index Cond: (id = b.id)
   ->  Index Scan using table_c_pkey on table_c c
         Index Cond: (id = a.id)

Here I introduced a new pair of join, with the left one being the result of the join between table_a and table_b, hinted with the same direction as above, with (a b), and the right-side table is table_c with ( (a b) c). This adds table_c as the inner table, accessed for each row from the outer table, which is the result of the previous join between table_a and table_b.

Swapping join inputs

Having table_c as the inner table is one possibility, but we may want it to be the outer table by putting it on the left side with ( c (b a) ) instead of ( ( b a ) c ):

yugabyte=# explain (costs off, timing off)
          /*+ Leading( ( c (b a) ) ) */ 
          select * from table_a a join table_b b using(id)
                                  join table_c c using(id);

                       QUERY PLAN
--------------------------------------------------------
 Hash Join
   Hash Cond: (c.id = a.id)
   ->  Seq Scan on table_c c
   ->  Hash
         ->  Nested Loop
               ->  Seq Scan on table_b b
               ->  Index Scan using table_a_pkey on table_a a
                     Index Cond: (id = b.id)

When table_c was the inner table, the query planner had chosen a Nested Loop because there's a fast access in the inner loop, with the primary key index on table_c. As the inner loop of a Nested loop is executed many times, it is efficient only with a permanent range or point access structure, sorted or hashed, like an index. However, when the result of another join is the inner table, there's no permanent structure to access to specific rows in it. But the SQL executor can create a temporary one for the duration of the query. This is why the query planner decides to do a Hash Join, creating the Hash structure on the result of the previous join, to get fast access to this inner row source.

Join method: Nested Loop, Hash or Merge join

When you force a join order you probably also want to force the join method. Remember that, when you need to hint a query, the reason is probably that you don't trust the query planner estimations. If they were bad to get the right join order, then they are probably bad for the join method as well. Even if you see the expected join method above, nothing guarantees that it will always be this one on new data. But there are hints for the join method as well.

The plan above could be fixed by adding a HashJoin() hint. The following, with NestLoop() will force a Nested Loop from table_c to the result of the join between table_a and table_b:

yugabyte=# explain (costs off, timing off)
          /*+ Leading( ( c (b a) ) ) NestLoop(a b c) */ 
          select * from table_a a join table_b b using(id)
                                  join table_c c using(id);

                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop
   Join Filter: (a.id = c.id)
   ->  Seq Scan on table_c c
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on table_b b
               ->  Index Scan using table_a_pkey on table_a a
                     Index Cond: (id = b.id)

I have written NestLoop(a b c) in alphabetical order to show that it doesn't matter. This just says that the last join on those three tables must be a Nested Loop. By convention, I prefer to write it as NestLoop(b a c) but it doesn't matter. Note that here, nothing guarantees the Nested Loop for the first join beween table_a and table_b. I should add NestLoop(b a) for this.

An Hint for Hinting: count the parentheses

Note that if I had defined Leading(b a c) as I did at the begining of this post, it would have left many possibilities for the query planner like ((b a)c), ((a b) c), (c(a b)) or ((a b)c). And, in addition to that, two join methods for each one.

For full hinting between n tables, you need to have n-1 nested pairs in Leading(). And each pair is a (). That's a lot of parentheses. By the way, do you know that the first implementation of POSTGRES had some code in LISP?

Back to our hints, once the Leading() is defined with all those n-1 nested pairs, there is one join method to define for each pair (NestLoop(), HashJoin() or MergeJoin). The one at position i (from 1 to n-1) will have i+1 arguments.

In addition to that, to fully define the execution plan, the access for each of n tables must be one of the scan method, like SeqScan(), IndexScan(), IndexOnlyScan() or BitmapScan(). In YugabyteDB, which uses a storage based on PostgreSQL tuples rather than pages, you don't see BitmapScan.

For n tables, you will have in total 2*n hints. And you can also count the opening parentheses, which equals the closing ones, and should find 6*n-2 of them for full hinting.

The last plan above is fully hinted, which means not leaving any other join and access decision to the query planner, by:

/*+ 
    Leading( ( c (b a) ) )
    NestLoop(b a) NestLoop(b a c) 
    SeqScan(b) IndexScan(a table_a_pkey) SeqScan(c)
*/

Verification and Troubleshooting

Some hint error syntax will raise a warning. To go further, as mentioned in the previous post, you can get more info with:

set pg_hint_plan.debug_print=verbose;
set client_min_messages = log;

Here is my fully hinted query:

yugabyte=# explain (costs off, timing off)
/*+ 
    Leading( ( c (b a) ) )
    NestLoop(b a) NestLoop(b a c) 
    SeqScan(b) IndexScan(a table_a_pkey) SeqScan(c)
*/ 
select * from table_a a join table_b b using(id)
                        join table_c c using(id);

LOG:  pg_hint_plan[qno=0xf]: planner
LOG:  pg_hint_plan[qno=0xf]: setup_hint_enforcement index deletion: relation=16542(table_a), inhparent=0, current_hint_state=0x3a1c20c8, hint_inhibit_level=0, scanmask=0x2
LOG:  pg_hint_plan[qno=0xf]: setup_hint_enforcement index deletion: relation=16547(table_b), inhparent=0, current_hint_state=0x3a1c20c8, hint_inhibit_level=0, scanmask=0x1
LOG:  pg_hint_plan[qno=0xf]: setup_hint_enforcement index deletion: relation=16552(table_c), inhparent=0, current_hint_state=0x3a1c20c8, hint_inhibit_level=0, scanmask=0x1
LOG:  pg_hint_plan[qno=0xf]: HintStateDump: {used hints:IndexScan(a table_a_pkey)SeqScan(b)SeqScan(c)NestLoop(a b)NestLoop(a b c)Leading((c (b a)))}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}

                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop
   Join Filter: (a.id = c.id)
   ->  Seq Scan on table_c c
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on table_b b
               ->  Index Scan using table_a_pkey on table_a a
                     Index Cond: (id = b.id)
(8 rows)

The important piece is the HintStateDump which will warn you if some hints were not used, are duplicated, or with error. The used hints is a rewrite of your hints that were used during query planning. You can see here that my NestLoop(b a) and NestLoop(b a c) were rewritten to NestLoop(a b) and NestLoop(a b c) which proves that the order doesn't matter here. There are stored internally in alphabetical order.

The setup_hint_enforcement index deletion is just a trace that indexes are removed from the list of possible ones: totally for SeqScan() (scanmask=0x1) or partially for IndexScan() (scanmask=0x2). When partially, only the one named (like table_a_pkey here with IndexScan()) is kept, the ones matching a pattern with IndexScanRegexp().

Note that if you are only interested by the HintStateDump the debug level on is sufficient. Here is an exhibit for different reporting:

yugabyte=# set pg_hint_plan.debug_print='on';
SET

yugabyte=# select /*+ 
   Leading(x) IndexScan(a) IndexScan(b) SeqScan(b)
   Set(random_page_cost 1e42) xxx() Set(seq_page_cost 0.42) 
*/ * from table_a a;

INFO:  pg_hint_plan: hint syntax error at or near "Leading(x) IndexScan(a) IndexScan(b) SeqScan(b)
   Set(random_page_cost 1e42) xxx() Set(seq_page_cost 0.42)
"
DETAIL:  Leading hint requires at least two relations.
INFO:  pg_hint_plan: hint syntax error at or near "xxx() Set(seq_page_cost 0.42)
"
DETAIL:  Unrecognized hint keyword "xxx".
INFO:  pg_hint_plan: hint syntax error at or near "IndexScan(b) SeqScan(b)
   Set(random_page_cost 1e42) xxx() Set(seq_page_cost 0.42)
"
DETAIL:  Conflict scan method hint.
LOG:  pg_hint_plan:
used hint:
IndexScan(a)
Set(random_page_cost 1e42)
not used hint:
SeqScan(b)
duplication hint:
IndexScan(b)
error hint:
Leading(x)

 id
----
(0 rows)

Mentioning two conflicting scan methods for the same alias has used the first one IndexScan(a) and discarded the other SeqScan(a) as duplication hint. A syntax error in a valid hint, like Leading with only one alias, is detected as error hint. An invalid hint, like xxx here has stopped the parsing and the valid Set(seq_page_cost 0.42) after it is ignored.

Other hints

There are other hints to choose the index with a smarter way. Don't forget that index names can change, which is transparent for SQL, but using hints break the logical-physical independence as you have to name the indexes. There is also the Rows hint that is smarter in some cases, fixing the estimations rather than forcing anything, but it doesn't fix a plan. This is used to add knowledge about data correlation between tables, which is a common source of cost misestimate.

There are also the opposite hints like NoSeqScan to allow any scan except SeqScan but remember that hints are already a negation. They are actually implemented by raising the cost of the operation to its maximum value.

You can also set parameters at statement level rather than session level with the Set hint. Useful to restrict the scope of a workaround to the scope of the problem, but I don't think it is needed to fix a plan. For example, my fully hinted query above will still show the same plan in a session where I set enable_indexscan=off; because the IndexScan hint internally sets it to on when specified for a table.

Core message

The most important decision is the join order, in the Leading() hint and to define the direction its arguments should be a list of nested ordered pairs rather than an ordered list. Do not think that, when you see the right execution plan after adding some hints, it means that you have fixed a plan. Without full hinting (the 2*n hints with 6*n-2 parenthesis pairs in total for n tables involved), the query planner may still come with another idea, and that may happen next Sunday at 2 a.m.

Here is a PostgresWorld Webinar on this:

youtube.com

DEV Community