Franck Pachot for AWS Heroes

Posted on May 17, 2023

pg_hint_plan in views? No. But in UDFs, Yes

#yugabytedb #distributed #sql #postgres

I was talking about pg_hint_plan last week in Belgium pgconf.be and got a question about hints in views. Hints are ignored in views but accepted in functions. PostgreSQL is so powerful that there's always a feature to help.

I'm running this demo on YugabyteDB where pg_hint_plan is installed by default. This demo runs the same on PostgreSQL with the extension installed (example), or on PostgreSQL-compatible databases, like Amazon Aurora on AWS, when adding pg_hint_plan to shared_preload_libraries. To be clear, as I've been asked when presenting about pg_hint_plan, CockroachDB is not PostgreSQL-compatible beyond the wire protocol, some syntax and a limited subset of features, and you cannot install PostgreSQL extensions, so this doesn't apply to CRDB at all.

I create two tables to join together:

drop table if exists demo1, demo2 cascade;
create table demo1 as select generate_series(1,10000) n;
create table demo2 as select generate_series(1,10000) n;
create index demo1i on demo1(n);
create index demo2i on demo1(n);

The join between those ten thousand rows is fast, 50 milliseconds with a Merge Join:

yugabyte=# explain (costs off, analyze) 
           select count(*) from demo1 natural join demo2;

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate (actual time=49.787..49.787 rows=1 loops=1)
   ->  Merge Join (actual time=44.105..49.280 rows=10000 loops=1)
         Merge Cond: (demo1.n = demo2.n)
         ->  Sort (actual time=22.593..23.758 rows=10000 loops=1)
               Sort Key: demo1.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo1 (actual time=2.150..20.603 rows=10000 loops=1)
         ->  Sort (actual time=21.505..22.580 rows=10000 loops=1)
               Sort Key: demo2.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo2 (actual time=2.545..18.501 rows=10000 loops=1)
 Planning Time: 0.123 ms
 Execution Time: 49.885 ms
 Peak Memory Usage: 1851 kB
(14 rows)

To show the hint usage, I'll force a Nested Loop (and disable YugabyteDB Nested Loop Batching optimization so that I have ten thousand loops). Yes, for educational purposes I add a hint to get bad performance on a query that is optimal without it ;) I will also increase the pg_hint_plan message level to see the hint usage logged in psql. Then, I'll be sure that the hint is used when the response time is longer and the hints used is not empty.

Here is my session setting:

set yb_bnl_batch_size=1;

set pg_hint_plan.message_level=notice;
set pg_hint_plan.debug_print=detailed;

Running without hints is the same as above, with no valid hint reported:

yugabyte=# explain (costs off, analyze) 
           select count(*) from demo1 natural join demo2;

NOTICE:  hints in comment="(none)", query="explain (costs off, analyze)
           select count(*) from demo1 natural join demo2;", debug_query_string="explain (costs off, analyze)
           select count(*) from demo1 natural join demo2;"
NOTICE:  pg_hint_plan[qno=0x1e]: planner: no valid hint
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate (actual time=51.162..51.162 rows=1 loops=1)
   ->  Merge Join (actual time=44.479..50.627 rows=10000 loops=1)
         Merge Cond: (demo1.n = demo2.n)
         ->  Sort (actual time=22.290..23.582 rows=10000 loops=1)
               Sort Key: demo1.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo1 (actual time=2.050..20.260 rows=10000 loops=1)
         ->  Sort (actual time=22.168..24.029 rows=10000 loops=1)
               Sort Key: demo2.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo2 (actual time=2.006..20.148 rows=10000 loops=1)
 Planning Time: 0.139 ms
 Execution Time: 51.245 ms
 Peak Memory Usage: 1851 kB
(14 rows)

I force a Nested Loop join with a hint, visible with used hints:NestLoop(demo1 demo2)}, and the response time for 10000 loops jums to 5 seconds - that's what happens with bad join methods and without batching optimizations:

yugabyte=# explain (costs off, analyze) 
           /*+ NestLoop(demo1 demo2) */
           select count(*) from demo1 natural join demo2;

NOTICE:  hints in comment="NestLoop(demo1 demo2) ", query="explain (costs off, analyze)
           /*+ NestLoop(demo1 demo2) */
           select count(*) from demo1 natural join demo2;", debug_query_string="explain (costs off, analyze)
           /*+ NestLoop(demo1 demo2) */
           select count(*) from demo1 natural join demo2;"
NOTICE:  pg_hint_plan[qno=0x20]: planner
NOTICE:  pg_hint_plan[qno=0x20]: setup_hint_enforcement no hint applied: relation=18024(demo1), inhparent=0, current_hint=0xaaae226af860, hint_inhibit_level=0, scanmask=0x1f
NOTICE:  pg_hint_plan[qno=0x20]: setup_hint_enforcement no hint applied: relation=18027(demo2), inhparent=0, current_hint=0xaaae226af860, hint_inhibit_level=0, scanmask=0x1f
NOTICE:  pg_hint_plan[qno=0x20]: HintStateDump: {used hints:NestLoop(demo1 demo2)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Aggregate (actual time=4628.102..4628.102 rows=1 loops=1)
   ->  Nested Loop (actual time=4.204..4620.677 rows=10000 loops=1)
         ->  Seq Scan on demo2 (actual time=2.066..9.141 rows=10000 loops=1)
         ->  Index Only Scan using demo2i on demo1 (actual time=0.447..0.448 rows=1 loops=10000)
               Index Cond: (n = demo2.n)
               Heap Fetches: 0
 Planning Time: 0.230 ms
 Execution Time: 4628.166 ms
 Peak Memory Usage: 30 kB
(9 rows)

This confirms that I can check the hint usage from two points of views: the response time and the hint usage message. Of course, I can also see it in the execution plan join method, but you will understand later why I need something else here.

View

I create a view with the hint, the hints are read during parsing the DDL, as seen by hints in comment but that doesn't mean that they will be used:

yugabyte=# create view v as /*+ NestLoop(demo1 demo2) */
           select* from demo1 natural join demo2;

NOTICE:  hints in comment="NestLoop(demo1 demo2) ", query="create view v as select /*+ NestLoop(demo1 demo2) */
* from demo1 natural join demo2;", debug_query_string="create view v as select /*+ NestLoop(demo1 demo2) */
* from demo1 natural join demo2;"
NOTICE:  hints in comment="NestLoop(demo1 demo2) ", query="create view v as select /*+ NestLoop(demo1 demo2) */
* from demo1 natural join demo2;", debug_query_string="create view v as select /*+ NestLoop(demo1 demo2) */
* from demo1 natural join demo2;"
CREATE VIEW

In a CREATE VIEW, hints on join methods have no effect as the query is not executed. This just stores metadata in the dictionary.

I check this when I query the view:

yugabyte=# explain (costs off, analyze) select count(*) from v;

NOTICE:  hints in comment="(none)", query="explain (costs off, analyze) select count(*) from v;", debug_query_string="explain (costs off, analyze) select count(*) from v;"
NOTICE:  pg_hint_plan[qno=0x32]: planner: no valid hint

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate (actual time=51.971..51.971 rows=1 loops=1)
   ->  Merge Join (actual time=46.565..51.454 rows=10000 loops=1)
         Merge Cond: (demo1.n = demo2.n)
         ->  Sort (actual time=21.775..22.907 rows=10000 loops=1)
               Sort Key: demo1.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo1 (actual time=2.218..19.676 rows=10000 loops=1)
         ->  Sort (actual time=24.783..25.825 rows=10000 loops=1)
               Sort Key: demo2.n
               Sort Method: quicksort  Memory: 853kB
               ->  Seq Scan on demo2 (actual time=2.290..22.707 rows=10000 loops=1)
 Planning Time: 0.215 ms
 Execution Time: 52.770 ms
 Peak Memory Usage: 1851 kB
(14 rows)

There's no valid hint, and the join is a Merge Join, as with the non-hinted query.

I'll keep the view but remove the hints because it can be misleading to keep something that is not used:

create or replace view v as 
           select* from demo1 natural join demo2;

I'll still use the view, but I need hints on top of it.

Table function

I create a function to call the view, adding my hint there:

create or replace function v() returns setof v as
$$
/*+ NestLoop(demo1 demo2) */ select * from v;
$$ language sql;

Besides the hint, everything is generic here: the description of the output is the same as the view v thanks to setoff and the query is a simple select *. I have named my function with the same name: v() but that's up to you.

I query it to check that the hint is used:

yugabyte=# explain (costs off, analyze) select count(*) from v();

NOTICE:  hints in comment="(none)", query="explain (costs off, analyze) select count(*) from v();", debug_query_string="explain (costs off, analyze) select count(*) from v();"
NOTICE:  pg_hint_plan[qno=0x3e]: planner: no valid hint
NOTICE:  hints in comment="NestLoop(demo1 demo2) ", query="
/*+ NestLoop(demo1 demo2) */ select * from v;
", debug_query_string="explain (costs off, analyze) select count(*) from v();"
NOTICE:  pg_hint_plan[qno=0x40]: planner
NOTICE:  pg_hint_plan[qno=0x40]: setup_hint_enforcement no hint applied: relation=18024(demo1), inhparent=0, current_hint=0xaaae2145cd80, hint_inhibit_level=0, scanmask=0x1f
NOTICE:  pg_hint_plan[qno=0x40]: setup_hint_enforcement no hint applied: relation=18027(demo2), inhparent=0, current_hint=0xaaae2145cd80, hint_inhibit_level=0, scanmask=0x1f
NOTICE:  pg_hint_plan[qno=0x40]: HintStateDump: {used hints:NestLoop(demo1 demo2)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (actual time=4573.323..4573.323 rows=1 loops=1)
   ->  Function Scan on v (actual time=4572.267..4572.764 rows=10000 loops=1)
 Planning Time: 0.074 ms
 Execution Time: 4573.616 ms
 Peak Memory Usage: 1117 kB
(5 rows)

The hint has been used. I see that on the execution time, and the used hints:NestLoop(demo1 demo2)} message. However, the join method is not visible anymore in the execution plan. Is it a problem? Probably not because when you want to explain it you can explain the view. That's why I keep the view and build the function on top of it. It is easier to maintain and troubleshoot.

Auto Explain Log Nested Statements

Note that if you use auto_explain (must be added to shared_preload_libraries) you can set auto_explain.log_nested_statements=on to see the execution plan behind the function. I've set this for my session:

-- display all logs to the psql console
set client_min_messages = log;
-- log the statements text and duration
set log_statement='all';
-- log non-verbose hint usage
set pg_hint_plan.debug_print=on;
-- log execution plan with analyze and nested queries
set auto_explain.log_min_duration=1000;
set auto_explain.log_nested_statements=on;
set auto_explain.log_analyze=on;

and run the query on the table function:

yugabyte=# select count(*) from v();

LOG:  statement: select count(*) from v();
LOG:  pg_hint_plan:
used hint:
NestLoop(demo1 demo2)
not used hint:
duplication hint:
error hint:

LOG:  duration: 4162.019 ms  plan:
Query Text:
/*+ NestLoop(demo1 demo2) */ select * from v;

Nested Loop  (cost=0.00..1304.00 rows=5000 width=4) (actual time=4.313..4151.781 rows=10000 loops=1)
  ->  Seq Scan on demo2  (cost=0.00..100.00 rows=1000 width=4) (actual time=1.895..8.386 rows=10000 loops=1)
  ->  Index Only Scan using demo2i on demo1  (cost=0.00..1.10 rows=10 width=4) (actual time=0.399..0.401 rows=1 loops=10000)
        Index Cond: (n = demo2.n)
        Heap Fetches: 0
LOG:  duration: 4186.605 ms  plan:
Query Text: select count(*) from v();
Aggregate  (cost=12.75..12.76 rows=1 width=8) (actual time=4186.593..4186.593 rows=1 loops=1)
  ->  Function Scan on v  (cost=0.25..10.25 rows=1000 width=0) (actual time=4184.992..4185.759 rows=10000 loops=1)

 count
-------
 10000
(1 row)

Here I have many ways to know that the hint is used:

pg_hint_plan.debug_print=on logs the hint usage
auto_explain shows the hint in the SQL text, and the execution plan
log_statement shows the duration
log_nested_statements shows the query behind the function
client_min_messages shows the log in my console

Prepared Statements

Note that one consequence of using a function is that the query cannot be prepared for re-use. You can prepare the call to the function, but the query inside will be parsed on each call. This is visible when tracing the LOG: pg_hint_plan which is displayed on each call because the nested query is parsed each time:

yugabyte=# set pg_hint_plan.debug_print=on;
SET
yugabyte=# set client_min_messages = log;
SET
yugabyte=# prepare v as select count(*) from v();
LOG:  statement: prepare v as select count(*) from v();
PREPARE
yugabyte=# execute v;
LOG:  statement: execute v;
DETAIL:  prepare: prepare v as select count(*) from v();
LOG:  pg_hint_plan:
used hint:
NestLoop(demo1 demo2)
not used hint:
duplication hint:
error hint:

 count
-------
 10000
(1 row)

yugabyte=# execute v;
LOG:  statement: execute v;
DETAIL:  prepare: prepare v as select count(*) from v();
LOG:  pg_hint_plan:
used hint:
NestLoop(demo1 demo2)
not used hint:
duplication hint:
error hint:

 count
-------
 10000
(1 row)

This should not matter for small simple queries, especially when hints are limiting the query planner search tree.

If you want to hint a prepared statement, then you have to prepare it with the hint, to read directly from the view:

yugabyte=# deallocate v;
LOG:  statement: deallocate v;
DEALLOCATE
yugabyte=# prepare v as /*+ NestLoop(demo1 demo2) */ select count(*) from v;
LOG:  statement: prepare v as /*+ NestLoop(demo1 demo2) */ select count(*) from v;
PREPARE
yugabyte=# execute v;
LOG:  statement: execute v;
DETAIL:  prepare: prepare v as /*+ NestLoop(demo1 demo2) */ select count(*) from v;
LOG:  pg_hint_plan:
used hint:
NestLoop(demo1 demo2)
not used hint:
duplication hint:
error hint:

 count
-------
 10000
(1 row)

yugabyte=# execute v;
LOG:  statement: execute v;
DETAIL:  prepare: prepare v as /*+ NestLoop(demo1 demo2) */ select count(*) from v;
 count
-------
 10000
(1 row)

Here the second execution was not parsed again (no LOG: pg_hint_plan log) but I had to put the hint in the PREPARE. This is not what I wanted to do by hinting inside a view.

(Micro)Service

Besides the possibility to add hints, encapsulating SQL queries in views and user defined functions (UDFs) is a great way design your database access as a service.

The function exposes the service, and can even have parameters.
The view exposes the data, decoupled from the physical data model. Both are declarative.
The function body can add some procedural logic, with hints, like the join order and join methods, to get predictable performance and avoid execution plan changing.

This may sound heretic for SQL fans but NoSQL became popular by providing an API with predictable performance. You can do the same with all the power of SQL databases. You can even join from those functions.

There's no shame in knowing the best execution plan and hinting for it, as long as it is done correctly. I'm not talking about complex reporting queries where the plan should adapt to the input parameters and the data distribution, but for OLTP with simple access patterns, where you want stable execution plans. When a new use-case comes, you define it as a view, add the right indexes if needed, validate the execution plan, and add a User Defined Function to encapsulate this, with additional query planner hints if you think the optimizer may not have always the right one.

Of course, you will not encapsulate all your application queries like this. But for a microservice that has a handful of critical queries, it can be a good design to have them as user defined table functions or prepared statement. If you really control your hints with full hinting.

If you come to PostgreSQL Conference Germany, June 27, 2023 I'll present "pg_hint_plan – get the right plan without surprises" with a new slide about views and, hopefully, new questions.

DEV Community