DEV Community

Frits Hoogland for YugabyteDB Distributed PostgreSQL Database

Posted on • Edited on

Postgres query execution: simple query - prepared statement

This is a series of blogposts intended to look into how postgres query execution works, so you can get a better understanding, and understand what is happening and what you see. This post is looking at simple query protocol prepare and execute usage.

In the previous post I described the execution of queries using the simple protocol, for which the important bit here is that is sends the execution request/command in a single logical request.

hold statement execution → prepared statements?

A common way with SQL databases to improve performance is to hold a statement's execution context and re-execute it with different variables. The idea is to improve performance by skipping all work prior to execution (which is: syntactic parsing, semantic parsing, rewriting and planning).

This is possible in postgres using prepared statements. A prepared statement is defined with the prepare command, and then executed using the execute command. Very simple example code:

create table t (i int);
insert into t values (1);
insert into t values (2);
--
prepare t (int) as select * from t where i = $1;
execute t(1);
Enter fullscreen mode Exit fullscreen mode

Prepared statements are not persisted in the catalog, so they have to be created after startup of each individual backend.

The postgres extended query protocol allows specific parts of query execution to be performed, such as parsing, binding (assigning/binding values to the variables) and execution, but the simple query protocol doesn't, and performs all the steps in one go. Postgres psql and Yugabyte ysqlsh do use the simple query protocol.

As a reminder, simple query execution internally looks like this:

* exec_simple_query
* ㄴ pg_parse_query
     ㄴ raw_parser                      Parse (syntactic)
* ㄴ pg_analyze_and_rewrite
     ㄴ parse_analyze                   Parse (semantic)
*    ㄴ pg_rewrite_query               
        ㄴ QueryRewrite                 Rewriter
* ㄴ pg_plan_queries
*    ㄴ pg_plan_query
        ㄴ planner                      Planner
* ㄴ PortalRun
*    ㄴ PortalRunSelect
        ㄴ ExecutorRun                  Executor
Enter fullscreen mode Exit fullscreen mode

One function is executed (exec_simple_query), which performs all the mandatory steps execution in postgres must take.

A query executed via a prepared statement still has to perform the elementary steps in order to execute a statement, there is no way around that. However, these steps are divided between the prepare and execute step, which means that some of the work is done once.

Using psql (or ysqlsh for YugabyteDB), the only protocol that it can use is the simple query protocol. This means execution using psql will execute the 'exec_simple_query' function on the backend. However, 'exec_simple_query' will execute all the functions performing the mandatory steps, there is no way around that, which I will show and explain further in the article. The genuine question at this point is: if all the steps are still executed, does prepare/execute provide any benefit?

I can answer it right away: it does. I do hope you're interested in how and why this still would make sense, and how this works, which is what I am going to explain below.

what is internally executed for prepare and execute

When the commands for prepared statements (prepare, execute) are executed, the backend will process them using the simple query protocol if the client is using that protocol (such as psql, ysqlsh, optionally JDBC if configured to do so, etc.), which means the exec_simple_query function is executed, which will perform the internal functions shown in the overview.

For the prepare and execute commands, this is what happens in the functions leading up to the execution phase of query execution:

  • The parse (syntactic) step has to be performed for postgres to understand what command to execute. This step recognises the prepare and execute commands, and tags it as such.
  • The parse (semantic) step uses the syntactic parse information and determines it's a command type that doesn't need further parsing, and marks it as a utility command. source
  • The rewriter step is cut short in pg_rewrite_query because it's considered a utility command. source
  • The query planner step is cut short in pg_plan_query because it's considered a utility command. source

At this point the only step left is the execution step. However, a prepared statement still contains a SQL statement, the statement only is executed in a different way. And it being a SQL statement means it must perform the mandatory steps of parsing, rewriting and planning before it has an execution plan that can be executed, which hasn't happened so far: all that happened so far is that postgres has recognized the parse or execute command.

execute: prepare

And this is where things are different for prepared statements, for which to me the implementation seems to be a compromise between having the ability to use prepared statements, and having a minimal impact on the existing simple query execution architecture.

Needless to say, because the execution step is the only step left (for both prepare and execute), it will perform that step. It does follow the normal steps for execution such as the Portalrun function.

However, because the statement is marked as utility, it executes the function standard_ProcessUtility. The standard_ProcessUtility function selects the correct functionality to be executed based on statement that was executed against the backend.

This is how that looks like for the prepare statement:

...steps omitted for brevity...
* ㄴ PortalRun
*    ㄴ PortalRunMulti
*       ㄴ PortalRunUtility
           ㄴ pgss_ProcessUtility
*             ㄴ standard_ProcessUtility
                 ㄴ PrepareQuery
                    ㄴ CreateCachedPlan          Initialize memory
                    ㄴ parse_analyze_varparams   Parse (semantic)
                    ㄴ QueryRewriter             Rewriter
                    ㄴ CompleteCachedPlan        Finish the plan
                    ㄴ StorePreparedStatement    Store in hashtab
Enter fullscreen mode Exit fullscreen mode

In the function PrepareQuery the steps for generating a prepared statement are executed. If you look closely, you see outside of creating the prepared statement (StorePreparedStatement), two steps of execution are executed: parsing via parse_analyze_varparams and rewriting via QueryRewriter.

execute: execute

If the prepared statement is executed via the execute statement, the execution internally performs the same steps for simple query execution, and then in the execution phase actually executes the prepared statement in the following way:

...steps omitted for brevity...
* ㄴ PortalRun
*    ㄴ FillPortalStore
*       ㄴ PortalRunUtility
           ㄴ pgss_ProcessUtility
*             ㄴ standard_ProcessUtility
                 ㄴ ExecuteQuery
                    ㄴ FetchPreparedStatement    Lookup the prepared stmt
                    ㄴ GetCachedPlan             Plan or replan query
                    ㄴ PortalRun   
                       ㄴ PortalRunSelect
                          ㄴ ExecutorRun         Executor
Enter fullscreen mode Exit fullscreen mode

The function ExecuteQuery is performing the remaining steps after preparation. If you look closely, you see it is performing the two remaining steps here:

  • GetCachedPlan (planner)
  • ExecutorRun (executor)

the inner working of the plan choice

The GetCachedPlan function is the step that provides a plan tree (execution plan) for the execution step. GetCachedPlan implements the following logic:

  • It executes choose_custom_plan which returns true or false, which is stored in the variable 'customplan'
  • If 'customplan' is false:
    • Check if a generic plan exists with CheckCachedPlan:
    • If yes: use the existing generic plan.
    • If no:
      • Execute BuildCachedPlan to create and use a generic plan.
      • Execute choose_custom_plan again, which stores the result in 'customplan' again.
  • If 'customplan' is true:
    • The function BuildCachedPlan is executed to create and use a custom plan.

An important part of the decision to use a generic plan or to create and use a custom plan is in the function choose_custom_plan. To understand what is going on there, this is a selective list of the choices, in the correct order:

  • If no parameters exist, a generic plan is okay source.
  • The choice for a generic or custom plan can be manually set with the parameter plan_cache_mode (auto/force_generic_plan/force_custom_plan) source.
  • For a generic plan to be considered 5 custom plans need to exist source.
  • Only if the cost of a generic plan is less than the average of all the custom plans choose a generic plan source.

Let me explain some the terminology here, because you must understand what a generic plan is:

  • A generic plan is an execution plan for which the parameters/predicates are considered to have average selectivity.
  • A custom plan is an execution plan for which the runtime parameters/predicates are taken and used to create an execution plan.

Here are some other things to consider:

  • For a custom plan, the backend always computes an execution plan, it never re-uses an existing plan.
  • A generic plan is computed once (per backend), and then optionally used.
  • Once a generic plan actually is used, the backend cannot switch away from it, unless it's forced by the plan_cache_mode parameter. The reason it cannot switch away from it, is because no new plans and thus plan costs are computed, which is what drives the generic or custom plan decision.
  • A generic plan is said to have a bit lower cost than an equivalent custom plan, which is why and how a generic plan is chosen when its useful. source
  • This means that in a normal case, execution of a prepared statement must calculate and create 5 custom plans before it considers a generic plan (and stops the work of creating plans).
  • The number of 5 custom plans to be created is the minimal amount of custom plans, this can be more.

The next and final step after the execution plan is generated is execution of the plan.

pg_stat_statements

pg_stat_statements works with prepared statements in a peculiar way. This probably has to do with the way it works using the hooks.

  • The prepare and deallocate commands are not registered by pg_stat_statements.
  • The (prepared statement) execute is registered with the query text of the 'prepare' command.
  • The (prepared statement) execute does include the execution time (ExecutorRun) only.

This means that because pg_stat_statements is recording execution time only, the time registered in pg_stat_statements between statements executed normally and as prepared statements will be identical, because the goal of prepared statements is to cut work and time before execution, it doesn't change anything during actual execution.

log_min_duration_statement

log_min_duration_statement, when configured, and if the the set min limit is exceeded, shows the full time spent in exec_simple_query, for prepare, execute and deallocate (the removal of a prepared statement).

conclusion

This article tried taking you through the use and internals of prepared statements and the simple query protocol.

The simple query protocol has a fixed way of working, and a query in postgres must follow the steps of parse, rewrite, planning and execute.

The use of prepared statements with the simple query protocol is solved by effectively moving all the work for the query to be prepared and executed into the execution phase.

The prepare step performs the parsing, the execution step performs the planning and execution.

This is why I put a question mark with 'hold cursors → prepared statements': the implementation means that with postgres prepared statements, there isn't really a situation where the cursor/prepared statement is created and held in such a way that it only needs to perform the execute, in postgres the prepared statement execution step includes the planning step.

Executing the planning step means that by default, a prepared statement must compute a plan at least 5 times before it creates a generic plan and computes whether the generic plan has a lower cost than the average of the computed custom plans. If the generic plan has a lower cost, it will be chosen. Using a generic plan has the advantage that the time and CPU for query planning is skipped, and has the disadvantage it cannot switch plans anymore, because no new plans are created to change the generic plan choice. There is no guarantee a generic plan is chosen after 5 times, it's dependent on the cost of the custom and generic plans, and thus potentially a generic plan is never chosen.

Top comments (0)