Query Processing in PostgreSQL

#apacheage #postgres #opensource

In PostgreSQL, the backend process handling all queries consists of 5 subsystems:

1. Parser
The parser generates a parse tree from an SQL statement in plain text.

2. Analyzer/Analyser
The analyzer/analyser carries out a semantic analysis of a parse tree and generates a query tree.

3. Rewriter
The rewriter transforms a query tree using the rules stored in the rule system if such rules exist.

4. Planner
The planner generates the plan tree that can most effectively be executed from the query tree.

5. Executor
The executor executes the query via accessing the tables and indexes in the order that was created by the plan tree.

Cost Estimation for Query Optimization

PostgreSQL uses cost-based query optimization which provides dimensionless values to compare the relative performance of operations, rather than absolute performance indicators. The cost functions in costsize.c estimate the costs of all operations executed by the executor, such as sequential scans and index scans.
There are three types of cost: start-up, run and total.

Creation of plan tree for a Single Table Query

The planner in PostgreSQL creates a plan tree to optimize the execution of a query. This process consists of three steps: preprocessing, getting the cheapest access path, and creating a plan tree from the cheapest path.

In the first step, preprocessing, the planner simplifies target lists, limit clauses, and normalizes Boolean expressions. The second step involves estimating the costs of all possible access paths and selecting the cheapest one. This is done by creating a RelOptInfo structure to store the access paths and their corresponding costs, estimating the costs of all possible access paths, and adding them to the RelOptInfo structure. Finally, the cheapest access path is chosen from the pathlist of the RelOptInfo structure.

In the last step, the planner generates a plan tree from the cheapest path. The root of the plan tree is a PlannedStmt structure that contains fields such as commandType, rtable, relationOids, and plantree. The plan tree is composed of various plan nodes, with the PlanNode structure as the base node. Each node corresponds to a specific operation, such as sequential scan, sort, and index scan. A PlanNode contains fields such as start-up cost, total_cost, rows, targetlist, qual, lefttree, and righttree.

The Executor

The executor processes single-table queries by taking plan nodes from the end of the plan tree to the root and executing corresponding functions. Each plan node has functions for executing its respective operation, located in the src/backend/executor/ directory. The EXPLAIN command can help understand how the executor performs because it shows the plan tree almost as it is.

Join Operations

Postgres supports 3 main join operations (with multiple variations of each): nested loop join, merge join and hash join.

1. Nested Loop Join:
The nested loop join is the most basic join algorithm used by PostgreSQL. It works by looping through one table and matching each row with the corresponding rows in the other table. This join is suitable for small tables or when the join condition is highly selective.

2. Merge Join:
The merge join algorithm is used when both tables being joined are sorted on the join columns. It works by merging the two sorted tables together, row by row. This join is typically faster than the nested loop join when dealing with large tables.

3. Hash Join:
The hash join algorithm works by creating a hash table for one of the tables being joined. It then scans the other table and checks if the corresponding rows exist in the hash table. This join is especially useful when the join condition is not selective and the tables being joined are large.

Each join algorithm has its own strengths and weaknesses, and the optimal choice depends on the size of the tables being joined, the selectivity of the join condition, and the available system resources. PostgreSQL's query planner will automatically choose the best join algorithm based on these factors.

References:

[1]: Suzuki, H. (n.d.). Internals of PostgreSQL. Retrieved from https://www.interdb.jp/pg/index.html

DEV Community

Query Processing in PostgreSQL

Cost Estimation for Query Optimization

Creation of plan tree for a Single Table Query

The Executor

Join Operations

References:

Top comments (0)

Read next

Who's hiring (December 2024)

Understanding and Solving False Sharing in Multi-threaded Applications with an actual issue I had

Color Palette is served with ShadCn Theme Editor

How to Install Nextcloud on Docker