Learning the Science behind Query Processing, its Cost Estimation and Join Operations in PostgresSQL

#postgres #postgressql #database

Here we will discuss query processing, cost estimation of query and joins. So lets get started.

I. Query Processing

Query processing comprises of 5 parts explained below:
1. Parser:
This is the first part of query processing. It is responsible for generating the parse tree from SQL statement. It checks the syntax of the SQL statement while building parse tree.
2. Analyzer/Analyser:
Analyzer/Analyser takes the parse tree from the parser and do semantic analysis on it. After this analysis, it generates another tree known as query tree. The root of this query tree contains the metadata about the query such as the type of the query(SELECT, INSERT etc).
3. Rewriter:
Next comes the Rewriter, it transforms the query tree according to the rule systems given in pg_rules system catalog. It basically works on the optimization of the query tree.
4. Planner:
The planner receives the query tree from the rewriter and generates the plan tree. This tree describes the plan about the execution of the query.
5. Executor:
The executor takes the plan tree designed by the planner and executes it. The plan tree is composed of nodes. Each node contains some information that executor requires during processing.

II. Cost Estimation in Single-Table Query

PostgreSQL's query optimization is based on cost. Costs are estimated by the functions defined in costsize.c. In PostgresSQL, there are three types of costs.

The startup cost is the cost expended before the first tuple is fetched.
The run cost is the cost to fetch all tuples.
The total cost is the sum of the costs of both startup and run costs.

Creating the Plan Tree of a Single-Table Query

The planner creates a plan for the execution of the query and it involves three parts:

Preprocessing: In preprocessing, the planner do some simplification of target lists, normalization of boolean expression and flattening of AND/OR expressions
Getting the cheapest Access path: In this segment, the planner estimates the costs of all possible access paths and selects the most cheapest one. For this purpose, it creates a RelOpInfo structure to store the access paths and corresponding costs. From this structure, it selects the cheapest path.
Plan Tree: In the last part, the planner generates the plan tree from the access path. The root of this tree is a PlannedStmt structure. It contains 19 fields and here are 4 representative fields: commandType that stores the type of operation, rtable that stores rangeTable enteries, relationOids that stores oids of the related tables for this query, plantree stores a plan tree that is composed of all plan nodes.

Working of Executor:
The executor process the queries by taking the plan nodes from the end. EXPLAIN command is used to understand the working of executor as it shows the plan tree almost as it is.

III. Join Operations

PostgresSQL supports three join operations that are discussed below:

Nested loop join is the most fundamental join and it works by iterating through each row of a table and matching it with corresponding rows of another table. It is effective in case when one table is small than the other.
Merge Join traverse through both tables with sorted list of join keys and match the rows with same join keys.
Hash join as described by name uses hash function to join. In this method, hash table is created from one table and used to find the matching rows from the other table.

References:

https://www.interdb.jp/pg/pgsql03.html
https://www.interdb.jp/pg/pgsql0302.html
https://www.interdb.jp/pg/pgsql0303.html

DEV Community

Learning the Science behind Query Processing, its Cost Estimation and Join Operations in PostgresSQL

I. Query Processing

II. Cost Estimation in Single-Table Query

Creating the Plan Tree of a Single-Table Query

III. Join Operations

References:

Top comments (0)

Read next

What to use parquet or CSV?

How to Deploy Directus to Render.com

Get SQL-Server Stored Procedures with C#

Data API for Amazon Aurora Serverless v2 with AWS SDK for Java - Part 6 Comparing cold and warm starts between Data API and JDBC