DEV Community

Aadil Bashir
Aadil Bashir

Posted on

Query Processing in PostgreSQL

Hey guys, welcome to the third part of my blog series. In this article,I will discuss Query Processing, the most complicated subsystem in PostgreSQL.

Overview

In PostgreSQL, a single backend process is responsible for handling all client queries, although starting from version 9.6, multiple background workers are utilized for parallel queries. The database system is designed to comply with the 2011 SQL standard and offers extensive features. Among its intricate components, the query processing subsystem is particularly notable, divided into five distinct subsystems.

  1. Parser
  2. Analyser/Analyzer
  3. Rewriter
  4. Planner
  5. Executor

Parser

The initial subsystem within PostgreSQL's backend process is known as the parser. Its role is to transform SQL statements expressed in plain text into a parse tree structure. This parse tree can subsequently be utilized by the other subsystems within the backend for further processing. To illustrate, let's consider an example of this process with the given below query.

testdb=# SELECT id, data FROM tbl_a WHERE id < 300 ORDER BY data;

The parse tree for this query will be:
Image description
Parser does not check the semantics of the input.In the next step, it will be checked by the Analyzer.

Analyzer

The analyzer/analyser component in PostgreSQL is tasked with conducting a semantic analysis of the parse tree generated by the parser. Its main objective is to process the parse tree and produce a query tree as output. At the root of this query tree is a Query structure, defined in parsenodes.h, which stores essential metadata about the query, including the type of command (such as SELECT, INSERT, or other). Each leaf node of the query tree represents a clause within the query and holds relevant data associated with that specific clause. This arrangement allows for efficient organization and retrieval of information during query processing.

Image description

The query tree generated by the PostgreSQL analyzer/analyser encompasses various elements that provide crucial metadata about the query. For instance, the targetlist contains the columns that will appear in the resulting query output. In cases where the input query tree includes an asterisk (*) denoting all columns, the analyzer/analyser replaces it with the complete list of columns.

The range table within the query tree stores information about the relations utilized in the query, such as table OID and name. The join tree component preserves details about the FROM and WHERE clauses, while the sort clause comprises a list of SortGroupClause.

In summary, the query tree serves as a repository of metadata pertaining to the executed query, encompassing its type and specific clauses. This information is subsequently utilized by the following subsystems in the backend process to plan and execute the query effectively.

Rewriter

The rewriter stage in PostgreSQL is responsible for implementing the rule system and making modifications to the query tree based on the rules stored in the pg_rules system catalog. This process occurs when it is necessary for query optimization or when data integrity constraints need to be enforced.

To illustrate the role of the rewriter, consider a simple example from a book.

Image description

Planner and Executor

The planner receives a query tree from the rewriter and generates a plan tree that can be processed by the executor most effectively.The plan tree consists of various plan nodes, each containing essential information required for the executor's processing.
The plan tree guides the executor in accessing and manipulating tables and indexes within the database cluster. It interacts with the buffer manager to efficiently read and write data, utilizing designated memory areas. If required, the executor may also create temporary files to support query execution.
Furthermore, the executor plays a vital role in maintaining consistency and isolation for concurrent transactions by utilizing a concurrency control mechanism. This mechanism ensures that multiple transactions running simultaneously do not interfere with each other, preserving data integrity and preventing conflicts.
Overall, the planner generates a plan tree tailored to the specific query, and the executor utilizes this plan to interact with the database, manage resources, and maintain transactional integrity.
A simple plan tree and the relationship between the plan tree and the result is shown below.
Image description

I hope this summarized version of Query Processing is helpful to everyone, if you are curious and want to read more about it. Do checkout the link given below.
Chapter-3 The Internals of PostgreSQL

Top comments (0)