Understanding internals of PostgreSQL - Query Processing

Welcome back to my series on the internals of PostgreSQL. In this series, I am sharing a summary of my learnings from the book "The Internals of PostgreSQL" by Hironobu SUZUKI. In this post, I'll be sharing the key takeaways from chapter 03, the query processing.

PostgreSQL's query processing subsystem is a complex system that efficiently handles SQL. In this and upcoming two posts, we will delve into Chapter 3 which focuses on query processing, particularly query optimization. The chapter is divided into three parts: an overview of query processing in PostgreSQL, steps for obtaining the optimal plan for a single-table query, and processes for obtaining the optimal plan for a multiple-table query.

In this first post on chapter three, we will get an overview of query processing in PostgreSQL. PostgreSQL employs a backend process to handle all queries issued by the connected client, despite utilizing multiple background worker processes for parallel queries. This backend comprises five subsystems, as listed below:

Parser generates a parse tree from an SQL statement in plain text
Analyzer carries out a semantic analysis of a parse tree and generates a query tree
Rewriter transforms a query tree using the rules stored in the rule system
Planner generates the plan tree that can most effectively be executed from the query tree
Executor executes the query via accessing the tables and indexes in the order created by the plan tree.

Parser

The Parser is responsible for generating a parse tree from an SQL statement in plain text. The parse tree can be read by subsequent subsystems and is represented by the SelectStmt structure defined in parsenodes.h. While the Parser checks only the syntax of the input query, it returns an error only when there is a syntax error in the query.

Analyzer

The Analyzer performs a semantic analysis of the parse tree generated by the Parser and generates a query tree. The Query structure defined in parsenodes.h forms the root of the query tree, which contains metadata of its corresponding query, including the type of command and several leaves that hold data of the individual particular clause. The Analyzer/Analyser checks the semantics of an input query and generates a query tree.

Rewriter

The Rewriter is responsible for transforming a query tree according to the rules stored in the pg_rules system catalog if necessary. The rule system is an interesting system in itself, but we will not go into detail to prevent this post from becoming too long.

Planner

The Planner generates a query plan tree that can be processed by the Executor most effectively. It receives a query tree from the Rewriter and is based on pure cost-based optimization. The plan tree is composed of elements called plan nodes, which are defined in plannodes.h. Each plan node contains information that the Executor requires for processing.

Executor

The Executor executes the query via accessing the tables and indexes in the order created by the plan tree. It uses some memory areas, such as temp_buffers and work_mem, allocated in advance, and creates temporary files if necessary. When accessing tuples, PostgreSQL uses a concurrency control mechanism to maintain consistency and isolation of the running transactions.

In conclusion, the five subsystems of the PostgreSQL backend process work together to execute queries efficiently. While the Parser generates a parse tree, the Analyzer/Analyser carries out a semantic analysis, and the Rewriter transforms the query tree according to the rules stored in the pg_rules system catalog if necessary. The Planner generates a query plan tree based on pure cost-based optimization, and the Executor executes the query by accessing tables and indexes. Together, they form a powerful and efficient backend process for PostgreSQL.