Understanding internals of PostgreSQL - Chapter 01

PostgreSQL, commonly referred to as Postgres, is a popular and powerful open-source relational database management system that has been around since 1996. It is known for its robustness, scalability, and extensibility, and is used by many organizations and developers around the world for a wide range of applications.

As someone who is interested in learning more about Postgres, I recently delved into the first chapter of the Internals of Postgres - a comprehensive guide to the inner workings of this database system. The Internals of Postgres, authored by the renowned Postgres expert Hironobu SUZUKI, provides a detailed look at the design and implementation of Postgres, and is a valuable resource for anyone looking to gain a deeper understanding of this database system.

In this post, I will share a summary of my learnings from Chapter 01 of the Internals of Postgres. This chapter focuses on the basic knowledge of how Postgres stores database, providing a foundational understanding of how the database works at a low level. I hope this post will be a helpful guide for others who are looking to gain a deeper understanding of Postgres and its internals. My learnings are based on the source provided by Hironobu SUZUKI on The Internals of PostgreSQL.

Logical Structure

PostgreSQL manages a collection of databases known as a database cluster that is managed by a single Postgres server. Each database consists of database objects, which are used to store or reference data, such as tables, indexes, views, functions, and more. All database objects are managed by OIDs (Object Identifiers), which are unsigned 4-byte integers, and the relation between DB objects and OIDs is stored in the "system catalog." Databases themselves are also database objects, logically separated from each other.

Physical Structure

The physical structure of a Postgres database cluster involves a base directory, subdirectories for each database, and at least one file (tables or indexes) stored under the database subdirectory, along with configuration files. The layout of the database cluster includes different files and their descriptions.

PostgreSQL also supports tablespaces, which is a directory that contains some data outside of the base directory, created under the directory specified when you issue the CREATE TABLESPACE statement.

Internal Layout of Heap Table File

The internal layout of a heap table file is also an essential concept covered in Chapter 01 of the Internals of Postgres. The data file is divided into pages or blocks of a fixed length, with the pages numbered sequentially from 0. A page within a table contains heap tuples, line/item pointers, and header data.

Methods of Writing and Reading Tuples

When a tuple is written to a page, it is first compressed and then written to the page. The line pointer to the tuple is then written to the page header.

When reading tuples, PostgreSQL uses two methods: sequential scan and index scan. Sequential scan reads all tuples on all pages in the heap table segment. Index scan reads only those tuples that match the search criteria in the index.

In conclusion, Chapter 01 of the Internals of Postgres is a comprehensive guide to the inner workings of Postgres that provides a detailed look at the architecture and storage management system of the database system. Understanding the logical and physical structure of a Postgres database cluster, how data is read, written, and the arrangement of the heap file is fundamental to a deeper understanding of Postgres. This foundational knowledge will pave the way for further exploration of other topics in the Internals of Postgres.