DEV Community

Cover image for The Internals of PostgreSQL: Chapter 1: Database Cluster, Databases, and Tables
Hasnain Somani
Hasnain Somani

Posted on • Edited on

The Internals of PostgreSQL: Chapter 1: Database Cluster, Databases, and Tables

This post summarizes Chapter 1 of the book "The Internals of PostgreSQL".

PostgreSQL database's logical structure consists of a collection of databases, all managed by a single Postgre server, which runs on a host. Each database existing on the cluster is treated as a separate entity, and therefore comprises of various database objects such as indexes, sequences, views, functions, and tables.

The relations that hold these objects together, and the corresponding Object IDs are stored in catalogues present in the database (pg_database to store OIDs of databases and pg_class catalog stores OIDs of heap tables.) Retrieval of the OIDs of objects is done through querying these catalogs.

A database cluster has a directory hierarchy physical structure: The base directory acts as the main directory, and it comprises of other subdirectories and files. The path for the base directory is set to PGDATA. Creation of a new database cluster can be done by using the initdb command.

Some of the important files and subdirectories include:

PG_VERSION: A file containing the major version number of PostgreSQL.
pg_hba.conf: A file used to control client authentication for PostgreSQL.
pg_ident.conf: A file used to map PostgreSQL user names.
postgresql.conf: A file used to configure various parameters for PostgreSQL.
postgresql.auto.conf: A file used to store configuration parameters set in ALTER SYSTEM (version 9.4 or later).
base/: A subdirectory containing per-database subdirectories.
global/: A subdirectory containing cluster-wide tables like pg_database and pg_control.
pg_wal/ (Version 10 or later): A subdirectory containing Write Ahead Logging (WAL) segment files.
pg_tblspc/: A subdirectory containing symbolic links to tablespaces.
Each database within the cluster has its own subdirectory under the base directory, with the subdirectory name corresponding to the OID of the database.

Tables and indexes are stored as separate individual files, and they are stored within their respective subdirectories. The names of these files correspond to the value of refilenode, which is usually same as the OID of the table, except for those tables and indexes who's size increases to above 1GB. After the size exceeds 1GB, a new refilenode is created, and its name is given a new suffix like ".1", and ".2" proceeds after that, and so on.

Additional files (_fsm) and (_vm) files also exist within the databse subdirectories. These are the free space map files, and visibility map files respectively. _fsm files store information about the free space capacity whereas the _vm files store information about the page visibility within the table.

The idea of tablespaces is also supported in PostgreSQL, which contains additional data, outside the base directory. Tablespaces have their own subdirectories, and they are themselves created under a specified directory.

A heap table file consists of pages of fixed length. The default length is 8192 bytes, and each page contains record data (tuples), Line pointers (point to tuples), and header data. The header data further has information about the checksum value, Log Sequence Number, and other details related to the page.

As a conclusion, the chapter discusses an overview on the logical and physical structure of a postgreSQL database cluster, and how databases, objects, files and subdirectories are organized and managed.

Top comments (0)