PostgreSQL is an open-source relational database management system, provide efficient capabilities and globally favored by organizations and individuals. To begin with PostgreSQL, it is essential to understand the conceptual and physical composition of a PostgreSQL database cluster.
In PostgreSQL, a database cluster refers to a collection of databases rather than a group of servers. Each PostgreSQL server operates on a single host and handles a single database cluster. Within a cluster, a database consists of a set of database objects. According to the principles of relational database theory, a database object is a data structure used for storing or referencing data. Examples of these objects include tables, indexes, sequences, views, functions, and more. PostgreSQL treats databases themselves as separate database objects, while other objects such as indexes and tables belong to their respective databases. Internally, these database objects are managed by object identifiers (OIDs), which are 4-byte integers.
The relationships between database objects and their corresponding OIDs are stored in dedicated system catalogs, which differ depending on the object type. For example, the OIDs of databases are stored in the "pg_database" catalog, while the OIDs of heap tables are stored in the "pg_class" catalog. To retrieve the desired OIDs, you can employ SQL queries to interrogate the relevant catalogs.
When you initialize a new PostgreSQL database cluster using the initdb utility, it establishes a base directory that forms the core of the cluster. This base directory contains multiple subdirectories and various files. Each database within the cluster is represented by a subdirectory within the base directory. Furthermore, every table and index associated with a database is stored as one or more files within the corresponding subdirectory.
PostgreSQL also provides support for tablespaces, which are directories used to store data outside of the base directory. Tablespaces are a unique feature specific to PostgreSQL and differ from tablespaces in other relational database management systems (RDBMS). They allow for the flexibility of storing data in separate locations apart from the base directory.
The layout of the PostgreSQL database cluster is described in the official documentation. Main files and subdirectories are listed under the base directory, including pg_xlog, pg_wal, pg_twophase, and pg_multixact.
A database is a subdirectory under the base directory, and the database directory names are identical to the respective OIDs. For example, when the OID of the database sampledb is 16384, its subdirectory name is 16384.
In PostgreSQL, tables and indexes are stored as individual files within the database directory they belong to, as long as their size is below 1GB. These files are internally managed using unique Object IDs (OIDs). The actual data files, on the other hand, are controlled by a variable known as "relfilenode." In most cases, the relfilenode values of tables and indexes correspond to their respective OIDs, establishing a fundamental association between them. This relationship ensures a basic mapping between the two entities.
In PostgreSQL, data files such as heap tables, indexes, as well as the free space map and visibility map, are divided into fixed-length pages or blocks. The default page size in PostgreSQL is 8 KB, and each page is assigned a unique block number starting from 0, increasing sequentially.
When a data file becomes full, PostgreSQL adds a new empty page at the end of the file to accommodate additional data and increase its size.
The internal structure of a page within a table consists of three main components: heap tuples, line pointers, and header data. Heap tuples represent the actual record data and are stored sequentially from the bottom of the page. Line pointers, which are 4 bytes in size, act as pointers to each heap tuple. They form an array serving as an index to the tuples. Each line pointer is assigned a unique offset number starting from 1 to reference its corresponding tuple. When a new tuple is added, a new line pointer is appended to the array to point to the newly added tuple.
This page layout ensures efficient storage and retrieval of data within a PostgreSQL data file.
The logical and physical structures of a PostgreSQL database cluster and the internal layout of heap table files are essential to understand for developers. This knowledge can help the developers to work with and manage PostgreSQL database in a better way.
For in depth-analysis, please go through the below link.
Chapter-1 The Internals of PostgreSQL