How an RDBMS works #5: PostgreSQL Database Cluster Logical Structure

#postgres #apacheage #rdbms #database

1. Introduction

In our last blog post, we explored the implementation of a graph database using PostgreSQL. We used the Apache AGE extension, which adds graph database capabilities to PostgreSQL by combining SQL with openCypher. In that example, we started the PostgreSQL server as a service with the command sudo service postgresql start. This command is available when PostgreSQL is installed from official Linux distribution packages. The script checks whether a database cluster exists in the default directory and creates a new one if it does not find one. Finally, it starts the service that runs the cluster.

In this article, we introduce the concept of a cluster by presenting the logical structure of the PostgreSQL database cluster. Understanding this structure is fundamental to understanding how PostgreSQL operates, managing it effectively, running separate databases side by side, and tuning its performance.

2. The database cluster logical structure

In PostgreSQL, a database cluster is a collection of databases managed by a single server that runs on a single host. This concept may seem confusing, because "cluster" here does not mean a group of database servers, so let's break it down. Let's go back to the example of our scientific article database. This database stores only the articles and their metadata. Suppose we have a second database containing information about the authors of these articles and a third storing information about the venues where they were published. We can represent a single database as shown in Figure 1.

Figure 1 - Representation of a database that contains scientific papers. Prepared by the author.

Now, we can represent the cluster as a large structure that will store databases as objects, as shown in Figure 2.

Figure 2 - Representation of the PostgreSQL database cluster, which contains a collection of three databases. Prepared by the author.
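To make the picture in Figure 2 more concrete, here is a minimal sketch of how three databases could be created and listed inside one running cluster. The database names articles, authors, and venues are hypothetical, chosen only to match the example; the sketch assumes a server is already running and that your operating system user is allowed to connect and create databases.

```shell
# Hypothetical names for the three databases of the example.
DATABASES="articles authors venues"

# Create each database inside the cluster managed by the running server.
create_example_databases() {
    for db in $DATABASES; do
        createdb "$db"
    done
}

# List the non-template databases stored in the cluster by querying the
# pg_database system catalog through the default "postgres" database.
list_databases() {
    psql -d postgres -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate ORDER BY datname"
}
```

After calling create_example_databases, list_databases should show the three new names alongside any database that already existed, such as the default postgres database. All of them belong to the same cluster and are managed by the same server.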

In practice, if you installed PostgreSQL from the source code, it all starts with the following commands:

initdb
pg_ctl start -l log

The initdb command creates a new database cluster. It takes the cluster's data directory from the -D option or, when that option is omitted as it is here, from the PGDATA environment variable. With pg_ctl, we can control the server manually, using modes such as start, restart, and stop; the -l option sends the server's log output to the given file. Once started, the server manages the cluster and runs on a single host, by default listening for TCP/IP connections only on localhost. In other articles, I will present more details about the server. Basically, it is the parent of all processes related to cluster management. As explained in Section 1, if you installed PostgreSQL from Linux distribution packages, the same happens when you run sudo service postgresql start: the command creates a new cluster if there is none and starts the server that manages it.
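The "create the cluster if it is missing, then start the server" behavior can be sketched with the same two commands. This is only an illustration under stated assumptions, not the script that any distribution actually ships: the function names are hypothetical, the presence of a PG_VERSION file (which initdb writes at the root of the data directory) is used as the test for an existing cluster, and the log file location is an arbitrary choice.

```shell
# Return success if the given directory already holds a cluster.
# Assumption: a PG_VERSION file at its root marks an initialized data directory.
cluster_exists() {
    [ -f "$1/PG_VERSION" ]
}

# Create the cluster only when it is missing, then start the server on it.
# The log file path inside the data directory is an arbitrary choice.
ensure_cluster_running() {
    if ! cluster_exists "$1"; then
        initdb -D "$1" || return 1
    fi
    pg_ctl -D "$1" -l "$1/server.log" start
}
```

For example, ensure_cluster_running "$HOME/pgdata-demo" (a hypothetical path) would initialize that directory on the first call and only start the server on later calls. The server can then be stopped with pg_ctl -D "$HOME/pgdata-demo" stop. Passing -D explicitly avoids depending on the PGDATA environment variable.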

3. Conclusion

In this article, we reviewed the example of the scientific article database and learned the concept of the PostgreSQL database cluster and its logical structure. We also learned the commands that create the cluster and start the server that manages it. In the next post, I will present the concepts related to the cluster's physical structure, where we can delve deeper into the practical part of its operation.

4. Errata

My intention is to provide access to technology information through reliable sources. If you find any incorrect information, please leave your contribution in the comments below 😊.

5. Related Articles

How an RDBMS works #4: Creating a citation graph with PostgreSQL + Apache AGE

6. References

SUZUKI, Hironobu. The Internals of PostgreSQL for database administrators and system developers. Interdb. 2015. Available at https://www.interdb.jp/pg/. Accessed on 05/12/2023.

The PostgreSQL Global Development Group. PostgreSQL 13.11 Documentation, Part VI. Reference, PostgreSQL Server Applications, initdb. Available at https://www.postgresql.org/docs/13/app-initdb.html. Accessed on 05/12/2023.