DEV Community

Cover image for Summary of Chapter# 6 : "Vacuum Processing" from the book "The Internals of PostgreSQL"
Vinay Kumar Talreja
Vinay Kumar Talreja

Posted on • Updated on

Summary of Chapter# 6 : "Vacuum Processing" from the book "The Internals of PostgreSQL"

This blog aims to assist you in understanding the concepts of Chapter:6 [Vacuum Processing] from the book The Internals of PostgreSQL.

Note: Ensure that you have a thorough understanding of
Chapter 5 Part-1, Chapter 5 Part-2 and basics of PostreSQL before we proceed to Chapter 6, as it forms the foundation for our exploration.

So, Let's Start:

Introduction to Chapter

  • Vacuum processing is a maintenance process that facilitates the persistent operation of PostgreSQL.

  • Two main Tasks are removing dead tuples and the freezing transaction ids.

  • Vacuum processing provides two modes to remove dead tuples
    Concurrent VACUUM and Full VACUUM

  • Concurrent VACUUM removes dead tuples for each page of the table file, and other transactions can read the table while this process is running.

  • Full VACUUM removes dead tuples and defragments live tuples the whole file, and other transactions cannot access tables while Full VACUUM is running.


Concurrent VACUUM

  • Vacuum processing performs the following tasks for specified tables or all tables in the database.

  • Removing dead tuples

  1. Remove dead tuples and defragment live tuples for each page.

  2. Remove index tuples that point to dead tuples.

  • Freezing old txids
  1. Freeze old txids of tuples if necessary.

  2. Update frozen txid related system catalogs (pg_database and
    pg_class).

  3. Remove unnecessary parts of the clog if possible.

First Block

  • This block performs freeze processing and removes index tuples that point to dead tuples.

  • First, PostgreSQL scans a target table to build a list of dead tuples and freeze old tuples if possible.

  • After scanning, PostgreSQL removes index tuples by referring to the dead tuple list, process called cleanup stage.

Second Block

  • This block removes dead tuples and updates both the FSM and VM on a page-by-page basis.

Removing a dead tuple in PostgreSQL is depicted in figure below:

Image description

Assume that the table contains three pages. We focus on the 0th page (i.e. the first page). This page has three tuples. Tuple_2 is a dead tuple (Above Fig.(1)). In this case, PostgreSQL removes Tuple_2 and reorders the remaining tuples to repair fragmentation, and then updates both the FSM and VM of this page (Above Fig.(2)). PostgreSQL continues this process until the last page.

Third Block

  • The third block performs the cleanup after the deletion of the indexes, and also updates both the statistics and system catalogs related to vacuum processing for each target table.

Post-processing

  • When vacuum processing is complete, PostgreSQL updates both several statistics and system catalogs related to vacuum processing, and it removes unnecessary parts of the clog if possible.

Visibility Map

  • Vacuum processing is expensive, so a feature called VM (Visibility Map) was introduced in version 8.4 to reduce the cost.

  • The VM is associated with each table and contains information about the visibility of pages in the table file.

  • The visibility of pages determines if there are any dead tuples (rows) on each page.

  • The VM allows vacuum processing to skip pages that do not have dead tuples, saving time and resources.

  • Each table has an individual visibility map that holds the visibility status of each page.

  • The VM is stored in a file with the vm suffix and is composed of one or more 8 KB pages.

  • The VM file is associated with the corresponding table file and can be identified by the suffix _vm.

  • This optimization technique helps improve the efficiency of vacuuming and reduces the cost associated with the process.

How the VM is used in PostgreSQL is depicted in figure below:

Image description


Freeze Processing

  • Freeze processing has two modes, these modes are referred to as lazy mode and eager mode.

  • In lazy mode, freeze processing scans only pages that contain dead tuples using the respective VM of the target tables.

Freezing tuples in lazy mode in PostgreSQL is depicted in figure below:

Image description

  • Eager mode, scans all pages regardless of whether each page contains dead tuples or not, and it also updates system catalogs related to freeze processing and removes unnecessary parts of the clog if possible.

** Freezing old tuples in eager mode (version 9.5 or earlier)** in PostgreSQL is depicted in figure below:

Image description


Removing Unnecessary Clog Files

  • The clog stores transaction states. When pg_database.datfrozenxid is updated, PostgreSQL attempts to remove unnecessary clog files. Note that corresponding clog pages are also removed.

  • If the minimum pg_database.datfrozenxid is contained in the clog file '0002', the older files ('0000' and '0001') can be removed because all transactions stored in those files can be treated as frozen txids in the whole database cluster.

Removing unnecessary clog files and pages in PostgreSQL is depicted in figure below:

Image description

Image description


Autovacuum Daemon

  • Vacuum processing has been automated with the autovacuum daemon; thus, the operation of PostgreSQL has become extremely easy.

  • The autovacuum daemon periodically invokes several autovacuum_worker processes. By default, it wakes every 1 min (defined by autovacuum_naptime), and invokes three workers (defined by autovacuum_max_works).

  • The autovacuum workers invoked by the autovacuum perform vacuum processing concurrently for respective tables gradually with minimal impact on database activity.


Full VACUUM

  • Although Concurrent VACUUM is essential for operation, it is not sufficient. For example, it cannot reduce table size even if many dead tuples are removed.

An example showing the disadvantages of (concurrent) VACUUM in PostgreSQL is depicted in figure below:

Image description

  • Above Figure shows an extreme example. Suppose that a table consists of three pages, and each page contains six tuples. The following DELETE command is executed to remove tuples, and the VACUUM command is executed to remove dead tuples:

  • The dead tuples are removed; however, the table size is not reduced. This is both a waste of d*isk space and has a negative impact on database performance*. For instance, in the above example, when three tuples in the table are read, three pages must be loaded from disk.

  • To deal with this situation, PostgreSQL provides the Full VACUUM mode.

Outline of Full VACUUM mode in PostgreSQL is depicted in figure below:

Image description

Pseudocode: Full VACUUM in PostgreSQL is depicted in figure below:

Image description


I hope, this blog has helped you in understanding the concepts of Vacuum Processing in PostreSQL.

Check out summary of Chapter : 7 Part-1

If you want to understand PostgreSQL In-Depth.

Top comments (0)