DEV Community

Cover image for The Internals of PostgreSQL: Chapter 6: Vacuum Processing
Hasnain Somani
Hasnain Somani

Posted on

The Internals of PostgreSQL: Chapter 6: Vacuum Processing

This post summarizes Chapter 6 of the book "The Internals of PostgreSQL".
Vaccum processing is a maintenance process that is used in PostgreSQL to remove dead tuples, and freeze transaction ids to ensure an efficient operation in the database. Vaccum processing can further be categorized into 2 branches:

  1. Concurrent Vaccum: it removes dead tuples for each page of the table, while allowing other transactions to read the table.
  2. Full Vaccum: removes dead tuples and defragments live tuples for the whole file. Other transactions can not access the tables during this process.

Vaccum processing scans entire tables, and therefore it is a costly operation. This had to be executed manually until version 8.0, and improvements have been made to this over the years - including Visibility Map in version 8.4 (improves the efficiency in removal of dead tuples - reduces the cost of vaccum processing), and enhancements made to the freeze process in version 9.6.

Vaccum processing involves 3 blocks:

  1. The first block:
  2. performs freeze processing.
  3. removes index tuples pointing to dead tuples.

  4. The second block:

  5. removes dead tuples.

  6. updates Free Space Map (FSM) page by page.

  7. updates Visibility Map (VM) page by page.

  8. The third block:

  9. cleans up indexes.

  10. truncates the last page (if possible).

  11. updates statistics and system catalogs.

Statistics and system catalogs related to vaccum processing are updated once vaccum processing is complete, and unnecessary parts of the clog are also removed.
In v13, the PARALLEL option was introduced, which allows the vaccum to process vaccuming, and cleaning indexes in parallel if multiple indexes are created. This operation is available in VACCUM command - not in autovaccum.

The Visibility Map (VM) is an individual map linked with each table, indicating the visibility of each page. The use of VM allows vaccum processing to skip pages that do not contain dead tuples - improving efficiency as a result.

Freeze processing has 2 parts: lazy mode and eager mode. In lazy mode, only those pages having dead tuples are scanned with the use of VM. In eager mode, all pages are scanned, and relevant system catalogs are updated. Although, eager mode is trigerred only when specific conditions are satisfied. One such condition is the freeze threshold of the database exceeding.

The clog (commit log) stores transaction states, and when pg_database.datfrozenxid value is updated, unnecessary files can be removed from the clog. If the minimum pg_database.datfrozenxid exists in a clog file, older files can be removed since all transactions stored in those files are considered to be frozen.

Autovaccum Daemon is used to automate the vaccum process in PostgreSQL. Although the default is to wake every 1 minute and invoke 3 workers, it periodically invokes multiple autovaccum worker processes.

Concurrent vaccum is not sufficient to reduce the table size, and therefore PostgreSQL provides Full VACCUM mode to help with this. Full VACCUM involves creating a new table, and copying all live tuples from the old table to it, removing the old file. It also updates statistis, Free Space Map, and Visibility Map. During Full VACCUM, temporary disk space usage can be up to twice the table size, and access to the table is prohibited.

As a summary, this chapter focuses on maintenance and optimization of the database. Vaccumming, analyzing and monitoring of the database performance have been covered, as well as the importance of vaccuming to remove dead tuples has been highlighted.

Top comments (0)