DEV Community

carl ai
carl ai

Posted on

Best Practices for PostgreSQL Table Partition Managing

What is PostgreSQL Table Partition

In PostgreSQL 10, table partitioning was introduced as a feature that allows you to divide a large table into smaller, more manageable pieces called partitions. Each partition is essentially a separate table that stores a subset of the data from the original table. This can be useful for improving query performance, as well as for managing large datasets more efficiently.

In PostgreSQL 11, several improvements to table partitioning were added, of which the key improvements are the ability to create default partitions, do automatic index creation, support foreign keys and unique indexes, and the ability to create hash partitions. "Interval partitioning" allows you to define a partitioning scheme that automatically creates new partitions as needed based on the values in the partition key. Hash partitioning allows you to divide your data into partitions based on a hash function applied to the partition key, it can be useful for evenly distributing data across partitions, which can help to improve query performance.

And PostgreSQL 12 brought more improvements to table partitioning, make it an even more powerful feature for managing large datasets and improving query performance. Here are some of the key improvements:

  1. Partition elimination: allows to eliminate partitions that don't need to be scanned for a given query. This can significantly improve query performance for tables with many partitions.
  2. Performance for partitioned tables with indexes: make queries on partitioned tables with indexes faster than it was in previous versions.
  3. More partitioning methods: list partitioning and range partitioning with sub-partitions, giving you more flexibility in how you partition your data.
  4. Partitioning foreign keys: make it easier to manage partitioning foreign keys than it was in previous versions.
  5. Partitioning DML statements: make it easier to perform data manipulation language (DML) statements on partitioned tables, including support for cascading updates and deletes across partitions, support for RETURNING clauses in INSERT/UPDATE/DELETE statements.

How to Use PostgreSQL Table Partition

To use table partitioning in PostgreSQL, you first need to define a partitioning scheme for your table. This involves specifying the partition key, which is the column or set of columns that will be used to determine which partition each row belongs to. You can then create the partitions themselves, either manually or using a script.

Here are some examples of real-world schemes for PostgreSQL partition tables, partitioned by RANGE, LIST, and HASH respectively:

  1. Partitioned by RANGE: Suppose you have a table that stores sales data for a large retail chain. You could partition the table by date, using a RANGE partitioning scheme. For example, you could create partitions for each quarter of the year, like this:

    CREATE TABLE sales (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        date DATE NOT NULL,
        store_id INTEGER NOT NULL,
        amount NUMERIC NOT NULL
    )
    PARTITION BY RANGE (date);
    
    CREATE TABLE sales_q1 PARTITION OF sales
        FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
    
    CREATE TABLE sales_q2 PARTITION OF sales
        FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
    
    CREATE TABLE sales_q3 PARTITION OF sales
        FOR VALUES FROM ('2022-07-01') TO ('2022-10-01');
    
    CREATE TABLE sales_q4 PARTITION OF sales
        FOR VALUES FROM ('2022-10-01') TO ('2023-01-01');
    
  2. Partitioned by LIST: Suppose you have a table that stores customer data for an e-commerce site. You could partition the table by country, using a LIST partitioning scheme. For example, you could create partitions for each country where you have customers, like this:

    CREATE TABLE customers (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        country TEXT NOT NULL
    )
    PARTITION BY LIST (country);
    
    CREATE TABLE customers_usa PARTITION OF customers
        FOR VALUES IN ('USA');
    
    CREATE TABLE customers_canada PARTITION OF customers
        FOR VALUES IN ('Canada');
    
    CREATE TABLE customers_uk PARTITION OF customers
        FOR VALUES IN ('UK');
    
    CREATE TABLE customers_germany PARTITION OF customers
        FOR VALUES IN ('Germany');
    
  3. Partitioned by HASH: Suppose you have a table that stores sensor data for a large industrial plant. You could partition the table by sensor ID, using a HASH partitioning scheme. For example, you could create partitions for each sensor, like this:

    CREATE TABLE sensor_data (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        sensor_id INTEGER NOT NULL,
        timestamp TIMESTAMP NOT NULL,
        value NUMERIC NOT NULL
    )
    PARTITION BY HASH (sensor_id);
    
    CREATE TABLE sensor_data_1 PARTITION OF sensor_data
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    
    CREATE TABLE sensor_data_2 PARTITION OF sensor_data
        FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    
    CREATE TABLE sensor_data_3 PARTITION OF sensor_data
        FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    
    CREATE TABLE sensor_data_4 PARTITION OF sensor_data
        FOR VALUES WITH (MODULUS 4, REMAINDER 3);
    

Automatic Partition Management

As your data grows and changes over time, your might find manual partition creation and deletion can be complex, time-consuming and error-prone. So let's talk about automatic partition management.

Automatic partition creation and deletion can be very useful when using PostgreSQL partition tables for several reasons:

  1. Simplifies management: With automatic partition creation and deletion, you don't need to manually create and manage partitions as your data grows and changes. This can simplify management and reduce the risk of errors.
  2. Improves performance: Automatic partition creation and deletion can help to improve query performance by ensuring that data is stored in the appropriate partition. This can reduce the amount of data that needs to be scanned for a given query, which can improve query performance.
  3. Reduces storage costs: Automatic partition deletion can help to reduce storage costs by allowing you to easily delete old data that is no longer needed. This can help to free up storage space and reduce costs.
  4. Supports data retention policies: Automatic partition deletion can also help to support data retention policies by allowing you to easily delete data that is no longer needed after a certain period of time.

There are several ways to achieve automatic partition management in PostgreSQL.

Partition pre-creation and deferred removal

Partition pre-creation is the process of creating partitions in advance of when they are needed. It can help to improve performance by reducing the overhead of creating partitions dynamically as data is inserted into a partitioned table. When you pre-create partitions, you create a set of empty partitions in advance, based on a predefined partitioning scheme. These partitions are typically created with no data, and are ready to receive data as it is inserted into the partitioned table.

Deferred removal involves delaying the removal of partitions until a later time. This can be useful in situations where you want to keep data for a certain period of time, but don't want to keep it indefinitely. By deferring the removal of partitions, you can ensure that you have enough time to analyze and process the data before it is deleted.

Here are several ways to implement these two techniques.

*nix cron task

Here's an example of how you could use a *nix cron task to achieve partition pre-creation and deferred removal in PostgreSQL:

  1. Create a partitioned table, called "sales" that is partitioned by range on the "date" column.

    CREATE TABLE sales (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        date DATE NOT NULL,
        store_id INTEGER NOT NULL,
        amount NUMERIC NOT NULL
    )
    PARTITION BY RANGE (date);
    
  2. Create a partitioning function, called "sales_partition_function" that returns the name of the partition that should be used for a given date. The function takes a date as input and returns the name of the partition based on the year and month of the date.

    CREATE OR REPLACE FUNCTION sales_partition_function(date)
    RETURNS TEXT AS $$
    DECLARE
        partition_name TEXT;
    BEGIN
        partition_name := 'sales_' || to_char($1, 'YYYY_MM');
        RETURN partition_name;
    END;
    $$ LANGUAGE plpgsql;
    
  3. Create a shell script to pre-create partitions. This is a shell script that pre-creates partitions for the current month and removes partitions that are older than one year. The script uses the "psql" command to execute SQL statements in PostgreSQL.

    #!/bin/bash
    
    # Get the next year and month
    YEAR_MONTH=$(date +%Y_%m -d "$(date) +1 month")
    
    # Get range begin and end with date of next month
    ONE_MONTH_LATER=$(date +%Y-%m-%d -d "$(date) +1 month")
    TWO_MONTHS_LATER=$(date +%Y-%m-%d -d "$(date) +2 month")
    
    # Create the partition for the next month
    psql -c "CREATE TABLE IF NOT EXISTS sales_${YEAR_MONTH} PARTITION OF sales FOR VALUES FROM ('$ONE_MONTH_LATER') TO ('$TWO_MONTHS_LATER');"
    
    # Remove partitions that are older than a year
    YEAR=$(date +%Y)
    tbArray=$(psql -tc "\dt sales_$(($YEAR - 1))_*" | awk -F'|' '{print $2}')
    for i in $(echo ${tbArray[@]}); do 
        psql -c "DROP TABLE IF EXISTS $i"; 
    done
    
  4. Add the shell script to a *nix cron task, that runs on the first day of every month. This will ensure that partitions are pre-created and removed on a regular basis.

    0 0 1 * * /path/to/script.sh
    

The pg_cron extension

The pg_cron extension is a PostgreSQL extension that allows you to schedule and run cron-like jobs directly from within the database. It provides a simple and convenient way to schedule tasks such as backups, data exports, and other maintenance tasks.

With pg_cron, you can schedule jobs using a familiar cron-like syntax, and the jobs will be executed directly within the database. This eliminates the need for external scheduling tools and simplifies the management of scheduled tasks.

Here's an example of how you could use pg_cron to schedule the creation and removal of partition tables:

CREATE TABLE sales (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    date DATE NOT NULL,
    store_id INTEGER NOT NULL,
    amount NUMERIC NOT NULL
)
PARTITION BY RANGE (date);

-- create function to manage partitions
CREATE OR REPLACE
FUNCTION manage_sales_partition() RETURNS integer
    LANGUAGE plpgsql AS
$$
DECLARE
   regPatternTableName varchar;
   targetTableName varchar;
   oneMonthLater varchar;
   twoMonthLater varchar;
   lastValidTableName varchar;
   _tb_name information_schema.tables.table_name%TYPE;
BEGIN
    SELECT 'sales_\d{4}_\d{2}' INTO regPatternTableName;
    -- Get the next year and month
    SELECT to_char(now() + INTERVAL '1 month', 'sales_YYYY_MM') INTO targetTableName;

    -- Get range begin and end with date of next month
    SELECT to_char(now() + INTERVAL '1 month', 'YYYY-MM-DD') INTO oneMonthLater;
    SELECT to_char(now() + INTERVAL '2 month', 'YYYY-MM-DD') INTO twoMonthLater;
    SELECT to_char(now() - INTERVAL '12 month', 'sales_YYYY_MM') INTO lastValidTableName;

    -- Create the partition for the next month
    EXECUTE format('CREATE TABLE IF NOT EXISTS %I PARTITION OF sales FOR VALUES FROM (%L) TO (%L);', targetTableName, oneMonthLater, twoMonthLater);

    -- Remove partitions that are older than a year
    FOR _tb_name IN
        SELECT table_name
        FROM information_schema.TABLES
        WHERE table_name ~ regPatternTableName
        AND table_name <= lastValidTableName
    LOOP
        EXECUTE format('DROP TABLE IF EXISTS %I', _tb_name);
    END LOOP;
    RETURN 1;
END;
$$;

-- create extension pg_cron
CREATE EXTENSION pg_cron;

-- create a pg_cron task and schedule monthly
SELECT cron.schedule('0 0 1 * *', $$SELECT manage_sales_partition();$$);
Enter fullscreen mode Exit fullscreen mode

The pg_partman extension

The pg_partman extension is a PostgreSQL extension that provides advanced partitioning capabilities for large tables. It allows you to manage partitions automatically based on time intervals or other criteria, and provides a set of tools for managing partitioned tables.

With pg_partman, you can create and manage partitioned tables using a variety of partitioning schemes, including range, list, and hash partitioning. You can also use pg_partman to manage partition maintenance tasks such as creating and dropping partitions, and to automate the process of moving data between partitions.

Here are some key features of the pg_partman extension:

  • Automatic partitioning: pg_partman provides automatic partitioning capabilities, allowing you to create and manage partitions based on time intervals or other criteria. This can help to simplify the management of large tables and improve performance.

  • Partitioning schemes: pg_partman supports a variety of partitioning schemes, including range, list, and hash partitioning. This allows you to choose the partitioning scheme that best fits your data and query patterns.

  • Partition maintenance: pg_partman provides a set of tools for managing partition maintenance tasks such as creating and dropping partitions, and for automating the process of moving data between partitions.

  • Query routing: pg_partman supports query routing, allowing you to route queries to the appropriate partition based on the partitioning scheme. This can help to improve query performance and reduce the overhead of managing large tables.

  • Compatibility: pg_partman is compatible with PostgreSQL 9.4 and later, and is available as an open-source extension.

To use pg_partman, you'll need to install the extension and create a partitioned table using the "partman.create_parent()" function. Here's an example of how you could use pg_partman to create a partitioned table based on time intervals:

CREATE TABLE sales (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    date DATE NOT NULL,
    store_id INTEGER NOT NULL,
    amount NUMERIC NOT NULL
)
PARTITION BY RANGE (date);

-- Install the pg_partman extension
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;

-- Create a partitioned table based on time intervals
SELECT partman.create_parent('public.sales', 'date', 'native', 'monthly', p_premake := 2);
-- Set retention policy
UPDATE partman.part_config  SET retention = '12 month', retention_keep_table=false WHERE parent_table='public.sales';

-- auto trigger maintenance(partition creation and retention) using pg_cron
CREATE EXTENSION pg_cron;
SELECT cron.schedule('0 0 1 * *', $$SELECT partman.run_maintenance();$$);
Enter fullscreen mode Exit fullscreen mode

The pg_pathman extension

The pg_pathman is a Postgres Pro extension that provides an optimized partitioning solution for large and distributed databases. Using pg_pathman, you can:

  • Partition large databases without downtime.
  • Speed up query execution for partitioned tables.
  • Manage existing partitions and add new partitions on the fly.
  • Add foreign tables as partitions.
  • Join partitioned tables for read and write operations.
  • The extension is compatible with Postgres Pro 9.5 or higher.

It is worth noting that, starting from Postgres Pro 12, using pg_pathman is not recommended. And it has been deprecated since 2019.

Partition dynamical creation

Dynamic partition creation can help to simplify management by allowing you to create partitions as needed, without having to pre-create a large number of partitions in advance. This can help to reduce storage costs, as you only create partitions when they are needed, and can help to improve performance by reducing the overhead of managing a large number of partitions.

However, it's important to note that dynamic partition creation can also have some downsides. It can add overhead to the insert process, as PostgreSQL must perform additional work to create the partition and update the partitioning metadata. This overhead can be significant if you are inserting a large amount of data. Additionally, dynamic partition creation can make it more difficult to manage your partitions, as you may need to monitor the partitioning metadata to ensure that you are not creating too many partitions or running out of disk space. So dynamic on-demand creation is not recommended in mass production.

Conclusion

In this article we introduced PostgreSQL partition tables comprehensively, including its definition, common usage accompanied by example codes. I believe you in front of the screen have fully understood how to use it. If there are any questions, please leave a comment in the comment section, and I will do my best to answer them when I see them.

Top comments (0)