DEV Community

Cover image for How Does Partitioning Work in PostgreSQL and Why Should You Care?
Adam Furmanek for Metis

Posted on • Originally published at metisdata.io

How Does Partitioning Work in PostgreSQL and Why Should You Care?

Partitions let us split the logically one large table into smaller physical ones. This can improve the query’s performance by accessing fewer rows, optimizing random-access reads, and using very targeted indexes. Let’s see how it works in PostgreSQL and how Metis handles that.

Patterns

Partitions are a way to divide a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the table's data based on a specified partition key. This is only a change from the engine perspective. Nothing changes from the end-user perspective, there is still one table that the user can access regardless of partitioning.

The partitioning key could be a range of values (range partitioning), a list of specific values (list partitioning), or a mathematical expression (hash partitioning). One of the typical examples is partitioning based on date or country. We can also build custom partitioning with views, as we’ll see later in this post.

Partitions help improve query performance by allowing the database to eliminate unnecessary data during query execution. When a query specifies a condition that matches the partition key, PostgreSQL can directly access only the relevant partition, rather than scanning the entire table. This can significantly speed up queries on large tables.

Furthermore, partitions can be spread across multiple storage devices or tablespaces, enabling parallelism and efficient disk utilization. They also facilitate data management tasks such as archiving or deleting old data by simply dropping or detaching partitions.

PostgreSQL provides different types of partitioning methods, including range partitioning, list partitioning, and hash partitioning. Each method has its benefits and is suitable for different use cases.

PostgreSQL offers built-in support for the following forms of partitioning:

  • Range Partitioning: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.
  • List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
  • Hash Partitioning: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

We can also build partitions using views. We create separate tables manually and then create a view that joins the data from all the tables.

Let’s see practical examples.

Partitions based on the date

Let’s start by creating a partitioned table:

CREATE TABLE orders (
   order_id SERIAL,
   order_date DATE NOT NULL,
   customer_name VARCHAR(255),
   product_name VARCHAR(255),
   quantity INT
) PARTITION BY RANGE(EXTRACT(YEAR FROM order_date));
Enter fullscreen mode Exit fullscreen mode

The important part is the last line:

PARTITION BY RANGE(EXTRACT(YEAR FROM order_date))
Enter fullscreen mode Exit fullscreen mode

We create a table that partitions the data based on the year of the order date. Once we have that, we need to explicitly create tables for specific values:

CREATE TABLE orders_2019 PARTITION OF orders FOR VALUES FROM (2019) TO (2020);
CREATE TABLE orders_2020 PARTITION OF orders FOR VALUES FROM (2020) TO (2021);
CREATE TABLE orders_2021 PARTITION OF orders FOR VALUES FROM (2021) TO (2022);
Enter fullscreen mode Exit fullscreen mode

We created three partitions for three different years. You can see that we included the year in the tables’ names. This is a typical approach that increases maintenance.

Let’s now add some data:

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
    ('2019-02-15', 'Jane Doe', 'Product B', 5),
    ('2019-04-20', 'Bob Johnson', 'Product C', 2),
    ('2019-07-10', 'Alice Brown', 'Product A', 7),
    ('2019-12-30', 'Mike Wilson', 'Product B', 8);



INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
    ('2020-03-20', 'Jane Doe', 'Product B', 3),
    ('2020-06-05', 'Bob Johnson', 'Product C', 10),
    ('2020-08-15', 'Alice Brown', 'Product A', 5),
    ('2020-11-25', 'Mike Wilson', 'Product B', 2);


INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
    ('2021-03-15', 'Jane Doe', 'Product B', 12),
    ('2021-05-20', 'Bob Johnson', 'Product C', 6),
    ('2021-09-01', 'Alice Brown', 'Product A', 3),
Enter fullscreen mode Exit fullscreen mode

We can now query all the orders:

SELECT COUNT(*) FROM orders;
Enter fullscreen mode Exit fullscreen mode

Result:

+-------+
| count |
+-------+
|    15 |
+-------+
Enter fullscreen mode Exit fullscreen mode

Metis shows that the following tables were read:

Image description

We can see all three partitions were accessed.

We can also take orders from a specific table, like this:

SELECT COUNT(*) FROM orders_2019;
Enter fullscreen mode Exit fullscreen mode

Result:

+-------+
| count |
+-------+
|     5 |
+-------+
Enter fullscreen mode Exit fullscreen mode

Metis shows the following analysis:

Image description

We can see only one table has been read. However, if we try the following query:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31';
Enter fullscreen mode Exit fullscreen mode

with the following result:

+-------+
| count |
+-------+
|     5 |
+-------+
Enter fullscreen mode Exit fullscreen mode

we get the following execution:

Image description

The reason is that the engine doesn’t recognize our filters as the ones for the partitioning. We need to change the query to the following:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'
AND EXTRACT(YEAR FROM order_date) = 2020
Enter fullscreen mode Exit fullscreen mode

Metis shows the following:

Image description

We can see that partitions can improve the query performance, but we need to be careful when using them. Metis can help us troubleshoot and optimize database queries easily in this case.

You can also examine the partitions. This query shows the total number of partitions and rows:

SELECT
        pg_inherits.inhparent::regclass AS table_name,
        pg_class.relkind,
        COUNT(pg_inherits.inhrelid::regclass) AS count_partitions,
        SUM(pg_class.relpages) as total_pages,
        SUM(pg_class.reltuples) as total_rows
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
GROUP BY pg_inherits.inhparent::regclass, pg_class.relkind
Enter fullscreen mode Exit fullscreen mode

Output:

+-------------+----------+-------------------+--------------+------------+
| table_name  | relkind  | count_partitions  | total_pages  | total_rows |
+-------------+----------+-------------------+--------------+------------+
| orders      | r        |                3  |           0  |       -3.0 |
+-------------+----------+-------------------+--------------+------------+
Enter fullscreen mode Exit fullscreen mode

This query shows each partition with the partitioning expression:

SELECT
        pg_inherits.inhparent::regclass AS table_name,
        pg_inherits.inhrelid::regclass AS partition_name,
        pg_class.oid,
        pg_class.relpages,
        pg_class.reltuples,
        pg_get_expr(pg_class.relpartbound, pg_class.oid, true) as partition_expression
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
ORDER BY pg_inherits.inhparent, pg_inherits.inhrelid
Enter fullscreen mode Exit fullscreen mode

Output:

+-------------+-----------------+--------+-----------+------------+--------------------------------------+
| table_name  | partition_name  |  oid   | relpages  | reltuples  |         partition_expression         |
+-------------+-----------------+--------+-----------+------------+--------------------------------------+
| orders      | orders_2019     | 16995  |        0  |      -1.0  | FOR VALUES FROM ('2019') TO ('2020') |
| orders      | orders_2020     | 17001  |        0  |      -1.0  | FOR VALUES FROM ('2020') TO ('2021') |
| orders      | orders_2019     | 17007  |        0  |      -1.0  | FOR VALUES FROM ('2021') TO ('2022') |
+-------------+-----------------+--------+-----------+------------+--------------------------------------+
Enter fullscreen mode Exit fullscreen mode

Partitions implemented with views

Let’s now recreate the previous example with views. We create the following tables:

CREATE TABLE orders_2019 (
   order_id SERIAL,
   order_date DATE NOT NULL,
   customer_name VARCHAR(255),
   product_name VARCHAR(255),
   quantity INT
);

CREATE TABLE orders_2020 (
   order_id SERIAL,
   order_date DATE NOT NULL,
   customer_name VARCHAR(255),
   product_name VARCHAR(255),
   quantity INT
);

CREATE TABLE orders_2021 (
   order_id SERIAL,
   order_date DATE NOT NULL,
   customer_name VARCHAR(255),
   product_name VARCHAR(255),
   quantity INT
);
Enter fullscreen mode Exit fullscreen mode

Let’s now create a view that will include data from all these tables:

CREATE VIEW orders AS (
        SELECT * FROM orders_2019
        UNION ALL
        SELECT * FROM orders_2020
        UNION ALL
        SELECT * FROM orders_2021
);
Enter fullscreen mode Exit fullscreen mode

Let’s insert data the same way as before:

INSERT INTO orders_2019 (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
    ('2019-02-15', 'Jane Doe', 'Product B', 5),
    ('2019-04-20', 'Bob Johnson', 'Product C', 2),
    ('2019-07-10', 'Alice Brown', 'Product A', 7),
    ('2019-12-30', 'Mike Wilson', 'Product B', 8);

INSERT INTO orders_2020 (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
    ('2020-03-20', 'Jane Doe', 'Product B', 3),
    ('2020-06-05', 'Bob Johnson', 'Product C', 10),
    ('2020-08-15', 'Alice Brown', 'Product A', 5),
    ('2020-11-25', 'Mike Wilson', 'Product B', 2);

INSERT INTO orders_2021 (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
    ('2021-03-15', 'Jane Doe', 'Product B', 12),
    ('2021-05-20', 'Bob Johnson', 'Product C', 6),
    ('2021-09-01', 'Alice Brown', 'Product A', 3),
    ('2021-11-30', 'Mike Wilson', 'Product B', 9);
Enter fullscreen mode Exit fullscreen mode

Notice that we didn’t insert the data into the view. We had to manually specify the table for the data.

Let’s now query the table:

SELECT COUNT(*) FROM orders
Enter fullscreen mode Exit fullscreen mode

We get the expected result:

+-------+
| count |
+-------+
|    15 |
+-------+
Enter fullscreen mode Exit fullscreen mode

Metis shows this analysis:

Image description

Let’s now query the subset of the data:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'
Enter fullscreen mode Exit fullscreen mode

Result:

+-------+
| count |
+-------+
|     5 |
+-------+
Enter fullscreen mode Exit fullscreen mode

Metis shows the following:

Image description

We see that all the tables were used. However, we cannot configure index on this view to speed things up, because views do not support such an operation:

CREATE INDEX orders_order_date_idx ON orders(order_date)
Enter fullscreen mode Exit fullscreen mode

Result:

[Code: 0, SQL State: 42809]  ERROR: cannot create index on relation "orders"
  Detail: This operation is not supported for views.
Enter fullscreen mode Exit fullscreen mode

Therefore, we should use built-in partitioning where possible.

Summary

Partitions can improve the query performance and should be in our toolbox for database optimizations. Metis can help troubleshoot performance issues and show slow queries. There are multiple configuration options for partitions, and we can always build views that mimic the partitioning behavior.

Top comments (0)