Partitions let us split the logically one large table into smaller physical ones. This can improve the query’s performance by accessing fewer rows, optimizing random-access reads, and using very targeted indexes. Let’s see how it works in PostgreSQL and how Metis handles that.
Patterns
Partitions are a way to divide a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the table's data based on a specified partition key. This is only a change from the engine perspective. Nothing changes from the end-user perspective, there is still one table that the user can access regardless of partitioning.
The partitioning key could be a range of values (range partitioning), a list of specific values (list partitioning), or a mathematical expression (hash partitioning). One of the typical examples is partitioning based on date or country. We can also build custom partitioning with views, as we’ll see later in this post.
Partitions help improve query performance by allowing the database to eliminate unnecessary data during query execution. When a query specifies a condition that matches the partition key, PostgreSQL can directly access only the relevant partition, rather than scanning the entire table. This can significantly speed up queries on large tables.
Furthermore, partitions can be spread across multiple storage devices or tablespaces, enabling parallelism and efficient disk utilization. They also facilitate data management tasks such as archiving or deleting old data by simply dropping or detaching partitions.
PostgreSQL provides different types of partitioning methods, including range partitioning, list partitioning, and hash partitioning. Each method has its benefits and is suitable for different use cases.
PostgreSQL offers built-in support for the following forms of partitioning:
- Range Partitioning: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.
- List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
- Hash Partitioning: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
We can also build partitions using views. We create separate tables manually and then create a view that joins the data from all the tables.
Let’s see practical examples.
Partitions based on the date
Let’s start by creating a partitioned table:
CREATE TABLE orders (
order_id SERIAL,
order_date DATE NOT NULL,
customer_name VARCHAR(255),
product_name VARCHAR(255),
quantity INT
) PARTITION BY RANGE(EXTRACT(YEAR FROM order_date));
The important part is the last line:
PARTITION BY RANGE(EXTRACT(YEAR FROM order_date))
We create a table that partitions the data based on the year of the order date. Once we have that, we need to explicitly create tables for specific values:
CREATE TABLE orders_2019 PARTITION OF orders FOR VALUES FROM (2019) TO (2020);
CREATE TABLE orders_2020 PARTITION OF orders FOR VALUES FROM (2020) TO (2021);
CREATE TABLE orders_2021 PARTITION OF orders FOR VALUES FROM (2021) TO (2022);
We created three partitions for three different years. You can see that we included the year in the tables’ names. This is a typical approach that increases maintenance.
Let’s now add some data:
INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
('2019-02-15', 'Jane Doe', 'Product B', 5),
('2019-04-20', 'Bob Johnson', 'Product C', 2),
('2019-07-10', 'Alice Brown', 'Product A', 7),
('2019-12-30', 'Mike Wilson', 'Product B', 8);
INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
('2020-03-20', 'Jane Doe', 'Product B', 3),
('2020-06-05', 'Bob Johnson', 'Product C', 10),
('2020-08-15', 'Alice Brown', 'Product A', 5),
('2020-11-25', 'Mike Wilson', 'Product B', 2);
INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
('2021-03-15', 'Jane Doe', 'Product B', 12),
('2021-05-20', 'Bob Johnson', 'Product C', 6),
('2021-09-01', 'Alice Brown', 'Product A', 3),
We can now query all the orders:
SELECT COUNT(*) FROM orders;
Result:
+-------+
| count |
+-------+
| 15 |
+-------+
Metis shows that the following tables were read:
We can see all three partitions were accessed.
We can also take orders from a specific table, like this:
SELECT COUNT(*) FROM orders_2019;
Result:
+-------+
| count |
+-------+
| 5 |
+-------+
Metis shows the following analysis:
We can see only one table has been read. However, if we try the following query:
SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31';
with the following result:
+-------+
| count |
+-------+
| 5 |
+-------+
we get the following execution:
The reason is that the engine doesn’t recognize our filters as the ones for the partitioning. We need to change the query to the following:
SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'
AND EXTRACT(YEAR FROM order_date) = 2020
Metis shows the following:
We can see that partitions can improve the query performance, but we need to be careful when using them. Metis can help us troubleshoot and optimize database queries easily in this case.
You can also examine the partitions. This query shows the total number of partitions and rows:
SELECT
pg_inherits.inhparent::regclass AS table_name,
pg_class.relkind,
COUNT(pg_inherits.inhrelid::regclass) AS count_partitions,
SUM(pg_class.relpages) as total_pages,
SUM(pg_class.reltuples) as total_rows
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
GROUP BY pg_inherits.inhparent::regclass, pg_class.relkind
Output:
+-------------+----------+-------------------+--------------+------------+
| table_name | relkind | count_partitions | total_pages | total_rows |
+-------------+----------+-------------------+--------------+------------+
| orders | r | 3 | 0 | -3.0 |
+-------------+----------+-------------------+--------------+------------+
This query shows each partition with the partitioning expression:
SELECT
pg_inherits.inhparent::regclass AS table_name,
pg_inherits.inhrelid::regclass AS partition_name,
pg_class.oid,
pg_class.relpages,
pg_class.reltuples,
pg_get_expr(pg_class.relpartbound, pg_class.oid, true) as partition_expression
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
ORDER BY pg_inherits.inhparent, pg_inherits.inhrelid
Output:
+-------------+-----------------+--------+-----------+------------+--------------------------------------+
| table_name | partition_name | oid | relpages | reltuples | partition_expression |
+-------------+-----------------+--------+-----------+------------+--------------------------------------+
| orders | orders_2019 | 16995 | 0 | -1.0 | FOR VALUES FROM ('2019') TO ('2020') |
| orders | orders_2020 | 17001 | 0 | -1.0 | FOR VALUES FROM ('2020') TO ('2021') |
| orders | orders_2019 | 17007 | 0 | -1.0 | FOR VALUES FROM ('2021') TO ('2022') |
+-------------+-----------------+--------+-----------+------------+--------------------------------------+
Partitions implemented with views
Let’s now recreate the previous example with views. We create the following tables:
CREATE TABLE orders_2019 (
order_id SERIAL,
order_date DATE NOT NULL,
customer_name VARCHAR(255),
product_name VARCHAR(255),
quantity INT
);
CREATE TABLE orders_2020 (
order_id SERIAL,
order_date DATE NOT NULL,
customer_name VARCHAR(255),
product_name VARCHAR(255),
quantity INT
);
CREATE TABLE orders_2021 (
order_id SERIAL,
order_date DATE NOT NULL,
customer_name VARCHAR(255),
product_name VARCHAR(255),
quantity INT
);
Let’s now create a view that will include data from all these tables:
CREATE VIEW orders AS (
SELECT * FROM orders_2019
UNION ALL
SELECT * FROM orders_2020
UNION ALL
SELECT * FROM orders_2021
);
Let’s insert data the same way as before:
INSERT INTO orders_2019 (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
('2019-02-15', 'Jane Doe', 'Product B', 5),
('2019-04-20', 'Bob Johnson', 'Product C', 2),
('2019-07-10', 'Alice Brown', 'Product A', 7),
('2019-12-30', 'Mike Wilson', 'Product B', 8);
INSERT INTO orders_2020 (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
('2020-03-20', 'Jane Doe', 'Product B', 3),
('2020-06-05', 'Bob Johnson', 'Product C', 10),
('2020-08-15', 'Alice Brown', 'Product A', 5),
('2020-11-25', 'Mike Wilson', 'Product B', 2);
INSERT INTO orders_2021 (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
('2021-03-15', 'Jane Doe', 'Product B', 12),
('2021-05-20', 'Bob Johnson', 'Product C', 6),
('2021-09-01', 'Alice Brown', 'Product A', 3),
('2021-11-30', 'Mike Wilson', 'Product B', 9);
Notice that we didn’t insert the data into the view. We had to manually specify the table for the data.
Let’s now query the table:
SELECT COUNT(*) FROM orders
We get the expected result:
+-------+
| count |
+-------+
| 15 |
+-------+
Metis shows this analysis:
Let’s now query the subset of the data:
SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'
Result:
+-------+
| count |
+-------+
| 5 |
+-------+
Metis shows the following:
We see that all the tables were used. However, we cannot configure index on this view to speed things up, because views do not support such an operation:
CREATE INDEX orders_order_date_idx ON orders(order_date)
Result:
[Code: 0, SQL State: 42809] ERROR: cannot create index on relation "orders"
Detail: This operation is not supported for views.
Therefore, we should use built-in partitioning where possible.
Summary
Partitions can improve the query performance and should be in our toolbox for database optimizations. Metis can help troubleshoot performance issues and show slow queries. There are multiple configuration options for partitions, and we can always build views that mimic the partitioning behavior.
Top comments (0)