Just wanted to drop a quick-ish post on how I've been handling creating a new partition that might have overlapping data in the default partition. We have adopted using a default partition in order to capture as much data as possible during processing without throwing exceptions when a defined partition was not created.
In my team's project, we are using partitioned tables using a date range for the table partitions. We've adopted a naming convention of
<table_name>_default for the default partition and
<table_name>_YYYY_MM for the monthly partitions. We process a lot of data, so we don't want to do more data manipulation than we have to. Also, this project is a web app, so we have to deal with a lot of concurrent database requests. So, using a lot of temp tablespace was not desirable either. So, here's a breakdown of how I'm handling default partition collisions when creating a new partition.
First: Get all of the information needed to create the partition. This includes the table partition name, the partitioned table name, the partition key column, the partition parameters (in our case, it's
default = False,
from = <partition-start-date>,
to = <partition-end-date>).
Second: Check if any partition key col value within the requested partition range exists in the default partition. If data does exist (which would cause a PostgreSQL error), I alter the
to parameters adding an arbitrary year delta (such as 100 years) to each of the boundaries. Then I create the partition (using the original name) for the bounds that will not match. Don't forget to save the original values.
Third: Now that I have a partition created, I detach the partition from the partitioned table. Then I copy the overlapping data from the default partition to the detached partition. Once that is complete, I delete that data from the default. I use the original values for the
to parameters to reattach the partition to the partitioned table.
This method allows me to move the data without creating a potentially big temp table (temp tablespace) or a regular table in the app tablespace that will be deleted later. It also means that I can let PostgreSQL create the table as it sees fit for a partition. Since a partition can be detached and later re-attached with different constraints, this was a feasible option for our application.
Obviously, this will impact performance if it occurs often or with a large amount of data, but the answer to that is to handle potential edge cases in the application code that would see data hit the default partition.
Hopefully this methodology could prove viable to other developers and/or architects.