Now that we have our largest tables split into time-based partitions and old data manually exported, it's time to automate this process so that we don't have to deal with it again!
We use a somewhat longer shell script, `autoarchive-tables.sh`, that handles the whole auto-archival process for us. We describe below what it does.
The script reads its tasks from a simple configuration file, `autoarchived-tables.txt`, which can look like this:
```
# Configuration for tables auto-archival.
#
# Format:
# table_name interval keep

my_huge_table            monthly 12
less_relevant_huge_table monthly 6
quickly_growing_table    weekly  12
slowly_growing_table     yearly  1
```
I.e., for each table that should be auto-archived (and has been prepared for it with the manual procedure described in previous parts of this article), you have to specify:
- the table name
- how frequently it should be archived
- how many time periods (historical partitions) should be kept in the main database.
Easy, huh? 😉
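To make the format concrete, here is a minimal sketch of how such a config line might be parsed and validated in Bash. The function name, the accepted interval values, and the error messages are our own illustration, not taken from the actual script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Validate one line of autoarchived-tables.txt:
#   table_name interval keep
validate_line() {
  local table interval keep
  read -r table interval keep <<< "$1"
  # Skip comments and blank lines
  [[ -z "$table" || "$table" == \#* ]] && return 0
  # Interval must be one of the supported periods
  case "$interval" in
    weekly|monthly|yearly) ;;
    *) echo "invalid interval '$interval' for '$table'" >&2; return 1 ;;
  esac
  # 'keep' must be a non-negative integer
  [[ "$keep" =~ ^[0-9]+$ ]] || { echo "invalid keep '$keep' for '$table'" >&2; return 1; }
  echo "ok: $table $interval $keep"
}

validate_line "my_huge_table monthly 12"   # prints "ok: my_huge_table monthly 12"
validate_line "# just a comment"           # prints nothing, returns 0
```

The real script does more (it also deduplicates table names, for example), but the shape of the parsing is the same: one whitespace-separated record per line, comments ignored.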
The script is written in bash and requires at least Bash 4 (it uses some newer syntax). It tries to do a lot of sanity checks and should immediately stop if it encounters something unexpected or some sub-command fails. We are no bash gurus but we tried quite hard and the script is happily running on our production server, without issues so far…
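A guard for the Bash version requirement could look like this; this is a generic sketch, not necessarily how the actual script checks it:

```shell
#!/usr/bin/env bash
# BASH_VERSINFO[0] holds the major version of the running Bash.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "This script requires Bash 4 or newer (found $BASH_VERSION)" >&2
  exit 1
fi
```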
Be sure to edit at least the `$database` variable in the script before you try to run it. And, as always, test this in a safe environment first!
The output from the script may look like this:
```
Starting archival of 'my_huge_table' with 'monthly' interval, keeping 12 partitions
Found partitions: m201901 m201902 m201903 m201904 m201905 m201906 m201907 m201908 m201909 m201910 m201911 m201912 m202001 m202002
Calculated new partitions: m201902 m201903 m201904 m201905 m201906 m201907 m201908 m201909 m201910 m201911 m201912 m202001 m202002 m202003
Will add partitions: m202003
Will export partitions: m201901
```
It says that the script will process the `my_huge_table` table and archive it monthly, keeping 12 historical months of data.

The output also says that the script found one partition that is too old (`m201901`), which it will export and remove from the table. Finally, it calculated that one new partition (`m202003`, for the next month) is missing and will add it to the table.
In essence, the script is supposed to be called periodically from `cron`, and it does the following:
It reads the configuration file, parses it, and performs some basic sanity checks on the configuration.
It retrieves the list of current partitions in the table.
It calculates the list of "new" partitions that should ideally be present in the table. For this calculation, it uses the `keep` config value and always includes one partition for the next time period in the future (next week / month / year).
It compares these two lists of partitions and determines which partition(s) should be exported and deleted from the table and which partition(s) should be newly added to it. If there are no partitions to add or export, the script continues with the next table from the config file.
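The partition calculation above can be sketched for a monthly table as follows, using the `mYYYYMM` naming convention from the example output. This assumes GNU `date`; the real script also handles weekly and yearly intervals:

```shell
#!/usr/bin/env bash
set -euo pipefail

keep=12            # from the config file
now="2020-02-01"   # first day of the current month (hard-coded for this demo)

desired=()
# 'keep' historical months, the current month, and one month ahead
for (( offset = -keep; offset <= 1; offset++ )); do
  desired+=( "m$(date -d "$now $offset months" +%Y%m)" )
done

echo "Calculated new partitions: ${desired[*]}"
```

With these inputs, this reproduces the "Calculated new partitions" line from the example output above (`m201902` through `m202003`).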
Next, it checks that there are no records in the internal partition called `future`. If it finds any records, the script stops with an error, as this would mean that you added data with a future date to the table, which is unexpected.
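The check can be done with MySQL's explicit partition selection (available since MySQL 5.6). In this sketch, the actual `mysql` invocation is left commented out so the snippet runs without a database; the database and table names are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

database="my_database"   # edit, as with $database in the script
table="my_huge_table"

# Count rows sitting in the catch-all 'future' partition
sql="SELECT COUNT(*) FROM \`$table\` PARTITION (\`future\`)"
echo "$sql"

# future_rows=$(mysql -N -B "$database" -e "$sql")
# if (( future_rows > 0 )); then
#   echo "ERROR: found $future_rows unexpected row(s) in 'future' of '$table'" >&2
#   exit 1
# fi
```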
Next comes the "danger zone": the script runs MySQL commands that do the following:
- It drops the internal `future` partition, because MySQL only allows a partition to be added at the end of the partition list.
- It adds the new partition(s) to the table.
- It re-adds the internal `future` partition to the table, back at the end of the partition list.
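The generated DDL might look roughly like this, assuming RANGE partitioning on a date column via `TO_DAYS()` (adapt the `VALUES LESS THAN` expression to your own partitioning scheme). Note that dropping `future` is safe only because the script has just verified it is empty — `DROP PARTITION` deletes any rows in the partition:

```shell
#!/usr/bin/env bash
set -euo pipefail

table="my_huge_table"
new_partition="m202003"
upper_bound="2020-04-01"   # first day *after* the new partition's month

ddl=$(cat <<SQL
ALTER TABLE \`$table\` DROP PARTITION \`future\`;
ALTER TABLE \`$table\` ADD PARTITION (PARTITION \`$new_partition\` VALUES LESS THAN (TO_DAYS('$upper_bound')));
ALTER TABLE \`$table\` ADD PARTITION (PARTITION \`future\` VALUES LESS THAN MAXVALUE);
SQL
)
echo "$ddl"
# The script would then feed these statements to: mysql "$database"
```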
Finally, it exports the data from the too-old partitions. For this, it simply calls the `export-partition.sh` script that we introduced in Part II of this guide. This moves all data from these partitions into gzipped files and removes the (now empty) partitions from the table.
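The loop driving the export could look like the sketch below. The argument order shown for `export-partition.sh` (table, then partition) is an assumption on our part — check your copy of the Part II script for its actual interface; here we only print the command instead of executing it:

```shell
#!/usr/bin/env bash
set -euo pipefail

table="my_huge_table"
to_export=( "m201901" )   # partitions that fell out of the 'keep' window

for partition in "${to_export[@]}"; do
  # The real script would execute this; we just show it
  echo "would run: ./export-partition.sh $table $partition"
done
```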
Tip: we use a neat little tool called `cronic` in our setup. It is meant to be run from `cron`. By default, cron sends you an email whenever anything is printed to the standard output (`STDOUT`). This is not that helpful, because you would get an email every time the archival script runs, regardless of whether it actually changed anything in your database. `cronic` changes this behavior and triggers an email only when something is written to the error output (`STDERR`). We use this feature in the archival script to send an email to our admins only when tables in our database are actually changed.
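A crontab entry for this setup might look as follows; the schedule and the install path are illustrative:

```
# Run table auto-archival every night at 02:30; cronic mails only on errors
30 2 * * *  cronic /path/to/autoarchive-tables.sh
```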
This concludes our saga about archiving large MySQL tables. Your comments and bug reports are welcome, and we wish you good luck lightening the load on your database server! 🤞