Now that we have our largest tables split into time-based partitions and old data manually exported, it's time to automate this process so that we don't have to deal with it again!
We use a somewhat longer shell script, `autoarchive-tables.sh`, that handles the whole auto-archival process for us. We describe below what it does.
The script reads its tasks from a simple configuration file, `autoarchived-tables.txt`, which can look like this:
```
# Configuration for tables auto-archival.
#
# Format:
# table_name interval keep

my_huge_table            monthly 12
less_relevant_huge_table monthly 6
quickly_growing_table    weekly  12
slowly_growing_table     yearly  1
```
I.e., for each table that should be auto-archived (and has been prepared for it with the manual procedure described in previous parts of this article), you have to specify:
- the table name
- how frequently it should be archived
- how many time periods (historical partitions) should be kept in the main database.
Easy, huh? 😉
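To make the format concrete, here is a minimal sketch of how such a config line might be parsed and validated in Bash. The function name, the accepted interval values, and the error messages are our own illustration, not taken from the actual script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Validate one line of autoarchived-tables.txt:
#   table_name interval keep
validate_line() {
  local table interval keep
  read -r table interval keep <<< "$1"
  # Skip comments and blank lines
  [[ -z "$table" || "$table" == \#* ]] && return 0
  # Interval must be one of the supported periods
  case "$interval" in
    weekly|monthly|yearly) ;;
    *) echo "invalid interval '$interval' for '$table'" >&2; return 1 ;;
  esac
  # 'keep' must be a non-negative integer
  [[ "$keep" =~ ^[0-9]+$ ]] || { echo "invalid keep '$keep' for '$table'" >&2; return 1; }
  echo "ok: $table $interval $keep"
}

validate_line "my_huge_table monthly 12"   # prints "ok: my_huge_table monthly 12"
validate_line "# just a comment"           # prints nothing, returns 0
```

The real script does more (it also deduplicates table names, for example), but the shape of the parsing is the same: one whitespace-separated record per line, comments ignored.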
The script is written in bash and requires at least Bash 4 (it uses some newer syntax). It tries to do a lot of sanity checks and should immediately stop if it encounters something unexpected or some sub-command fails. We are no bash gurus but we tried quite hard and the script is happily running on our production server, without issues so far…
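A guard for the Bash version requirement could look like this; this is a generic sketch, not necessarily how the actual script checks it:

```shell
#!/usr/bin/env bash
# BASH_VERSINFO[0] holds the major version of the running Bash.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "This script requires Bash 4 or newer (found $BASH_VERSION)" >&2
  exit 1
fi
```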
Be sure to edit at least the `$database` variable in the script before you try to run it. And, as always, test this in a safe environment first!
The output from the script may look like this:
```
Starting archival of 'my_huge_table' with 'monthly' interval, keeping 12 partitions
Found partitions: m201901 m201902 m201903 m201904 m201905 m201906 m201907 m201908 m201909 m201910 m201911 m201912 m202001 m202002
Calculated new partitions: m201902 m201903 m201904 m201905 m201906 m201907 m201908 m201909 m201910 m201911 m201912 m202001 m202002 m202003
Will add partitions: m202003
Will export partitions: m201901
```
It says that the script will process the `my_huge_table` table and archive it monthly, keeping 12 historical months of data.

The output also says that the script found one partition that is too old (`m201901`), which it will export and remove from the table. Finally, it calculated that one new partition (`m202003`, for the next month) is missing and will add it to the table.
In essence, the script is supposed to be called periodically from `cron`, and it does the following:
It reads the configuration file, parses it, and performs some basic sanity checks on the configuration.
It retrieves the list of current partitions in the table.
It calculates the list of "new" partitions that should ideally be present in the table. For this calculation, it uses the `keep` config value and always includes one partition for the next time period in the future (next week / month / year).
It compares these two lists of partitions and determines which partition(s) should be exported and deleted from the table and which partition(s) should be newly added to it. If there are no partitions to add or export, the script continues with the next table from the config file.
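The partition calculation above can be sketched for a monthly table as follows, using the `mYYYYMM` naming convention from the example output. This assumes GNU `date`; the real script also handles weekly and yearly intervals:

```shell
#!/usr/bin/env bash
set -euo pipefail

keep=12            # from the config file
now="2020-02-01"   # first day of the current month (hard-coded for this demo)

desired=()
# 'keep' historical months, the current month, and one month ahead
for (( offset = -keep; offset <= 1; offset++ )); do
  desired+=( "m$(date -d "$now $offset months" +%Y%m)" )
done

echo "Calculated new partitions: ${desired[*]}"
```

With these inputs, this reproduces the "Calculated new partitions" line from the example output above (`m201902` through `m202003`).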
Next, it checks that there are no records in the internal partition called `future`. If it finds any records, the script stops with an error, as this would mean that you added data with a future date to the table, which is unexpected.
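The check can be done with MySQL's explicit partition selection (available since MySQL 5.6). In this sketch, the actual `mysql` invocation is left commented out so the snippet runs without a database; the database and table names are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

database="my_database"   # edit, as with $database in the script
table="my_huge_table"

# Count rows sitting in the catch-all 'future' partition
sql="SELECT COUNT(*) FROM \`$table\` PARTITION (\`future\`)"
echo "$sql"

# future_rows=$(mysql -N -B "$database" -e "$sql")
# if (( future_rows > 0 )); then
#   echo "ERROR: found $future_rows unexpected row(s) in 'future' of '$table'" >&2
#   exit 1
# fi
```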
Next comes the "danger zone": the script runs MySQL commands that do the following:
- It drops the internal `future` partition, because MySQL only allows a partition to be added at the end of the partition list.
- It adds the new partition(s) to the table.
- It re-adds the internal `future` partition to the table, back at the end of the partition list.
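The generated DDL might look roughly like this, assuming RANGE partitioning on a date column via `TO_DAYS()` (adapt the `VALUES LESS THAN` expression to your own partitioning scheme). Note that dropping `future` is safe only because the script has just verified it is empty — `DROP PARTITION` deletes any rows in the partition:

```shell
#!/usr/bin/env bash
set -euo pipefail

table="my_huge_table"
new_partition="m202003"
upper_bound="2020-04-01"   # first day *after* the new partition's month

ddl=$(cat <<SQL
ALTER TABLE \`$table\` DROP PARTITION \`future\`;
ALTER TABLE \`$table\` ADD PARTITION (PARTITION \`$new_partition\` VALUES LESS THAN (TO_DAYS('$upper_bound')));
ALTER TABLE \`$table\` ADD PARTITION (PARTITION \`future\` VALUES LESS THAN MAXVALUE);
SQL
)
echo "$ddl"
# The script would then feed these statements to: mysql "$database"
```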
Finally, it exports the data from the too-old partitions. For this, it simply calls the `export-partition.sh` script that we introduced in Part II of this guide. This moves all data from these partitions into gzipped files and removes the (now empty) partitions from the table.
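The loop driving the export could look like the sketch below. The argument order shown for `export-partition.sh` (table, then partition) is an assumption on our part — check your copy of the Part II script for its actual interface; here we only print the command instead of executing it:

```shell
#!/usr/bin/env bash
set -euo pipefail

table="my_huge_table"
to_export=( "m201901" )   # partitions that fell out of the 'keep' window

for partition in "${to_export[@]}"; do
  # The real script would execute this; we just show it
  echo "would run: ./export-partition.sh $table $partition"
done
```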
Tip: we use a neat little tool called `cronic` in our setup. It is meant to be run from `cron`. By default, cron sends you an email whenever anything is printed to the standard output (`STDOUT`). This is not that helpful, because you would get an email every time the archival script runs, regardless of whether it actually changed anything in your database. `cronic` changes this behavior and triggers an email only when something is written to the error output (`STDERR`). We use this feature in the archival script to send an email to our admins only when tables in our database are actually changed.
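A crontab entry for this setup might look as follows; the schedule and the install path are illustrative:

```
# Run table auto-archival every night at 02:30; cronic mails only on errors
30 2 * * *  cronic /path/to/autoarchive-tables.sh
```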
This concludes our saga about archiving large MySQL tables. Your comments and bug reports are welcome, and we wish you good luck lightening the load on your database server! 🤞