clickhouse-backup allows us to perform local backups, which are always full backups, and full or incremental uploads to remote storage. In my previous post I talked about how to perform full backups and uploads. Now we are going to review all the steps required to work with incremental uploads. This way we can upload a weekly full backup to our remote storage and perform daily incremental uploads.
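For example, a small wrapper script run from cron can implement that schedule. This is only a sketch under some assumptions: the date-based backup names and the last_backup state file are mine, not features of clickhouse-backup, and since --diff-from compares against a local backup, the previous backup must still be on disk when the incremental upload runs.
#!/bin/sh
# Sketch: full upload on Sundays, incremental upload on the other days.
set -e
TODAY=$(date +%Y-%m-%d)
LAST_FILE=/var/lib/clickhouse/last_backup   # hypothetical state file
clickhouse-backup create "$TODAY"
if [ "$(date +%u)" = "7" ] || [ ! -f "$LAST_FILE" ]; then
    clickhouse-backup upload "$TODAY"
else
    clickhouse-backup upload --diff-from "$(cat "$LAST_FILE")" "$TODAY"
fi
echo "$TODAY" > "$LAST_FILE"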
Let's assume we already have clickhouse-backup installed and a remote storage configured. Let's create a database backup, a table test and a user backup that can access this database:
:) CREATE DATABASE IF NOT EXISTS backup
:) CREATE TABLE IF NOT EXISTS backup.test
(
    `date` DateTime,
    `user_id` String,
    `pageviews` Int32
)
ENGINE = MergeTree()
PARTITION BY toStartOfHour(date)
ORDER BY (date)
:) CREATE USER backup IDENTIFIED WITH plaintext_password BY 'qwerty'
:) GRANT ALL ON backup.* TO backup
We can edit /etc/clickhouse-backup/config.yml to configure this user as the clickhouse-backup user.
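For reference, the relevant part of config.yml looks roughly like this (host and port are assumptions for a default local install):
clickhouse:
  username: backup
  password: "qwerty"
  host: localhost
  port: 9000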
Let's insert some data in the table:
INSERT INTO backup.test VALUES
('2021-01-13 07:00:00', 'user 1', 7),
('2021-01-13 08:00:00', 'user 2', 3),
('2021-01-13 09:00:00', 'user 3', 1),
('2021-01-13 10:00:00', 'user 4', 12)
Perform a full backup and upload it:
# clickhouse-backup create backup1
2021/01/14 10:51:06 Create backup 'backup1'
2021/01/14 10:51:06 Freeze 'backup.test'
2021/01/14 10:51:06 Copy part hashes
2021/01/14 10:51:06 Writing part hashes
2021/01/14 10:51:06 Copy metadata
2021/01/14 10:51:06 Done.
2021/01/14 10:51:06 Move shadow
2021/01/14 10:51:06 Done.
# clickhouse-backup upload backup1
2021/01/14 10:51:40 Upload backup 'backup1'
4.36 KiB / 4.36 KiB [======================================================] 100.00% 0s
2021/01/14 10:51:41 Done.
Insert more data, perform a full backup and an incremental upload:
INSERT INTO backup.test VALUES
('2021-01-13 11:00:00', 'user 3', 5),
('2021-01-13 12:00:00', 'user 5', 2)
# clickhouse-backup create backup2
2021/01/14 10:55:04 Create backup 'backup2'
2021/01/14 10:55:05 Freeze 'backup.test'
2021/01/14 10:55:05 Copy part hashes
2021/01/14 10:55:05 Writing part hashes
2021/01/14 10:55:05 Copy metadata
2021/01/14 10:55:05 Done.
2021/01/14 10:55:05 Move shadow
2021/01/14 10:55:05 Done.
# clickhouse-backup upload --diff-from backup1 backup2
2021/01/14 10:55:38 Upload backup 'backup2'
6.44 KiB / 6.44 KiB [======================================================] 100.00% 0s
2021/01/14 10:55:38 Done.
And once more:
INSERT INTO backup.test VALUES
('2021-01-13 13:00:00', 'user 1', 1),
('2021-01-13 14:00:00', 'user 5', 8)
# clickhouse-backup create backup3
2021/01/14 10:56:10 Create backup 'backup3'
2021/01/14 10:56:10 Freeze 'backup.test'
2021/01/14 10:56:10 Copy part hashes
2021/01/14 10:56:10 Writing part hashes
2021/01/14 10:56:10 Copy metadata
2021/01/14 10:56:10 Done.
2021/01/14 10:56:10 Move shadow
2021/01/14 10:56:10 Done.
# clickhouse-backup upload --diff-from backup2 backup3
2021/01/14 10:56:20 Upload backup 'backup3'
8.52 KiB / 8.52 KiB [======================================================] 100.00% 0s
2021/01/14 10:56:21 Done.
Delete the local backups:
# clickhouse-backup delete local backup1
# clickhouse-backup delete local backup2
# clickhouse-backup delete local backup3
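One caveat before cleaning up remote storage as well: each incremental upload references its predecessor, so a chain can only be pruned as a whole, never from the middle. A sketch, assuming a newer full backup has already been uploaded:
# clickhouse-backup delete remote backup3
# clickhouse-backup delete remote backup2
# clickhouse-backup delete remote backup1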
List the remote backups:
# clickhouse-backup list
Local backups:
no backups found
Remote backups:
- 'backup1.tar.gz' 3.54 KiB (created at 14-01-2021 12:12:42)
- 'backup2.tar.gz' 3.02 KiB (created at 14-01-2021 12:13:17)
- 'backup3.tar.gz' 3.29 KiB (created at 14-01-2021 12:14:47)
Let's drop the database:
DROP DATABASE backup
If we download backup3, the latest backup, we see that backup1 and backup2 are also downloaded:
# clickhouse-backup download backup3
0 / 3374 [------------------------------------------------------] 0.00%
2021/01/14 13:16:39 Backup 'backup3' required 'backup2'. Downloading.
0 / 3088 [------------------------------------------------------] 0.00%
2021/01/14 13:16:40 Backup 'backup2' required 'backup1'. Downloading.
3.54 KiB / 3.54 KiB [======================================================] 100.00% 0s
3.02 KiB / 3.02 KiB [======================================================] 100.00% 0s
3.29 KiB / 3.29 KiB [======================================================] 100.00% 1s
2021/01/14 13:16:41 Done.
If we examine backup3.tar.gz we can find a meta.json file that contains:
{
"required_backup": "backup2",
"hardlinks": [
"shadow/backup/test/1610517600_1_1_0/checksums.txt",
"shadow/backup/test/1610517600_1_1_0/columns.txt",
...
These meta.json files keep track of the files that are reused from previous backups.
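We can follow that chain from the shell too; a quick sketch using tar and jq (jq is just one option, and the path of meta.json inside the archive may differ between versions):
# tar -xzOf backup3.tar.gz meta.json | jq -r '.required_backup'
backup2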
Let's do the restore. If we restore the latest incremental backup:
# clickhouse-backup restore backup3
2021/01/14 13:19:54 Create table 'backup.test'
2021/01/14 13:19:54 Prepare data for restoring 'backup.test'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610517600_1_1_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610521200_2_2_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610524800_3_3_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610528400_4_4_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610532000_5_5_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610535600_6_6_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610539200_7_7_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610542800_8_8_0'
We have all the data back, as if we had restored a full backup:
:) select count() from backup.test
SELECT count()
FROM backup.test
┌─count()─┐
│       8 │
└─────────┘
1 rows in set. Elapsed: 0.003 sec.
😄