DEV Community

Javier Vidal

ClickHouse incremental backups

clickhouse-backup allows us to perform local backups, which are always full backups, and full or incremental uploads to remote storage. In my previous post I talked about how to perform full backups and uploads. Now we are going to review all the steps required to work with incremental uploads. This way we can upload a weekly full backup to our remote storage and perform daily incremental uploads.
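As an illustrative sketch only, such a rotation could be driven from cron. The schedule and the wrapper script names below are assumptions, not something clickhouse-backup ships with:

```cron
# Hypothetical crontab sketch: a full upload on Sundays, incremental
# uploads the rest of the week. backup-full.sh and backup-incr.sh are
# assumed wrapper scripts around "clickhouse-backup create/upload".
0 3 * * 0 root /usr/local/bin/backup-full.sh
0 3 * * 1-6 root /usr/local/bin/backup-incr.sh
```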

Let's assume we already have clickhouse-backup installed and remote storage configured.
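For reference, the remote storage section of /etc/clickhouse-backup/config.yml could look roughly like this sketch. All values are placeholders, and the exact field names may vary between clickhouse-backup versions:

```yaml
# Hypothetical sketch of a config.yml remote storage section.
# Bucket, region and credentials are placeholders, not from the post.
general:
  remote_storage: s3
s3:
  access_key: "<access-key>"
  secret_key: "<secret-key>"
  bucket: "my-clickhouse-backups"
  region: "eu-west-1"
```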

Let's create a database backup, a table test and a user backup that can access this database:

:) CREATE DATABASE IF NOT EXISTS backup

:) CREATE TABLE IF NOT EXISTS backup.test
(
  `date` DateTime,
  `user_id` String,
  `pageviews` Int32
)
ENGINE = MergeTree()
PARTITION BY toStartOfHour(date)
ORDER BY (date)

:) CREATE USER backup IDENTIFIED WITH plaintext_password BY 'qwerty'

:) GRANT ALL ON backup.* TO backup

We can edit /etc/clickhouse-backup/config.yml to configure this user as the clickhouse-backup user.
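The relevant part of /etc/clickhouse-backup/config.yml would be something like the following sketch (exact keys may differ by version):

```yaml
# Sketch: point clickhouse-backup at the dedicated backup user
# created above.
clickhouse:
  username: backup
  password: qwerty
  host: localhost
  port: 9000
```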

Let's insert some data in the table:

INSERT INTO backup.test VALUES
('2021-01-13 07:00:00', 'user 1', 7),
('2021-01-13 08:00:00', 'user 2', 3),
('2021-01-13 09:00:00', 'user 3', 1),
('2021-01-13 10:00:00', 'user 4', 12)

Perform a full backup and upload it:

# clickhouse-backup create backup1
2021/01/14 10:51:06 Create backup 'backup1'
2021/01/14 10:51:06 Freeze 'backup.test'
2021/01/14 10:51:06 Copy part hashes
2021/01/14 10:51:06 Writing part hashes
2021/01/14 10:51:06 Copy metadata
2021/01/14 10:51:06   Done.
2021/01/14 10:51:06 Move shadow
2021/01/14 10:51:06   Done.
# clickhouse-backup upload backup1
2021/01/14 10:51:40 Upload backup 'backup1'
 4.36 KiB / 4.36 KiB [======================================================] 100.00% 0s
2021/01/14 10:51:41   Done.

Insert more data, perform a full backup and an incremental upload:

INSERT INTO backup.test VALUES
('2021-01-13 11:00:00', 'user 3', 5),
('2021-01-13 12:00:00', 'user 5', 2)

# clickhouse-backup create backup2
2021/01/14 10:55:04 Create backup 'backup2'
2021/01/14 10:55:05 Freeze 'backup.test'
2021/01/14 10:55:05 Copy part hashes
2021/01/14 10:55:05 Writing part hashes
2021/01/14 10:55:05 Copy metadata
2021/01/14 10:55:05   Done.
2021/01/14 10:55:05 Move shadow
2021/01/14 10:55:05   Done.
# clickhouse-backup upload --diff-from backup1 backup2
2021/01/14 10:55:38 Upload backup 'backup2'
 6.44 KiB / 6.44 KiB [======================================================] 100.00% 0s
2021/01/14 10:55:38   Done.

And once more:

INSERT INTO backup.test VALUES
('2021-01-13 13:00:00', 'user 1', 1),
('2021-01-13 14:00:00', 'user 5', 8)

# clickhouse-backup create backup3
2021/01/14 10:56:10 Create backup 'backup3'
2021/01/14 10:56:10 Freeze 'backup.test'
2021/01/14 10:56:10 Copy part hashes
2021/01/14 10:56:10 Writing part hashes
2021/01/14 10:56:10 Copy metadata
2021/01/14 10:56:10   Done.
2021/01/14 10:56:10 Move shadow
2021/01/14 10:56:10   Done.
# clickhouse-backup upload --diff-from backup2 backup3
2021/01/14 10:56:20 Upload backup 'backup3'
 8.52 KiB / 8.52 KiB [======================================================] 100.00% 0s
2021/01/14 10:56:21   Done.

Delete the local backups:

# clickhouse-backup delete local backup1
# clickhouse-backup delete local backup2
# clickhouse-backup delete local backup3

List the remote backups:

# clickhouse-backup list
Local backups:
no backups found
Remote backups:
- 'backup1.tar.gz'  3.54 KiB    (created at 14-01-2021 12:12:42)
- 'backup2.tar.gz'  3.02 KiB    (created at 14-01-2021 12:13:17)
- 'backup3.tar.gz'  3.29 KiB    (created at 14-01-2021 12:14:47)

Let's drop the database:

DROP DATABASE backup

If we download backup3, the latest backup, we see that backup1 and backup2 are also downloaded:

# clickhouse-backup download backup3
 0 / 3374 [------------------------------------------------------]   0.00%
2021/01/14 13:16:39 Backup 'backup3' required 'backup2'. Downloading.
 0 / 3088 [------------------------------------------------------]   0.00%
2021/01/14 13:16:40 Backup 'backup2' required 'backup1'. Downloading.
 3.54 KiB / 3.54 KiB [======================================================] 100.00% 0s
 3.02 KiB / 3.02 KiB [======================================================] 100.00% 0s
 3.29 KiB / 3.29 KiB [======================================================] 100.00% 1s
2021/01/14 13:16:41   Done.

If we examine backup3.tar.gz we can find a meta.json file that contains:

{
    "required_backup": "backup2",
    "hardlinks": [
        "shadow/backup/test/1610517600_1_1_0/checksums.txt",
        "shadow/backup/test/1610517600_1_1_0/columns.txt",
    ...

So these meta.json files keep track of the files that are reused from previous backups.
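To illustrate the idea, the chain of backups can be resolved by following required_backup from one meta.json to the next. This is a minimal shell sketch, not part of clickhouse-backup, and the demo fixtures below stand in for real backup directories:

```shell
#!/bin/sh
# Sketch: walk the incremental chain by following "required_backup"
# in each backup's meta.json. Demo fixtures stand in for real backups.
set -e
dir=$(mktemp -d)
mkdir -p "$dir/backup1" "$dir/backup2" "$dir/backup3"
echo '{}'                             > "$dir/backup1/meta.json"
echo '{"required_backup": "backup1"}' > "$dir/backup2/meta.json"
echo '{"required_backup": "backup2"}' > "$dir/backup3/meta.json"

b=backup3
chain=""
while [ -n "$b" ]; do
  chain="$chain$b "
  # python3 is used here only to parse the JSON file
  b=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get("required_backup",""))' "$dir/$b/meta.json")
done
echo "download order: $chain"
```

This mirrors what we see below when downloading backup3: its dependencies are fetched first.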

Let's do the restore. If we restore the latest incremental backup:

# clickhouse-backup restore backup3
2021/01/14 13:19:54 Create table 'backup.test'
2021/01/14 13:19:54 Prepare data for restoring 'backup.test'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610517600_1_1_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610521200_2_2_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610524800_3_3_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610528400_4_4_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610532000_5_5_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610535600_6_6_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610539200_7_7_0'
2021/01/14 13:19:54 ALTER TABLE `backup`.`test` ATTACH PART '1610542800_8_8_0'

Restoring backup3 brings back the full data set, including the parts that came with backup1 and backup2:

:) select count() from backup.test

SELECT count()
FROM backup.test

┌─count()─┐
│       8 │
└─────────┘

1 rows in set. Elapsed: 0.003 sec.

😄

Top comments (3)

gowth

Since it is adding up all the diff data into a single metadata.json, as follows:

full- 4.28GiB 16/09/2021 09:15:17 local
2021-09-16-09-20-35 6.83GiB 16/09/2021 09:20:54 local
full- 2.06GiB 16/09/2021 09:17:18 remote tar
2021-09-16-09-20-35 1.52GiB 16/09/2021 09:22:21 remote +full- tar

how can we break this up and make all those diff data tars individual backups?

gowth

Does restoring the latest backup download all the previous diff backups that are linked to it?

Javier Vidal

Yes, it will.