Muhammad Abdur Rofi

Rancher 101 - Backups and Disaster Recovery

Backing Up the Cluster

Eventually something will happen that will have you reaching for a backup. It’s not a question of if; it’s a question of when.

We want you to be prepared for that inevitability. For Kubernetes, that means that you have regular backups of the etcd data store.

How regular? That's up to you. How much data can you afford to lose?

An extremely active cluster might make a snapshot every fifteen minutes, or maybe even every five minutes. A less active cluster might make a snapshot once a day.

RKE defaults to making a snapshot every six hours and holding snapshots for one day, but you can change the settings in the config.

You can run a snapshot at any time by executing the command rke etcd snapshot-save and passing it the name of the backup and the location of the cluster.yml file. This will write a snapshot to /opt/rke/etcd-snapshots on all etcd hosts. You can mount an NFS volume there, or you can configure RKE to copy the snapshot to S3.
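
As a minimal sketch of a one-off snapshot, the script below builds the rke etcd snapshot-save call with the --config and --name flags. The snapshot naming scheme and the DRY_RUN switch are assumptions added for illustration, not part of RKE. By default the script only prints the command so you can review it first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each command; execute it only when the caller sets DRY_RUN=0.
# (DRY_RUN is a convention of this sketch, not an RKE feature.)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical naming scheme: a fixed prefix plus a UTC timestamp.
SNAPSHOT_NAME="manual-$(date -u +%Y%m%d-%H%M%S)"

# --config is the location of cluster.yml, --name is the snapshot name.
run rke etcd snapshot-save --config cluster.yml --name "$SNAPSHOT_NAME"
```

Run it with DRY_RUN=0 from the directory that holds cluster.yml to actually take the snapshot.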

You can also configure RKE to take recurring snapshots automatically. The configuration for that lives in the etcd service's backup_config key, where you can configure the following:

  • interval_hours, which is how often to take a snapshot
  • retention, which controls how many snapshots are kept before older ones are rotated out
  • s3 backup config, which contains the information that RKE needs to copy the snapshot to S3.
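
The three settings above might look like the following fragment of cluster.yml. This is a sketch: the scratch file name, bucket, region, and credentials are placeholders, and the values are only examples. It writes the fragment to a separate file so you can review it before merging it into your real cluster.yml by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch file; this script does not touch your real cluster.yml.
FRAGMENT="backup-config-fragment.yml"

cat > "$FRAGMENT" <<'EOF'
services:
  etcd:
    backup_config:
      interval_hours: 6      # take a snapshot every six hours
      retention: 4           # assumption: a count of snapshots; confirm for your RKE version
      s3backupconfig:
        access_key: S3_ACCESS_KEY      # placeholder
        secret_key: S3_SECRET_KEY      # placeholder
        bucket_name: my-etcd-backups   # hypothetical bucket
        region: us-east-1              # placeholder
        endpoint: s3.amazonaws.com
EOF

# Show the fragment for review.
cat "$FRAGMENT"
```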

The recurring snapshot service launches a container on hosts with the etcd role. If you log into one of those nodes, you can see its log output by running docker logs against the etcd-rolling-snapshots container.
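
A small sketch of that check, meant to be run on a node with the etcd role. The 50-line tail is an arbitrary choice, and the guard around docker is only there so the script fails gently on a machine without it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Container name as given in the text above.
CONTAINER="etcd-rolling-snapshots"

if command -v docker >/dev/null 2>&1; then
  # Show the 50 most recent log lines from the snapshot container.
  docker logs --tail 50 "$CONTAINER" 2>&1 || echo "could not read logs for $CONTAINER"
else
  echo "docker not found; run this on a node with the etcd role"
fi
```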

If your RKE cluster is outside of AWS, you can still store your backups on S3. But if you'd prefer not to use Amazon at all, you can run an S3-compatible service like Minio.

If you have Minio behind a self-signed certificate or a certificate from an unknown CA, you'll need to add the signing certificate to the custom_ca key in the s3 backup config section of cluster.yml.
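
For a Minio setup, the S3 section might look roughly like this. The endpoint, bucket, and credentials are hypothetical, and the certificate body is deliberately left as a placeholder for your own signing certificate in PEM form. As a sketch, the fragment is written to a scratch file for review, not into cluster.yml.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch file; merge the result into cluster.yml by hand.
FRAGMENT="minio-backup-fragment.yml"

cat > "$FRAGMENT" <<'EOF'
services:
  etcd:
    backup_config:
      s3backupconfig:
        access_key: MINIO_ACCESS_KEY       # placeholder
        secret_key: MINIO_SECRET_KEY       # placeholder
        bucket_name: etcd-snapshots        # hypothetical bucket
        endpoint: minio.example.com:9000   # hypothetical Minio endpoint
        custom_ca: |-
          -----BEGIN CERTIFICATE-----
          (PEM body of the signing certificate goes here)
          -----END CERTIFICATE-----
EOF

# Show the fragment for review.
cat "$FRAGMENT"
```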

Restoring From a Backup

Snapshots are always saved locally, and they can also be written out to S3. If you have a local copy of the snapshot, you can restore it by running rke etcd snapshot-restore with the name of the snapshot.

Run this command from the directory that contains cluster.yml and cluster.rkestate, and make sure that the snapshot is in /opt/rke/etcd-snapshots on one of the nodes.
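
A local restore could be sketched as follows. The default snapshot name mysnapshot and the DRY_RUN switch are assumptions for illustration. Because a restore is destructive, the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each command; execute it only when the caller sets DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical snapshot name; pass the real one as the first argument.
SNAPSHOT_NAME="${1:-mysnapshot}"

# Both files should sit in the directory the command is run from.
for f in cluster.yml cluster.rkestate; do
  [ -f "$f" ] || echo "warning: $f not found in $(pwd)" >&2
done

run rke etcd snapshot-restore --config cluster.yml --name "$SNAPSHOT_NAME"
```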

Alternatively, you can provide the S3 credentials for where the snapshot was saved, and RKE will pull the snapshot down from S3 and apply it.
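
Restoring straight from S3 might look like the sketch below. The bucket, endpoint, snapshot name, and credential values are placeholders read from environment variables. The DRY_RUN switch is an assumption of this example, and in dry-run mode the printed command includes the credentials, so treat the output as sensitive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each command; execute it only when the caller sets DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Placeholders: export real values before running with DRY_RUN=0.
S3_ACCESS_KEY="${S3_ACCESS_KEY:-S3_ACCESS_KEY}"
S3_SECRET_KEY="${S3_SECRET_KEY:-S3_SECRET_KEY}"
S3_BUCKET="${S3_BUCKET:-my-etcd-backups}"      # hypothetical bucket
S3_ENDPOINT="${S3_ENDPOINT:-s3.amazonaws.com}"
SNAPSHOT_NAME="${1:-mysnapshot}"               # hypothetical snapshot name

# --s3 tells RKE to fetch the snapshot from the bucket before applying it.
run rke etcd snapshot-restore \
  --config cluster.yml \
  --name "$SNAPSHOT_NAME" \
  --s3 \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --bucket-name "$S3_BUCKET" \
  --s3-endpoint "$S3_ENDPOINT"
```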

Restoring a snapshot is a destructive process. It will delete the current cluster and create a new cluster from the RKE snapshot. We recommend that you create a snapshot of the current cluster state before restoring a snapshot into it.
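
One way to follow that recommendation is to chain the two steps so the restore never runs unless the safety snapshot succeeded. This is a sketch: the pre-restore naming scheme, the default target name, and the DRY_RUN switch are all assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail   # stop at the first failing command

# Print each command; execute it only when the caller sets DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

TARGET_SNAPSHOT="${1:-mysnapshot}"                        # hypothetical name
SAFETY_SNAPSHOT="pre-restore-$(date -u +%Y%m%d-%H%M%S)"   # hypothetical scheme

# Step 1: capture the current cluster state.
run rke etcd snapshot-save --config cluster.yml --name "$SAFETY_SNAPSHOT"

# Step 2: only reached if step 1 succeeded, because of set -e.
run rke etcd snapshot-restore --config cluster.yml --name "$TARGET_SNAPSHOT"
```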
