How to Schedule Database Backups with Cronjob and Upload to AWS S3

Hamed Karbasi

Originally published at itnext.io

Introduction

The backup procedure is a vital operation every team should consider. However, doing it manually is exhausting toil. You can automate it simply by creating a cronjob that takes the backup and uploads it to your desired object storage.

This article explains how to automate your database backup with a cronjob and upload it to AWS S3. Postgres is used as the database here, but you can generalize the approach to any other database or data type you want.

Step 1: Create configmap

To perform the backup and upload, we need a bash script. It first logs in to the cluster via the oc login command, then gets the name of your desired database pod, executes the dump command, zips the result, and downloads it via the oc rsync command. Finally, it uploads the file to AWS S3 object storage.

Before dumping, the script prompts the user with an "Are you sure?" confirmation, which you can bypass with the -y option.

All credentials, such as the OKD token and the Postgres password, are passed to the script as environment variables.
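
If you ever need to run the script by hand (for example, inside the cronjob's container), a minimal sketch of such a run looks like this; the exported values are placeholders for your real credentials:

$ export OKD_TOKEN=<token> POSTGRES_USER=<user> POSTGRES_USER_PASSWORD=<password> POSTGRES_DB=<db>
$ export AWS_ACCESS_KEY_ID=<key-id> AWS_SECRET_ACCESS_KEY=<secret-key> AWS_CA_BUNDLE=/certs/ca-bundle.crt
$ ./backup.sh        # interactive, asks "Are you sure?"
$ ./backup.sh -y     # non-interactive, as the cronjob will run it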

By putting this bash script in a ConfigMap, we can mount it as a volume in the cronjob. Remember to replace PROJECT_HERE with your project name and customize the variables in the bash script according to your project's specifications.

kind: ConfigMap
apiVersion: v1
metadata:
  name: backup-script
  namespace: PROJECT_HERE
data:
  backup.sh: >
    #!/bin/bash

    # This file provides a backup script for postgres

    # Variables: modify these according to your OKD project and database secrets

    NAMESPACE=PROJECT_HERE

    S3_URL=https://s3.mycompany.com

    # OKD API server URL; replace with your cluster's API endpoint (not the S3 endpoint)

    OKD_URL=https://api.okd.mycompany.com:6443

    STATEFULSET_NAME=postgres

    BACKUP_NAME=backup-$(date "+%F")

    S3_BUCKET=databases-backup

    # Exit the script immediately if any command fails

    set -e

    # Holds the user's confirmation ("y" skips the prompt)

    confirm=""

    # Parse command-line options

    while getopts "y" opt; do
        case $opt in
        y)
            confirm="y"
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
        esac
    done

    # Login to OKD

    oc login ${OKD_URL} --token=${OKD_TOKEN}

    POD_NAME=$(oc get pods -n ${NAMESPACE} | grep ${STATEFULSET_NAME} | cut -d' '
    -f1)

    echo The backup of the database in pod ${POD_NAME} will be dumped into the ${BACKUP_NAME} file.

    DUMP_COMMAND='PGPASSWORD="'${POSTGRES_USER_PASSWORD}'" pg_dump -U
    '${POSTGRES_USER}' '${POSTGRES_DB}' > /bitnami/postgresql/backup/'${BACKUP_NAME}

    GZIP_COMMAND='gzip /bitnami/postgresql/backup/'${BACKUP_NAME}

    REMOVE_COMMAND='rm /bitnami/postgresql/backup/'${BACKUP_NAME}.gz

    # Prompt the user for confirmation if the -y option was not provided

    if [[ $confirm != "y" ]]; then
        read -r -p "Are you sure you want to proceed? [y/N] " response
        case "$response" in
        [yY][eE][sS] | [yY])
            confirm="y"
            ;;
        *)
            echo "Aborted"
            exit 0
            ;;
        esac
    fi

    # Dump the backup and zip it

    oc exec -n ${NAMESPACE} "${POD_NAME}" -- sh -c "${DUMP_COMMAND} && ${GZIP_COMMAND}"

    echo Transferring the backup file to the local folder

    oc rsync -n ${NAMESPACE} ${POD_NAME}:/bitnami/postgresql/backup/ /backup-files
    &&
        oc exec -n ${NAMESPACE} "${POD_NAME}" -- sh -c "${REMOVE_COMMAND}"

    # Send backup files to AWS S3

    aws --endpoint-url "${S3_URL}" s3 sync /backup-files
    s3://${S3_BUCKET}

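Once all the manifests are applied (see Step 3), you can confirm that the script landed in the ConfigMap, for example:

$ oc get configmap backup-script -n PROJECT_HERE -o yaml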

Step 2: Create secrets

Database, AWS, and OC credentials should be kept as secrets. First, we'll create a secret containing the AWS CA bundle. After downloading the bundle, you can create a secret from it:

$ oc create secret -n PROJECT_HERE generic certs --from-file ca-bundle.crt

You should replace PROJECT_HERE with your project name.

Now let's create another secret for the other credentials. Note that all values under data must be base64-encoded, and AWS_CA_BUNDLE should be set to /certs/ca-bundle.crt, the path where the CA bundle will be mounted in the cronjob container.

kind: Secret
apiVersion: v1
metadata:
  name: mysecret
  namespace: PROJECT_HERE

data:
  AWS_CA_BUNDLE: 
  OKD_TOKEN: 
  POSTGRES_USER_PASSWORD: 
  POSTGRES_USER:
  POSTGRES_DB:  
  AWS_SECRET_ACCESS_KEY: 
  AWS_ACCESS_KEY_ID: 

type: Opaque
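
If you would rather not base64-encode each value by hand, oc can build the same secret from literal values and encode them for you. The values below are placeholders:

$ oc create secret -n PROJECT_HERE generic mysecret \
    --from-literal=AWS_CA_BUNDLE=/certs/ca-bundle.crt \
    --from-literal=OKD_TOKEN=<your-okd-token> \
    --from-literal=POSTGRES_USER=<db-user> \
    --from-literal=POSTGRES_USER_PASSWORD=<db-password> \
    --from-literal=POSTGRES_DB=<db-name> \
    --from-literal=AWS_ACCESS_KEY_ID=<access-key-id> \
    --from-literal=AWS_SECRET_ACCESS_KEY=<secret-access-key>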

Step 3: Create cronjob

To create the cronjob, we need a Docker image capable of running the oc and aws commands. You can find this image and its Dockerfile here if you are inclined to customize it.
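
If you want to sanity-check the image before wiring it into the cronjob, and assuming Docker is available locally, you can verify that both CLIs are present:

$ docker run --rm --entrypoint /bin/bash hamedkarbasi/aws-cli-oc:1.0.0 -c "oc version --client && aws --version"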

Now let’s create the cronjob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
  namespace: PROJECT_HERE 
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: hamedkarbasi/aws-cli-oc:1.0.0
            command: ["/bin/bash", "-c", "/backup-script/backup.sh -y"]
            envFrom:
              - secretRef:
                  name: mysecret
            volumeMounts:
              - name: script
                mountPath: /backup-script/backup.sh
                subPath: backup.sh
              - name: certs
                mountPath: /certs/ca-bundle.crt
                subPath: ca-bundle.crt
              - name: kube-dir
                mountPath: /.kube
              - name: backup-files
                mountPath: /backup-files
          volumes:
            - name: script
              configMap:
                name: backup-script
                defaultMode: 0777
            - name: certs
              secret: 
                secretName: certs
            - name: kube-dir
              emptyDir: {}
            - name: backup-files
              emptyDir: {}
          restartPolicy: Never

Again, you should replace PROJECT_HERE with your project name and set the schedule parameter to your desired job frequency. By putting all the manifests in a folder named backup, we can apply them to Kubernetes:

$ oc apply -f backup

This cronjob will run at 3:00 AM every night, dumping the database and uploading the backup to AWS S3.
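
Instead of waiting for 3:00 AM, you can trigger a one-off run from the CronJob to verify the whole pipeline, for example:

$ oc create job manual-backup --from=cronjob/database-backup -n PROJECT_HERE
$ oc get jobs -n PROJECT_HERE
$ oc logs -f job/manual-backup -n PROJECT_HERE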

Conclusion

In conclusion, automating database backups to AWS S3 using a cronjob can save you time and effort while ensuring your valuable data is stored securely in the cloud. Following the steps outlined in this guide, you can easily set up a backup schedule that meets your needs and upload your backups to AWS S3 for safekeeping. Remember to test your backups regularly to ensure they can be restored when needed, and keep your AWS credentials and permissions secure to prevent unauthorized access. With these best practices in mind, you can have peace of mind knowing that your database backups are automated and securely stored in the cloud.
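
Since pg_dump runs in its default plain-SQL format here, a restore test can be as simple as downloading a backup, unzipping it, and feeding it to psql against a scratch database. A rough sketch, with placeholder file, user, and database names (connection details omitted):

$ aws --endpoint-url https://s3.mycompany.com s3 cp s3://databases-backup/backup-2023-01-01.gz .
$ gunzip backup-2023-01-01.gz
$ createdb -U <db-user> restore_test
$ psql -U <db-user> -d restore_test -f backup-2023-01-01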
