DEV Community

Adenle Abiodun


MongoDb backup to S3 using CronJobs on Kubernetes

I recently had to set up regular backups of a MongoDB database so that we could recover quickly if the database crashed or something else weird happened and the DB became unusable.

There are several ways of achieving this, but I stuck to one that uses Kubernetes since I already had a Kubernetes cluster running.

Prerequisites:

  • Docker installed on your machine
  • Container repository (Docker Hub, Google Container Registry, etc.) - I use Docker Hub
  • Kubernetes cluster running

Our working folder structure should be like this:
db_backup/
├── awesome_backup.sh
├── Dockerfile
├── cronjob-deployment.yaml
└── backup.env

Steps to achieve this:

  • MongoDB installed on the server and running (or Atlas)
  • AWS CLI installed in a docker container
  • A bash script that will be run on the server to backup the database
  • AWS S3 Bucket configured
  • Build and deploy on Kubernetes

MongoDB Setup.

You can set up a MongoDB database on your own server, on a Kubernetes cluster, or on Atlas. An Atlas cluster is a convenient option and is free for M0 clusters.

It doesn't matter which one you use; the most important thing is to have a MongoDB database running.

You can visit the official MongoDB website to sign up and set up an Atlas cluster at MongoDB Atlas

After creating your MongoDB instance, note down its MONGODB_URI and keep it somewhere safe. We will need it later.

AWS CLI Setup in a Docker Container.

This is a very important step as we will be using this container to run our scheduled backups to S3 using kubernetes cronjobs.

You need to create a Dockerfile with the following contents. I will explain each step in detail.

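# Use Ubuntu 18.04 as the base image
FROM ubuntu:18.04

# Stop apt from prompting for input during the build
ENV DEBIAN_FRONTEND=noninteractive

# Install the MongoDB backup tools (mongodump, mongorestore, ...)
RUN apt-get update && apt-get install -y mongo-tools

# Install the AWS CLI (refresh the package index so this layer does not rely on a stale cache)
RUN apt-get update && apt-get install -y awscli

# Copy the MongoDB backup script to /
COPY ./awesome_backup.sh ./

# Make the script executable
RUN chmod +x ./awesome_backup.sh

# Default command on container startup; the CronJob overrides it to run the script
CMD ["bash"]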

The above Dockerfile is simple and straightforward.

Step 1:
Pull the Ubuntu 18.04 image. This will be the base image for our container. You could also use a different Linux distro like CentOS or Debian, but your commands or syntax might be slightly different.
In my case I am using an Ubuntu image.

Step 2:
Set the DEBIAN_FRONTEND environment variable to noninteractive. This stops apt-get from prompting for input while the image is being built.

Step 3:
Install the MongoDB backup tools. This will enable us to back up the database using commands like mongodump.

Step 4:
Install aws cli. This will enable us to use aws cli to interact with AWS services.

Step 5:
Copy the MongoDB backup script to the root directory in the container. To be honest, you can copy the script to any directory; just take note of where it ends up. For simplicity, I will put it in the root folder.

We will create this script in the next step, but for now we will call it awesome_backup with a .sh extension. Remember ... it is a bash script.

Step 6:
We will need to change the permissions of the script to make it executable by running
chmod +x awesome_backup.sh.

Step 7:
Finally, CMD sets the default command that runs on container startup. Here it is just bash; the CronJob will later override it to run our script.

That is it. You are now ready to run the container and run the script... Oh! Did I say run your script?

Yeah, we need to create the script before we can run it.

Our Awesome bash script.

This is the brain of the whole process. I will break down what this script does in detail. After this, you will become a bash script ninja. Well, at least you will be able to write one.

See the script below.

#!/bin/bash

# Configure the AWS CLI from the environment variables passed to the container
aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
aws configure set region "$AWS_DEFAULT_REGION"

# Create a new folder to store the backup files
mkdir -p /backup/
echo "Creating backup folder ... 👜"

# Change the current directory to the /usr/bin folder of the container
cd /usr/bin
echo "💿 Backup started at $(date)"

# If mongodump and the S3 upload both succeed, echo a success message; otherwise echo a failure message
if mongodump --forceTableScan --uri "$MONGODB_URI" --gzip --archive > "../../backup/dump_$(date "+%Y-%m-%d-%T").gz" && cd ../../ && aws s3 cp /backup/ "s3://$S3_BUCKET/db_backup/" --recursive
then
    echo "💿 😊 👍 Backup completed successfully at $(date)"
    echo " 📦 Uploaded to s3 bucket 😊 👍"
else
    echo  "📛❌📛❌ Backup failed at $(date)"
fi

echo "Cleaning up... 🧹"
# Clean up by removing the backup folder
rm -rf /backup/

echo "Done 🎉"

You might be wondering why I have a '$' sign in front of some capitalized words. Those are environment variables, which we are going to pass to the container later at execution time.

To explain the script in detail: a bash script starts with a shebang (#!) followed by the interpreter, in this case #!/bin/bash, and then the commands.

  • We first configure the AWS CLI using the aws configure command. This stores the values of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION in the CLI configuration.

  • Then we create a new folder to store the backup files.

  • Then we change the current directory to the /usr/bin folder of the container.

  • Next comes an if statement that chains three commands with &&: the mongodump command must succeed, AND the change of directory to ../../ (back to the root folder) must succeed, AND the upload of the backup to the S3 bucket must succeed. Each command only runs if the one before it succeeded.

  • If all three succeed, we echo a success message. If any of them fails, we echo a failure message.

  • Finally we run the rm command. This removes the backup folder.

DONE!!!

I echoed out some messages to show which step the script is currently at in the execution flow. This helps when reading the logs and debugging the script.
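To see how the if statement with && behaves without touching a real database, here is a minimal sketch. The functions fake_dump and fake_upload are hypothetical stand-ins for mongodump and aws s3 cp; they only simulate exit codes.

```shell
#!/bin/bash

# Hypothetical stand-ins for the real commands; they only simulate exit codes.
fake_dump()   { return "$1"; }                      # succeeds when called with 0
fake_upload() { echo "upload ran"; return 0; }

run_backup() {
    # "$1" is the exit code the simulated dump should return.
    # fake_upload only runs when fake_dump succeeded, just like the && chain in the script.
    if fake_dump "$1" && fake_upload
    then
        echo "success"
    else
        echo "failure"
    fi
}

run_backup 0   # dump succeeds, so the upload runs and "success" is printed
run_backup 1   # dump fails, so the upload is skipped and "failure" is printed
```

Note that in the second call "upload ran" never appears: a failed dump means nothing gets uploaded.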

If you followed me up to this point, you are a Rockstar bash script ninja 😊👍.

AWS S3 Bucket Setup.

You need to log in to your AWS account, search for the S3 service and create a new bucket. You can read more about this here: AWS S3 setup

If the AWS CLI is already configured in your terminal, you can create a new bucket by running the following command:

aws s3api create-bucket --bucket $S3_BUCKET --region $AWS_DEFAULT_REGION

$S3_BUCKET is the name of the bucket you want to create.
$AWS_DEFAULT_REGION is the region you want to create the bucket in.

For example:
aws s3api create-bucket --bucket my-mongodb-backup-bucket --region us-east-1

Bucket names may only contain lowercase letters, numbers, dots and hyphens, so underscores are not allowed. For any region other than us-east-1, you also need to add --create-bucket-configuration LocationConstraint=$AWS_DEFAULT_REGION to the command.
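If you want to catch a bad bucket name before calling AWS, a small check like the one below can help. It is a minimal sketch in bash that assumes only the basic naming rules (3 to 63 characters; lowercase letters, numbers, dots and hyphens; starting and ending with a letter or number). The function is_valid_bucket_name is a hypothetical helper, not part of the AWS CLI, and it does not cover every S3 rule.

```shell
#!/bin/bash

# Hypothetical helper: returns 0 when the name passes the basic S3 naming rules.
# The body runs in a subshell so the locale change does not leak to the caller.
is_valid_bucket_name() (
    LC_ALL=C   # keep a-z strictly lowercase ASCII regardless of the user's locale
    pattern='^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
    [[ $1 =~ $pattern ]]
)

# Example usage:
if is_valid_bucket_name "my-mongodb-backup-bucket"; then
    echo "bucket name looks fine"
fi
```

A name with underscores or uppercase letters fails this check, which is worth knowing before aws s3api create-bucket rejects it.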

Note: For simplicity, you can create a .env file (e.g. backup.env) in the root folder of the project and add the following variables to it.

S3_BUCKET=xxxxxx
MONGODB_URI=xxxxxx
AWS_ACCESS_KEY_ID=xxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxx
AWS_DEFAULT_REGION=xxxxxx

We are going to use the .env file to create secrets in our kubernetes cluster that will then be used to pass the variables to the container at execution time.
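Since the backup script depends on all five of these variables, it can help to fail early when one of them is missing. The sketch below is one way to do that; require_vars is a hypothetical helper of my own, written so that it works under both sh and bash.

```shell
#!/bin/bash

# The variable names from backup.env.
REQUIRED_VARS="S3_BUCKET MONGODB_URI AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION"

# Hypothetical helper: prints every missing name to stderr and
# returns non-zero if at least one variable is unset or empty.
require_vars() {
    rv_missing=0
    for rv_name in "$@"; do
        # Look up the value of the variable whose name is stored in rv_name.
        eval "rv_value=\${$rv_name:-}"
        if [ -z "$rv_value" ]; then
            echo "Missing environment variable: $rv_name" >&2
            rv_missing=1
        fi
    done
    return "$rv_missing"
}
```

One possible use is to put require_vars $REQUIRED_VARS || exit 1 near the top of awesome_backup.sh, so that a missing variable shows up clearly in the logs instead of as a confusing mongodump or aws error.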

Drum roll 🥁 !!!!!

Tada 🎉 !!!!!

Build and Deploy

Okay now to the final parts of this process. We are going to build the container and deploy it to our kubernetes cluster.

If you are not familiar with kubernetes, you can read more about it here: Kubernetes Basics

To build and deploy, ensure you have the correct folder structure. Mine looks like this:
db_backup/
├── awesome_backup.sh
├── Dockerfile
├── cronjob-deployment.yaml
└── backup.env

Ensure you are in the root folder of the project (db_backup).

Then run this to build the container:

docker build . -t repository_name/container_name:tag_name

e.g docker build . -t abc_inc/cronjob-server:latest

To run the container locally, use the following command, which passes the secrets from the env file into the container:
docker run -it --env-file backup.env abc_inc/cronjob-server:latest /bin/bash

You can then push the image to Docker Hub or your own image repository by running the following command:

docker push repository_name/container_name:tag_name

e.g docker push abc_inc/cronjob-server:latest

Now we will create the cronjob deployment file.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
  namespace: default   # namespace where the cronjob will be deployed
spec:
  schedule: "*/5 * * * *"   # runs the container every 5 minutes; change this to any cron schedule you want
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: db-backup
              image: abc_inc/cronjob-server:latest
              command: ["/bin/bash", "./awesome_backup.sh"]       # run the backup script with bash when the container starts
              imagePullPolicy: IfNotPresent
              envFrom:
                - secretRef:
                    name: backup-secrets     # secret whose keys are passed into the container as environment variables
          restartPolicy: Never


On the second to the last line of the yaml file, you see that we referenced a secret name. This is where we will pass the variables to the container.

To create the secret inside our kubernetes cluster, we will run:

kubectl create secret generic secret-name --from-env-file=path-to-env-file

e.g kubectl create secret generic backup-secrets --from-env-file=backup.env

This secret will be loaded from the env file and will be available to the container from the cluster.

After creating the secret, we will then deploy the cronjob deployment file into our kubernetes cluster by running:

kubectl apply -f cronjob-deployment.yaml

When you run kubectl get all, you should see your CronJob scheduled:

NAME                      SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/db-backup   */5 * * * *   False     0        4m58s           6h23m

When it runs after 5 minutes, you should see a gzip backup of your MongoDB on your S3 bucket.
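If you would rather not wait for the schedule, you can trigger a one-off run from the CronJob and read its logs. This is a sketch that assumes the CronJob is named db-backup as above and lives in your current namespace; run_backup_now and the db-backup-manual- prefix are hypothetical names I picked for illustration.

```shell
#!/bin/bash

# Hypothetical helper: start one backup immediately and show its output.
run_backup_now() {
    local job_name
    job_name="db-backup-manual-$(date +%s)"

    # Create a one-off Job from the CronJob's template instead of waiting for the schedule.
    kubectl create job "$job_name" --from=cronjob/db-backup || return 1

    # Block until the Job completes, or give up after 5 minutes.
    kubectl wait --for=condition=complete --timeout=300s "job/$job_name" || return 1

    # Print the echo messages the backup script wrote inside the pod.
    kubectl logs "job/$job_name"
}

# Usage (needs kubectl access to the cluster):
#   run_backup_now
```

Keep in mind that the script echoes its failure message without exiting with an error, so a Job can complete even when the backup failed. Check the logs for the "Backup completed successfully" line.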
