GitProtect Team for GitProtect

Posted on Feb 19, 2024 • Edited on Apr 8, 2024 • Originally published at gitprotect.io

DevSecOps MythBuster – “Git Clone and DevOps Backup Script is all I need for data protection”

#devops #devsecops #coding #security

Welcome back(up) to the DevSecOps MythBuster series! This episode is dedicated to the stubborn ones who still believe that…

I use git clone/mirror, so I have a backup…
Trust me, I am the developer – I wrote my own scripts…

When it comes to Git, backup is not that straightforward. One of Git’s essential functions is clone. You’re probably familiar with how clone works: it creates a local, fully functional copy of the repository, with every version of every file from the beginning of the project. It’s a nifty feature. Here’s the rundown: it creates the project directory on your machine, sets up remote-tracking branches, runs a fetch (grabbing the commits for those branches), and finally checks out the default branch. It even sets the origin address for you, and that’s it! Simple as that. You have the perfect copy of the cloned repository. Moreover, you can add a git bundle command to create a single archive file that contains all the refs needed to restore the repository, right?
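As a rough illustration of the clone-plus-bundle approach described above, here is a minimal Python sketch. The function names, the repo.git and repo.bundle file names, and the working directory are assumptions made for this example; the only real Git commands involved are git clone --mirror and git bundle create with --all.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_and_bundle_commands(remote_url: str, workdir: Path) -> List[List[str]]:
    """Build the git commands for a mirror clone followed by a bundle of all refs."""
    # Use absolute paths: "git -C" changes directory before the bundle path is resolved.
    workdir = workdir.resolve()
    mirror_dir = workdir / "repo.git"      # hypothetical directory name
    bundle_file = workdir / "repo.bundle"  # hypothetical file name
    return [
        # --mirror copies every ref, not only the default branch
        ["git", "clone", "--mirror", remote_url, str(mirror_dir)],
        # --all packs every ref of the mirror into a single bundle file
        ["git", "-C", str(mirror_dir), "bundle", "create", str(bundle_file), "--all"],
    ]


def run_backup(remote_url: str, workdir: Path) -> None:
    """Run the commands in order; check=True raises if git exits with an error."""
    workdir.mkdir(parents=True, exist_ok=True)
    for command in mirror_and_bundle_commands(remote_url, workdir):
        subprocess.run(command, check=True)
```

Calling run_backup with a repository URL and a target directory would produce one mirror and one bundle. Note that everything in it is Git data only, which is exactly the limitation discussed in the rest of this article.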

As if that wasn’t enough, you can write your own script to automate everything. Ha! GitHub and Atlassian provide their APIs, so can it get any easier?

Okay, let’s stop here for a while – it’s time to burst that bubble! Let’s look at the limitations of the approach based on git clone or your DIY scripts and the risks they pose.

Repositories-only limited copy

Having a clone allows you to restore the most recent version of your source code and repositories, but it doesn’t include all the metadata, which, as you know, holds a lot of valuable and crucial information.

Such clones contain commits, trees, tags, branches, and file blobs. A clone made with the --mirror flag also includes remotes and notes. However, issues, pull requests, comments, and wikis are not included or protected at all. You won’t be able to restore them in the event of failure, accidental deletion, service outage, or any other unexpected incident.
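To show what covering even one kind of metadata involves, here is a hedged Python sketch that pages through the GitHub REST API issues endpoint (GET /repos/{owner}/{repo}/issues, which also returns pull requests). The function names, the injectable fetch parameter, and the owner and repo values are assumptions for illustration. Comments, reviews, and wikis would each need their own calls, and rate limits and error handling are left out.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

API_ROOT = "https://api.github.com"


def fetch_page(url: str, token: Optional[str] = None) -> List[Dict]:
    """Fetch one page of JSON from the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def export_issues(
    owner: str,
    repo: str,
    fetch: Callable[[str], List[Dict]] = fetch_page,
) -> List[Dict]:
    """Collect issues page by page until the API returns an empty page."""
    issues: List[Dict] = []
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page=100&page={page}"
        )
        batch = fetch(url)
        if not batch:
            break
        issues.extend(batch)
        page += 1
    return issues
```

Even this small exporter only reads the data. Putting issues back after an incident is a separate problem that a clone-based script does not address.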

Everyday risk of data loss…

The use of ‘git clone’ for backup purposes introduces a potential risk of data loss. Beyond the evident danger of excluding crucial metadata, this command can face challenges when the repository is actively used and your developers work on it regularly.

Note that any new commits pushed while the clone process runs might not reach your local copy. There is no built-in mechanism to lock the repository during a clone, so a consistent point-in-time copy cannot be guaranteed. This is another advantage of a proper backup: it is an accurate replica of your data (repositories and metadata) as of the moment the backup task ran.
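One way to at least notice this kind of drift is to compare the remote’s refs before and after the clone. The sketch below parses the output of git ls-remote (one object ID and ref name per line) and reports refs that moved in between. The function names are assumptions for this example, and the check only detects a change; it does not prevent it.

```python
import subprocess
from typing import Dict, Set


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each ref name to its object ID from `git ls-remote` output."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        object_id, ref_name = line.split(None, 1)
        refs[ref_name.strip()] = object_id
    return refs


def refs_moved(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Return refs that were added, deleted, or repointed between two listings."""
    all_refs = before.keys() | after.keys()
    return {ref for ref in all_refs if before.get(ref) != after.get(ref)}


def ls_remote(remote_url: str) -> Dict[str, str]:
    """List the remote's refs; requires git and network access."""
    result = subprocess.run(
        ["git", "ls-remote", remote_url],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote(result.stdout)
```

A script could call ls_remote before and after cloning and retry when refs_moved returns a non-empty set, although on a busy repository that retry may never settle.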

But that’s not the end. As you already know, human error has remained the leading cause of data loss for years. Your developers, even if you consider them highly talented, focused, and security-conscious geniuses, are deep down people who make mistakes – every day!

With no monitoring, copy retention policy, and detailed audit logs, you have no control over your clones.

Moreover, without a retention policy, you can’t restore data from an earlier point in time, which is exactly what you need to roll back mistakes that were accidentally captured in a clone…
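For a sense of what even a basic retention policy means in a DIY script, here is a minimal Python sketch that decides which timestamped snapshots to delete. The function name, the keep_days parameter, and the rule of always sparing the newest snapshot are assumptions chosen for this example, not a recommended policy.

```python
from datetime import datetime, timedelta
from typing import List


def snapshots_to_prune(
    snapshots: List[datetime], now: datetime, keep_days: int
) -> List[datetime]:
    """Return snapshots older than keep_days, always sparing the newest one."""
    if not snapshots:
        return []
    cutoff = now - timedelta(days=keep_days)
    newest = max(snapshots)
    # Never prune the newest snapshot, even if it is already past the cutoff.
    return sorted(s for s in snapshots if s < cutoff and s != newest)
```

Real policies usually combine daily, weekly, and monthly tiers, and each tier adds more logic that somebody has to write, test, and monitor.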

Branch deletion, removal of the old repository, losing a local copy, committed secrets, HEAD overwrite… just to name a few. Check how to avoid common developers’ mistakes.

Warning: No data restore & Disaster Recovery (!)

So much writing, when these few words would actually be enough: git clone doesn’t provide data restore or immediate Disaster Recovery capabilities to ensure business continuity in the event of failure.

In the event of a disaster, you would need some additional scripts for recovery, creating a never-ending cycle of making backups of backups. It is not a sustainable or efficient use of time and resources. Instead, it may be more worthwhile to invest in third-party tools that provide reliable and efficient backups and empower you with every-scenario-ready Disaster Recovery for DevOps and business continuity.
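The extra recovery script mentioned above might look like the following sketch, which rebuilds a repository from a bundle file and pushes it to a new remote. The function name, paths, and remote URL are hypothetical; git clone --mirror accepts a bundle file as its source, and git push --mirror pushes every ref. Only Git data comes back this way.

```python
from pathlib import Path
from typing import List


def restore_commands(
    bundle_file: Path, target_dir: Path, new_remote_url: str
) -> List[List[str]]:
    """Build the git commands to restore a bundle and push it to a new remote."""
    bundle_file = bundle_file.resolve()
    target_dir = target_dir.resolve()
    return [
        # A bundle file can be used as the source of a clone.
        ["git", "clone", "--mirror", str(bundle_file), str(target_dir)],
        # Push every ref from the restored mirror to the replacement remote.
        ["git", "-C", str(target_dir), "push", "--mirror", new_remote_url],
    ]
```

Issues, pull requests, and other metadata would still have to be re-created through the hosting provider’s API, if they were exported at all.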

Want to find out more? Check out our DevOps Disaster Recovery use case!

We will leave the management and financial costs of writing your own scripts for the next episode of the DevSecOps MythBuster series.

Related articles:

Git backup or git clone? That is the question!
Your own Git backup script vs. repository backup software
How to write a GitHub backup script and why (NOT) to do it?

