Patrick Sevat

Posted on Sep 26, 2020 • Edited on Oct 1, 2021

Optimize your git clone / fetch strategy for CI pipelines

#git #ci #pipeline

Over the last week I've been working hard on migrating our Frontend Monorepo from an on-premise Jenkins pipeline to a cloud-based solution.

One of the big obstacles I had to overcome was dealing with the sheer size of our monorepo. If we include the whole history (all branches on remote, all tags, full history), the size of our repo amounts to more than 1GB.

Downloading that much data on each run is a waste of bandwidth and time, so I dived into the git documentation and found several options that I will share with you.

You only need to checkout / build a single branch

`git clone --single-branch`

This simple flag (--single-branch) makes sure you only fetch the history of a single branch: the remote's default branch, unless you pick another one with --branch. The more active branches live on your remote, the bigger the benefit, and with more than 400 active contributors in our repo, that helps!
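To see the effect without touching a real server, here is a minimal sketch that builds a throwaway local repository as a stand-in for the remote and compares a regular clone with a --single-branch clone. The directory names, branch names and the ci@example.com identity are made up for illustration.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote" with main plus two feature branches.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
git -C "$work/remote" commit -q --allow-empty -m "initial commit"
git -C "$work/remote" branch feature-a
git -C "$work/remote" branch feature-b

git clone -q "file://$work/remote" "$work/full"
git clone -q --single-branch "file://$work/remote" "$work/single"

# Count remote-tracking branches in a clone, ignoring origin/HEAD.
count_branches() {
  git -C "$1" for-each-ref --format='%(refname)' refs/remotes/origin \
    | grep -vc '/HEAD$'
}
full_count="$(count_branches "$work/full")"
single_count="$(count_branches "$work/single")"
echo "full clone: $full_count branches, single-branch clone: $single_count"
```

With only empty commits the size difference is invisible here; on a busy monorepo every skipped branch is history you do not download.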

`git fetch --no-tags`

Another simple flag! This one skips downloading tags. Tags are often version numbers such as <package-name>@1.2.67. Although not a huge data saver, it helps. Be careful though, some CI pipelines need tags, so double check whether you need them.
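A small sketch of the difference, again against a throwaway local repository standing in for the remote (the pkg@... tag names and the paths are illustrative assumptions): a default fetch follows tags that point into the fetched history, while --no-tags leaves them behind.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote" with two tagged releases on main.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
git -C "$work/remote" commit -q --allow-empty -m "release 1.2.66"
git -C "$work/remote" tag "pkg@1.2.66"
git -C "$work/remote" commit -q --allow-empty -m "release 1.2.67"
git -C "$work/remote" tag "pkg@1.2.67"

# Default fetch: tags pointing at fetched commits come along.
git init -q "$work/default"
git -C "$work/default" remote add origin "file://$work/remote"
git -C "$work/default" fetch -q origin

# Same fetch with --no-tags.
git init -q "$work/notags"
git -C "$work/notags" remote add origin "file://$work/remote"
git -C "$work/notags" fetch -q --no-tags origin

tags_default="$(git -C "$work/default" tag -l | wc -l | tr -d ' ')"
tags_none="$(git -C "$work/notags" tag -l | wc -l | tr -d ' ')"
echo "default fetch: $tags_default tags, --no-tags fetch: $tags_none tags"
```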

`git fetch --depth=<n>`

Oh yes, now we are talking. This flag restricts the history to the latest <n> commits. Really useful if you only care about the newest state of a certain branch, for example if you know you need to build a feature branch (or your main branch) and you don't need all its history! Potentially huge data savings 🤤.

Note: If your depth is quite large, the server will have to do a lot of calculations, which increases transfer time. A depth of 50000 increased time by 300%. A depth of 1 cut time by 90%.
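As a rough illustration, the sketch below fetches only the last two commits of a five-commit branch from a throwaway local repository. The paths, commit messages and depth are assumptions chosen for the demo, and the file:// URL is there because git ignores --depth for plain local paths.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote" with five commits on main.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5; do
  git -C "$work/remote" commit -q --allow-empty -m "commit $i"
done

# Fetch only the two most recent commits of main.
git init -q "$work/shallow"
git -C "$work/shallow" remote add origin "file://$work/remote"
git -C "$work/shallow" fetch -q --depth=2 origin main

shallow_count="$(git -C "$work/shallow" rev-list --count FETCH_HEAD)"
is_shallow="$(git -C "$work/shallow" rev-parse --is-shallow-repository)"
echo "commits available locally: $shallow_count (shallow: $is_shallow)"
```

If a later step turns out to need more history, `git fetch --deepen=<n>` or `git fetch --unshallow` can extend a shallow repository afterwards.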

`git fetch --depth=1 origin <full_SHA1>`

Use this when you need a single commit that is not at the tip of a branch, without pulling in all history of that branch. See this StackOverflow question for details.
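Here is a hedged sketch of that flow against a throwaway local repository. One caveat: the server has to permit requests for a commit that is not a branch tip. In this demo that is done by setting uploadpack.allowReachableSHA1InWant on the stand-in remote; whether your own git host allows it depends on its configuration. All paths and commit messages are illustrative.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote" with three commits; we want the middle one.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  git -C "$work/remote" commit -q --allow-empty -m "commit $i"
done
target="$(git -C "$work/remote" rev-parse HEAD~1)"

# Demo-only server setting: allow fetching reachable non-tip commits.
git -C "$work/remote" config uploadpack.allowReachableSHA1InWant true

# Fetch exactly that commit, with no ancestors, and check it out.
git init -q "$work/ci"
git -C "$work/ci" remote add origin "file://$work/remote"
git -C "$work/ci" fetch -q --depth=1 origin "$target"
git -C "$work/ci" checkout -q --detach FETCH_HEAD

checked_out="$(git -C "$work/ci" rev-parse HEAD)"
commit_count="$(git -C "$work/ci" rev-list --count HEAD)"
echo "checked out $checked_out with $commit_count commit of history"
```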

Combining the flags `git clone --no-tags --single-branch --depth=1`

If you just need the latest state of a single branch this is your command! You can also modify this command to checkout a specific branch by adding --branch=<branch-name> if you are building a branch that is not your main.
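Putting it together, a minimal sketch that clones only the tip of a non-default branch. The branch name `feature`, the tag and the local stand-in remote are assumptions for the demo; in a real pipeline you would pass your repository URL and branch name instead.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote": a tagged main branch and a feature branch on top.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
git -C "$work/remote" commit -q --allow-empty -m "main 1"
git -C "$work/remote" tag "pkg@1.0.0"
git -C "$work/remote" commit -q --allow-empty -m "main 2"
git -C "$work/remote" checkout -q -b feature
git -C "$work/remote" commit -q --allow-empty -m "feature 1"
git -C "$work/remote" checkout -q main

# One branch, one commit, no tags.
git clone -q --no-tags --single-branch --depth=1 --branch=feature \
  "file://$work/remote" "$work/ci"

branch="$(git -C "$work/ci" rev-parse --abbrev-ref HEAD)"
commits="$(git -C "$work/ci" rev-list --count HEAD)"
tags="$(git -C "$work/ci" tag -l | wc -l | tr -d ' ')"
echo "branch=$branch commits=$commits tags=$tags"
```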

You need more than one branch (but not all)

This is where things become a bit more tricky. In this scenario git clone might not be our best bet because it only allows for cloning all branches or one branch.

This was also the trickiest scenario for me, because I needed to determine a single "ancestor" commit where a feature branch started to branch off the main branch.

`git clone --depth=<n> --no-single-branch`

This command fetches all branches with limited history. Note that --depth on its own implies --single-branch, so you need to add --no-single-branch to get the other branches as well. This might work for you if you are interested in the most recent state of two (or more) branches.

The downsides of this command are that it still fetches all branches and that determining "ancestor" commits is difficult or impossible (it depends on the depth, and the result can still be inaccurate).
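The ancestor problem can be reproduced with a small sketch. It uses a throwaway local repository as the remote, and the branch names and commit messages are made up. At depth 1 the commit where the two branches forked is simply not present locally, so `git merge-base` finds nothing until more history is fetched.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote": feature forks off main at the first commit.
remote="$work/remote"
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
git -C "$remote" commit -q --allow-empty -m "fork point"
fork="$(git -C "$remote" rev-parse HEAD)"
git -C "$remote" branch feature
git -C "$remote" commit -q --allow-empty -m "main work 1"
git -C "$remote" commit -q --allow-empty -m "main work 2"
git -C "$remote" checkout -q feature
git -C "$remote" commit -q --allow-empty -m "feature work 1"
git -C "$remote" commit -q --allow-empty -m "feature work 2"
git -C "$remote" checkout -q main

# All branches, but only the tip commit of each.
git clone -q --depth=1 --no-single-branch "file://$remote" "$work/ci"

# merge-base exits non-zero when no common ancestor is available.
if shallow_base="$(git -C "$work/ci" merge-base origin/main origin/feature)"; then
  echo "merge base at depth 1: $shallow_base"
else
  shallow_base=""
  echo "no merge base visible at depth 1"
fi

# Fetching the missing history makes the fork point visible again.
git -C "$work/ci" fetch -q --unshallow
full_base="$(git -C "$work/ci" merge-base origin/main origin/feature)"
echo "merge base after unshallow: $full_base"
```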

Custom git config

This one might seem scary, but it can actually be really effective. You have full control over which branches you track and whether you fetch tags, and you can still control the depth of the history.

Steps

Initialize an empty git repository by running git init

Add the following snippet to your .git/config file

# .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        # replace with your own repo!
        url = git@github.com:patricksevat/foo.git

        # list the branches that you want to track
        fetch = +refs/heads/main:refs/remotes/origin/main
        fetch = +refs/heads/development:refs/remotes/origin/development
        fetch = +refs/heads/feature:refs/remotes/origin/feature

# Replace main with master or whatever your main branch name is
[branch "main"]
        remote = origin
        merge = refs/heads/main

Now you can git fetch with whatever flags you like.
Without extra flags you get the full history of the listed branches; add --no-tags to skip tags or --depth=<n> to limit the history.
Don't forget to run git checkout with one of your fetched branches or you'll have an empty working directory!
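If you would rather script this than edit .git/config by hand, a roughly equivalent setup can be sketched with `git remote add -t`, which writes one fetch refspec per listed branch. The sketch below runs against a throwaway local repository; the branch names and paths are illustrative, and in a real pipeline the URL would be your own repository.

```shell
set -eu
work="$(mktemp -d)"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Stand-in "remote" with four branches; we only care about two of them.
remote="$work/remote"
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
git -C "$remote" commit -q --allow-empty -m "fork point"
fork="$(git -C "$remote" rev-parse HEAD)"
git -C "$remote" branch development
git -C "$remote" branch experiment
git -C "$remote" checkout -q -b feature
git -C "$remote" commit -q --allow-empty -m "feature work"
git -C "$remote" checkout -q main
git -C "$remote" commit -q --allow-empty -m "main work"

# Empty repository that tracks only main and feature.
git init -q "$work/ci"
git -C "$work/ci" remote add -t main -t feature origin "file://$remote"
git -C "$work/ci" config --get-all remote.origin.fetch

# Full history of the two tracked branches, without tags.
git -C "$work/ci" fetch -q --no-tags origin
git -C "$work/ci" checkout -q main

tracked="$(git -C "$work/ci" for-each-ref --format='%(refname)' \
  refs/remotes/origin | grep -vc '/HEAD$')"
base="$(git -C "$work/ci" merge-base origin/main origin/feature)"
echo "tracked branches: $tracked, ancestor commit: $base"
```

Because the listed branches are fetched with full history here, the "ancestor" commit can be determined reliably while every other branch on the remote is ignored.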

Conclusion

I hope these commands help you in optimizing your own git CI strategy. If you have any useful tips or improvements please leave them in the comments!
