DEV Community

Eoin Shanaghy


Find Changes Between Two Git Commits Without Cloning

There are some cases where you want to find out information about changes in your Git repository without having to clone the full repository. This will usually be in your automated build environment. When I used Jenkins, Travis CI or CircleCI, I had access to the cloned Git repository and could use git log, git ls-remote and git diff without any problem.

Other tools, and I am talking specifically about AWS CodeDeploy, take a different approach. Instead of giving you access to a cloned repo, AWS CodeDeploy gives you a snapshot of your code without the .git folder. This makes it impossible to run checks on what has changed since a previous build or even to determine what has changed in the commit that triggered your build. Some CI environments will give you a "shallow clone" without the full Git history, leaving you with a similar challenge.
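
Before choosing a strategy, a build step can check which of these situations it is in. The following is a minimal sketch, not part of the original workflow: `checkout_kind` is a hypothetical helper name, and `git rev-parse --is-shallow-repository` requires Git 2.15 or later.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reports what kind of checkout a directory holds.
#   none    - no Git metadata at all (e.g. a source snapshot without .git)
#   shallow - a Git repository whose history has been truncated
#   full    - a Git repository with complete history
# Like any Git command, it also detects a repository in a parent directory.
checkout_kind() {
  local dir=$1
  if ! git -C "$dir" rev-parse --git-dir >/dev/null 2>&1; then
    echo none
  elif [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = true ]; then
    echo shallow
  else
    echo full
  fi
}
```

For example, running `checkout_kind .` at the start of a build prints `none` for a snapshot without a `.git` folder, `shallow` for a shallow clone and `full` otherwise.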

I wanted to run these kinds of checks to determine which microservices in our monorepo had changed, so I knew which ones to build and redeploy. This technique is described well in this Shippable blog post.
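
Once a list of changed paths is available (for example from `git diff --name-only`), reducing it to the set of affected services is a small filter. A minimal sketch, assuming each microservice lives in its own top-level directory of the monorepo; `changed_services` is a hypothetical name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads changed file paths (one per line) on stdin and
# prints each affected top-level directory once, in sorted order.
# Assumes one microservice per top-level folder; root-level files are skipped.
changed_services() {
  awk -F/ 'NF > 1 { print $1 }' | sort -u
}
```

For example, `printf '%s\n' users/handler.js orders/index.js README.md | changed_services` prints `orders` and `users` (the paths are made up for illustration); root-level files such as `README.md` are ignored.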

I looked at two options for finding the folders that had changed since the last successful deployment:

  1. Clone the full repository manually in a CodeBuild step
  2. Use the GitHub API to retrieve information about the commits

The first option was one I wanted to avoid. It meant cloning a potentially large and growing repository at the start of the build. A shallow clone would not be sufficient as it would not capture the history of changes back to the previous release.
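
The limitation is easy to see on a throwaway repository: a depth-1 clone contains only the tip commit, so there is no history to walk back to an earlier release. A minimal sketch using a local source repository (file names and commit messages are hypothetical; the `file://` URL makes Git honour `--depth` for a local clone):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway source repository with three commits.
src=$(mktemp -d)
git -C "$src" init -q
git -C "$src" config user.email dev@example.com
git -C "$src" config user.name "Example Dev"
for n in 1 2 3; do
  echo "$n" > "$src/file$n.txt"
  git -C "$src" add "file$n.txt"
  git -C "$src" commit -q -m "commit $n"
done

# Clone only the tip commit.
shallow=$(mktemp -d)
git clone -q --depth 1 "file://$src" "$shallow"

# Count the commits reachable from HEAD in each repository.
full_count=$(git -C "$src" rev-list --count HEAD)
shallow_count=$(git -C "$shallow" rev-list --count HEAD)
echo "full history: $full_count commits, shallow clone: $shallow_count commit"
```

It reports three commits for the source repository and one for the shallow clone.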

The GitHub REST API includes a compare API and a list-commits API. The compare API is limited to 250 commits, so it couldn't be relied on. The list-commits API could work, but it means making multiple paged requests for a large amount of data just to get the changed paths. After a bit of trial and error, I abandoned the GitHub API approach.

After some further digging, I came across a Stack Overflow post that gave me a third option: fetch just the two individual commits with plain git commands and compare them to determine the changed filenames. In this example, I'm using the public lodash/lodash repository. To compare the tag 4.0.0 with the HEAD of the master branch, the sequence of commands looks like this:

git init .                                               # Create an empty repository
git remote add origin https://github.com/lodash/lodash.git  # Specify the remote repository (HTTPS needs no SSH key for a public repo)

git checkout -b base                                     # Create a branch for our base state

git fetch origin --depth 1 4.0.0                         # Fetch the single commit for the base of our comparison
git reset --hard FETCH_HEAD                              # Point the local base branch at the commit we just fetched

git checkout -b target                                   # Create a branch for our target state

git fetch origin --depth 1 master                        # Fetch the single commit for the target of our comparison
git reset --hard FETCH_HEAD                              # Point the local target to the commit we just fetched

git diff --name-only base target                         # Print a list of all files changed between the two commits

With this minimal fetching approach, the directory is 4.6 MB, compared to 49 MB for the full lodash repository.
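
The same steps can be wrapped in a reusable function for a build script. This is a minimal sketch rather than the author's script: `changed_files` is a hypothetical name, and instead of creating local `base` and `target` branches it resolves `FETCH_HEAD` to a commit right after each fetch. The temporary repository is deleted afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the files that differ between BASE_REF and
# TARGET_REF of REPO_URL, fetching only the tip commit of each ref (depth 1)
# into a throwaway repository. Diffing two trees needs no history in between.
changed_files() {
  local repo_url=$1 base_ref=$2 target_ref=$3
  local workdir base_sha target_sha status=0
  workdir=$(mktemp -d) || return 1
  {
    git -C "$workdir" init -q &&
      git -C "$workdir" remote add origin "$repo_url" &&
      git -C "$workdir" fetch -q --depth 1 origin "$base_ref" &&
      base_sha=$(git -C "$workdir" rev-parse 'FETCH_HEAD^{commit}') &&
      git -C "$workdir" fetch -q --depth 1 origin "$target_ref" &&
      target_sha=$(git -C "$workdir" rev-parse 'FETCH_HEAD^{commit}') &&
      git -C "$workdir" diff --name-only "$base_sha" "$target_sha"
  } || status=$?
  rm -rf "$workdir"
  return "$status"
}
```

In a build, `changed_files https://github.com/lodash/lodash.git 4.0.0 master` should print the same list as the commands above, and the output can be piped into a directory filter to decide which services to rebuild.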

Comparing two git commits


I'm the CTO at fourTheorem. Follow me on Twitter: @eoins

Oldest comments (1)

Mark Allen

Great post. However, the approach suggested in the article has a shortcoming: the diff will also include files that changed on master after your branch was created, not just the files changed on your branch. Not the end of the world, but there may be a better way.

I found a way to deepen a shallow clone until the merge-base commit is found. I posted it in a comment here: github.com/hasura/smooth-checkout-...

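#!/bin/bash
set -euo pipefail

# From: https://stackoverflow.com/a/56113247/2696867
echo "--- git fetching shallow merge base"

# TODO: Consider using PR base branch $BUILDKITE_PULL_REQUEST_BASE_BRANCH, and default to master if no PR.
echo "Fetching commits until we find the merge-base / fork-point between current commit and master"
# Quote the substitution so the test stays well-formed when merge-base finds nothing,
# and hide merge-base's error output while the history is still too shallow.
while [ -z "$(git merge-base master "$BUILDKITE_COMMIT" 2>/dev/null)" ]; do
  # Once the clone is no longer shallow, deepening cannot help: stop instead of looping forever.
  if [ "$(git rev-parse --is-shallow-repository)" = "false" ]; then
    echo "No merge base found between master and $BUILDKITE_COMMIT" >&2
    exit 1
  fi
  echo "git fetch --deepen=50 origin master $BUILDKITE_COMMIT"
  git fetch --deepen=50 origin master "$BUILDKITE_COMMIT"
done

echo "Done."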

This lets you keep a shallow clone while still fetching as deep as needed to git diff the current branch against the base branch:

git --no-pager diff master... --name-only
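
The difference Mark describes comes down to two-dot versus three-dot git diff. A minimal sketch on a throwaway repository (branch and file names are hypothetical) shows it: `git diff master feature` compares the two branch tips, while `git diff master...feature` compares `feature` with its merge base on `master`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository with a base commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"
echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -M master            # name the default branch explicitly

# Work on a feature branch.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile, master moves on.
git checkout -q master
echo later > master-only.txt
git add master-only.txt
git commit -q -m "master moves on"

# Two-dot: tip of master vs tip of feature, so master's new file shows up too.
two_dot=$(git diff --name-only master feature)
# Three-dot: merge base of master and feature vs tip of feature.
three_dot=$(git diff --name-only master...feature)

echo "two-dot:   $two_dot"
echo "three-dot: $three_dot"
```

Here `two_dot` lists `feature.txt` and `master-only.txt`, while `three_dot` lists only `feature.txt`. The three-dot form needs the merge-base commit to be present locally, which a depth-1 clone usually lacks; that is why the deepening loop in the script above is needed.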