There are some cases where you want to find out information about changes in your Git repository without having to clone the full repository. This will usually be in your automated build environment. When I used Jenkins, Travis CI or CircleCI, I had access to the cloned Git repository and could use git log, git ls-remote and git diff without any problem.
Other tools, and I am talking specifically about AWS CodeDeploy, take a different approach. Instead of giving you access to a cloned repo, AWS CodeDeploy gives you a snapshot of your code without the .git folder. This makes it impossible to run checks on what has changed since a previous build or even to determine what has changed in the commit that triggered your build. Some CI environments will give you a "shallow clone" without the full Git history, leaving you with a similar challenge.
I wanted to run these kinds of checks to determine which microservices in our monorepo had changed, so I knew which ones to build and redeploy. This is a technique described well in this Shippable blog post.
I looked at two options for finding out which folders had changed since the last successful deployment:
- Clone the full repository manually in a CodeBuild step
- Use the GitHub API to retrieve information about the commits
The first option was one I wanted to avoid. It meant cloning a potentially large and growing repository at the start of the build. A shallow clone would not be sufficient as it would not capture the history of changes back to the previous release.
The GitHub REST API includes a compare API and a list-commits API. The compare API is limited to 250 commits, so it couldn't be relied on. The list-commits API could work, but it means making multiple paged requests for a large amount of data just to get the changed paths. After a bit of trial and error, I ultimately abandoned the GitHub API approach.
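For reference, a request to the compare API could be sketched roughly as follows. This only illustrates the approach that was abandoned; the function names are hypothetical, and the sketch assumes curl and jq are installed and that the repository is public, so no authentication header is sent.

```shell
# Hypothetical helper: build the compare endpoint URL.
# Arguments: owner/repo, base ref, head ref.
github_compare_url() {
  printf 'https://api.github.com/repos/%s/compare/%s...%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: print the filenames reported by the compare endpoint.
# Assumes curl and jq are available; unauthenticated requests are rate limited.
github_changed_files() {
  curl -fsSL -H 'Accept: application/vnd.github+json' \
    "$(github_compare_url "$1" "$2" "$3")" | jq -r '.files[].filename'
}
```

Calling github_changed_files lodash/lodash 4.0.0 master would request the comparison used in the example further down, although a range that large runs into the limit described above.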
After some further digging, I came across a StackOverflow post that gave me a third option. It allows me to fetch the two individual commits using the git command and compare them to determine changed filenames. In this example, I'm using the public lodash/lodash repository. Assuming we want to compare the changes between the tag 4.0.0 and the HEAD of the master branch, the sequence of commands looks like this:
git init . # Create an empty repository
git remote add origin git@github.com:lodash/lodash.git # Specify the remote repository
git checkout -b base # Create a branch for our base state
git fetch origin --depth 1 4.0.0 # Fetch the single commit for the base of our comparison
git reset --hard FETCH_HEAD # Point the local base branch to the commit we just fetched
git checkout -b target # Create a branch for our target state
git fetch origin --depth 1 master # Fetch the single commit for the target of our comparison
git reset --hard FETCH_HEAD # Point the local target to the commit we just fetched
git diff --name-only base target # Print a list of all files changed between the two commits
The directory size with this minimal fetching approach is 4.6M compared to 49M for the full lodash repository.
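To get from the list of changed files to the list of services to rebuild, the output of git diff --name-only can be reduced to unique directory names. The following is a minimal sketch that assumes each microservice lives in its own top-level folder of the monorepo; the function name changed_dirs is hypothetical.

```shell
# Hypothetical helper: read changed file paths on stdin and print the unique
# top-level directories that contain them. Files in the repository root
# (paths without a slash) are skipped.
changed_dirs() {
  awk -F/ 'NF > 1 { print $1 }' | sort -u
}
```

With the base and target branches from the commands above, git diff --name-only base target | changed_dirs prints one folder per line, which a build script could loop over to decide what to build and redeploy.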
I'm the CTO at fourTheorem. Follow me on twitter: @eoins
Top comments (1)
Great post. However the preferred approach suggested in the article has some shortcomings — mainly, the diff will include files that changed in master, relative to your branch. Not the end of the world, but there may be a better way.
I found a way to deepen the shallow clone depth until the merge-base commit is found. Posted in a comment here: github.com/hasura/smooth-checkout-... This allows you to have a shallow clone but still have it go as deep as you need to be able to git diff between current branch and base branch.
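A minimal sketch of that deepening idea, which is not the code from the linked comment: it assumes a shallow clone with the head branch checked out, a remote named origin, and a Git version that supports git fetch --deepen. The function name, its parameters and the default step size are hypothetical.

```shell
# Hypothetical helper: deepen a shallow clone until HEAD and the base branch
# share a merge-base.
# Arguments: base branch, head branch, commits per fetch, maximum fetch rounds.
deepen_until_merge_base() {
  base_branch="$1"
  head_branch="$2"
  step="${3:-50}"          # arbitrary default
  max_rounds="${4:-40}"    # safety limit so the loop always terminates
  base_ref="refs/remotes/origin/$base_branch"
  round=0

  # Fetch only the tip of the base branch if it is not available locally yet.
  git rev-parse --verify --quiet "$base_ref" >/dev/null ||
    git fetch --quiet --depth 1 origin "+refs/heads/$base_branch:$base_ref" ||
    return 1

  # Extend both histories a little at a time until they meet.
  until git merge-base "$base_ref" HEAD >/dev/null 2>&1; do
    round=$((round + 1))
    [ "$round" -le "$max_rounds" ] || return 1
    git fetch --quiet --deepen="$step" origin \
      "+refs/heads/$base_branch:$base_ref" "refs/heads/$head_branch" || return 1
  done
}
```

After deepen_until_merge_base master my-feature succeeds, git diff --name-only origin/master...HEAD lists only the files changed on the branch since it diverged from master.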