There are some cases where you want to find out information about changes in your Git repository without having to clone the full repository. This usually happens in an automated build environment. When I used Jenkins, Travis CI or CircleCI, I had access to the cloned Git repository and could use git ls-remote and git diff without any problem.
Other tools, and I am talking specifically about AWS CodePipeline, take a different approach. Instead of giving you access to a cloned repo, AWS CodePipeline gives you a snapshot of your code without the .git folder. This makes it impossible to check what has changed since a previous build, or even to determine what changed in the commit that triggered your build. Some CI environments give you a "shallow clone" without the full Git history, which leaves you with a similar challenge.
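Before picking a strategy, it can help to check what kind of source tree a build step actually received. The following is a minimal sketch, assuming Git 2.15 or later (for rev-parse --is-shallow-repository); the function name repo_state is hypothetical, not part of any CI tool.

```shell
# Hypothetical helper: report what kind of Git checkout a directory holds.
# Prints "none" (no .git, e.g. a plain source snapshot), "shallow" or "full".
# Assumes Git >= 2.15 for --is-shallow-repository.
repo_state() {
  local dir=${1:-.}
  if ! git -C "$dir" rev-parse --git-dir >/dev/null 2>&1; then
    echo none
  elif [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = true ]; then
    echo shallow
  else
    echo full
  fi
}
```

Running repo_state in the build directory tells you whether git diff against an older commit can work at all, or whether the history has to be fetched first.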
I wanted to run these kinds of checks to determine which microservices in our monorepo had changed, so that I knew which ones to build and redeploy. This technique is described well in this Shippable blog post.
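Once you have a list of changed file paths, mapping them to services can be as simple as keeping the leading directories. This sketch assumes a hypothetical layout where every service lives under services/<name>/; the function name changed_services is also hypothetical, and the awk filter would need adjusting for a different monorepo structure.

```shell
# Hypothetical helper: read one changed path per line on stdin and print
# each affected service directory once.
# Assumes services live at services/<name>/...; files directly under
# services/ (e.g. services/README.md) are ignored.
changed_services() {
  awk -F/ '$1 == "services" && NF > 2 { print $1 "/" $2 }' | sort -u
}
```

For example, piping the output of git diff --name-only into changed_services yields the list of services to rebuild and redeploy.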
I looked at two options for finding the folders that had changed since the last successful deployment:
- Clone the full repository manually in a CodeBuild step
- Use the GitHub API to retrieve information about the commits
The first option was one I wanted to avoid. It meant cloning a potentially large and growing repository at the start of the build. A shallow clone would not be sufficient as it would not capture the history of changes back to the previous release.
The GitHub REST API includes a compare API and a list-commits API. The compare API is limited to 250 commits, so it couldn't be relied on. The list-commits API could work, but it would mean making multiple paged requests for a large amount of data just to get the changed paths. After a bit of trial and error, I abandoned the GitHub API approach.
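For reference, a call to the compare API looks roughly like this. It is a sketch of the approach I rejected, assuming curl and jq are installed and that unauthenticated (rate-limited) requests are acceptable; the function names compare_url and compare_files are hypothetical. The endpoint takes the form /repos/{owner}/{repo}/compare/{base}...{head}, and the changed paths are in the files array of the JSON response, subject to the limits described above.

```shell
# Hypothetical helper: build the GitHub compare API URL for owner/repo, base and head.
compare_url() {
  printf 'https://api.github.com/repos/%s/compare/%s...%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: print the changed filenames reported by the compare API.
# Assumes curl and jq; (.files // []) avoids a jq error if no files array is returned.
compare_files() {
  curl -fsSL -H 'Accept: application/vnd.github+json' "$(compare_url "$1" "$2" "$3")" |
    jq -r '(.files // [])[].filename'
}
```

Usage: compare_files lodash/lodash 4.0.0 master.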
After some further digging, I came across a StackOverflow post that gave me a third option: fetch the two individual commits using the git command and compare them to determine the changed filenames. In this example, I'm using the public lodash/lodash repository. To compare the changes between the tag 4.0.0 and the HEAD of the master branch, the sequence of commands looks like this:
```shell
git init .                                             # Create an empty repository
git remote add origin git@github.com:lodash/lodash.git # Specify the remote repository
git checkout -b base                                   # Create a branch for our base state
git fetch origin --depth 1 4.0.0                       # Fetch the single commit for the base of our comparison
git reset --hard FETCH_HEAD                            # Point the local base branch to the commit we just fetched
git checkout -b target                                 # Create a branch for our target state
git fetch origin --depth 1 master                      # Fetch the single commit for the target of our comparison
git reset --hard FETCH_HEAD                            # Point the local target branch to the commit we just fetched
git diff --name-only base target                       # Print a list of all files changed between the two commits
```
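The same steps can be wrapped in a reusable function for a build script. This is a sketch, assuming Bash (or another shell that supports local) and a remote that allows shallow fetches of a named ref; the function name changed_files is hypothetical. As a variation on the commands above, it records each fetched commit with git update-ref instead of checkout and reset, so no files are written to a working tree, and it deletes its temporary directory afterwards.

```shell
# Hypothetical helper: print the files changed between two refs of a remote
# repository, fetching only one commit for each ref.
# Usage: changed_files <remote-url> <base-ref> <target-ref>
changed_files() {
  local remote_url=$1 base_ref=$2 target_ref=$3 workdir status
  workdir=$(mktemp -d) || return 1
  (
    cd "$workdir" &&
    git init -q . &&
    git remote add origin "$remote_url" &&
    # Fetch only the base commit and keep it on a local branch.
    git fetch -q --depth 1 origin "$base_ref" &&
    # ^{commit} peels annotated tags down to the commit they point at.
    git update-ref refs/heads/base "$(git rev-parse 'FETCH_HEAD^{commit}')" &&
    # Fetch only the target commit and keep it on a second branch.
    git fetch -q --depth 1 origin "$target_ref" &&
    git update-ref refs/heads/target "$(git rev-parse 'FETCH_HEAD^{commit}')" &&
    # Both trees are now local, so the diff needs no further history.
    git diff --name-only base target --
  )
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

For the lodash example, changed_files git@github.com:lodash/lodash.git 4.0.0 master should print the same list as the manual sequence.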
The directory size with this minimal fetching approach is 4.6M compared to 49M for the full