Skip CI stages in Travis based on what files changed.

#ci #git #travisci

At my new job, I've ended up partially responsible for the CI pipeline for our flagship product. One of the big complaints from both our internal devs and our FOSS contributors is how long CI builds (and thus the CI checks on pull requests on GitHub) take to complete.

Digging into this, I pretty quickly realized that the main culprit was our unit testing. It includes stress tests for one of the components that take almost 15 minutes to run, even when the build is for a PR or commit that doesn't touch that code.

So, I set about looking into how to skip those tests when the changes being tested had nothing to do with that component.

A First Look

We currently use Travis CI to do our CI/CD builds. Travis actually provides you a rather significant amount of control over what jobs run as part of a build based on what you're doing. You can match on quite a few things, including stuff like whether the build is for a PR or not, what distribution the build is being done on, and even whether the repo the build is for is a fork or not.

However, there's no way from the conditionals to check anything about the contents of the branch being built (you can match on branch name or commit message, though those aren't that useful for our purposes). Given this, we need to do the checks inside the scripts for the unit testing using data from the build environment.

A Matter of Depth

Before going further into figuring out how to do this, there's a rather important detail about Travis that needs to be taken into account. By default, when Travis clones a Git repository, it does a shallow clone with only the most recent commit. This provides a significant performance improvement to the cloning process when dealing with large repos, but it means that you can't find any information about anything except the most recent commit, which is an issue when trying to determine what files have changed.

Thankfully, disabling this and getting a deep clone of the source repo with the full history is rather easy. You just need to add the following to any job that needs to inspect the history:

git:
  depth: false

You could also specify a number between 1 and 50 (Travis imposes this 50 commit limit, not git) instead of false to get a clone with that number of commits. For most use cases, just specifying 50 there should be more than enough. However, the performance difference between fetching the 50 most recent commits and the whole repo is rather minimal unless you have an absolutely huge history on the branch you're checking against, so it's safer to just disable the shallow cloning altogether.
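If you would rather leave the clone depth alone in the Travis config and deepen the clone from inside the script instead, something like the following might work. This is only a sketch: the helper name `ensure_full_history` is hypothetical, and it assumes git 2.15 or newer (for `git rev-parse --is-shallow-repository`) and a remote named `origin`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the full history only if the clone is shallow.
# Assumes git >= 2.15 and a remote named 'origin'.
ensure_full_history() {
    if [ "$(git rev-parse --is-shallow-repository 2>/dev/null)" = "true" ] ; then
        # Convert the shallow clone into a complete one.
        git fetch --unshallow origin
    fi
}
```

Calling `ensure_full_history` once at the top of the test script, before any `git diff`, should be enough. On a clone that is already complete it does nothing.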

The Build Environment

Back to the task at hand, Travis provides a huge amount of info in the build environment in the form of environment variables (see the full list here). This includes a lot of the same stuff that you can match in conditionals, but also includes quite a bit more than that, including such things as what compiler is being used, what CPU architecture is being used, and even the URL for the build logs.

Rather helpfully, there is an environment variable provided by Travis that tells you what range of commits you're testing with this build, called TRAVIS_COMMIT_RANGE. This provides the commit range being tested as two shortened commit hashes separated by three dots (...). Based on this, if you're using git for version control, the following snippet of bash code will give you a list of all the files that changed for this build, one per line, in the variable CHANGED_FILES:

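# Travis separates the hashes with `...`. Convert that to `..` so `git diff`
# compares the two endpoints directly rather than diffing from their merge base.
# Note: `--output-delimiter` is a GNU `cut` extension.
COMMIT_RANGE="$(echo "${TRAVIS_COMMIT_RANGE}" | cut -d '.' -f 1,4 --output-delimiter '..')"
CHANGED_FILES="$(git diff --name-only "${COMMIT_RANGE}" --)"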

Once you have that, you can easily use any of a number of tools to check the list of changed files against a set of matching conditions to decide on what to run.
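As one possible way to do that check, the sketch below uses a plain bash loop over the list. The `stress/` directory prefix and the function names `touches_path` and `should_run_stress_tests` are hypothetical, and treating an empty list as "unknown, so run everything" is an assumption made here rather than anything Travis requires.

```shell
#!/usr/bin/env bash
# Succeeds if any path in a newline-separated list starts with a given prefix.
# $1: list of changed files, one per line. $2: directory prefix to look for.
touches_path() {
    local changed="$1" prefix="$2" file
    while IFS= read -r file ; do
        case "${file}" in
            # The quoted prefix is matched literally; only the trailing * is a glob.
            "${prefix}"*) return 0 ;;
        esac
    done <<< "${changed}"
    return 1
}

# Hypothetical policy: 'stress/' stands in for the real component directory.
should_run_stress_tests() {
    # An empty list is treated as "unknown", so the tests run to be safe.
    [ -z "$1" ] || touches_path "$1" "stress/"
}
```

In the test script this would be used as `if should_run_stress_tests "${CHANGED_FILES}" ; then ... fi`, with the slow tests inside the `if`.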

A Few Small Issues

The above code works great most of the time. However, there are three caveats to using TRAVIS_COMMIT_RANGE that can cause the above code to fail:

If the build is for a newly pushed branch, TRAVIS_COMMIT_RANGE will be empty.
If the build is for a branch that just had its history rewritten (for example, due to a rebase and force push), the starting commit may not actually be accessible in the repo anymore.
TRAVIS_COMMIT_RANGE only lists the range of commits for the most recent push. This means that if a user incrementally updates a PR by pushing to the branch it's based on, the above won't give you the full list of files for the PR, just the most recent push.

The third is actually the easiest to solve, so we'll cover that first.

Looking at the whole PR

In the case of a PR build, the environment variable TRAVIS_PULL_REQUEST will be set equal to the PR number on GitHub. For other builds, it's set to the exact string false. Thus, checking if a build is a PR or not is as easy as checking if TRAVIS_PULL_REQUEST is not equal to the string false.

Once we know we're dealing with a PR build, there are two additional facts we can rely on to determine what commits are part of the PR:

The currently checked out revision (HEAD in git terms) will be the most recent commit in the PR.
The name of the branch the PR is against (that is, the one it will be merged into) can be found in the TRAVIS_BRANCH environment variable.

Based on this, we can update the above code like so to inspect the entirety of a PR when dealing with a PR build:

CHANGED_FILES=

if [ "${TRAVIS_PULL_REQUEST}" = "false" ] ; then
    # This isn't a PR build.
    COMMIT_RANGE="$(echo ${TRAVIS_COMMIT_RANGE} | cut -d '.' -f 1,4 --output-delimiter '..')"
    CHANGED_FILES="$(git diff --name-only ${COMMIT_RANGE} --)"
else
    # This is a PR build.
    CHANGED_FILES="$(git diff --name-only ${TRAVIS_BRANCH}..HEAD --)"
fi

The Challenging Cases

The other two issues are a bit more complicated. Both of them also functionally require looking at the 'whole' branch instead of the most recent commits, but unlike a PR we don't have a well defined base branch to check against, and on top of that we don't know exactly where the branch diverged.

However, detecting these cases is actually pretty easy.

As mentioned above, if the build is for a new branch (one Travis has never seen before), TRAVIS_COMMIT_RANGE will be empty. That's trivial to check with a simple [ -z "${TRAVIS_COMMIT_RANGE}" ].

The case of a history rewrite is a bit trickier, because we need to figure out if git recognizes both the starting and ending commit as actual commits. Luckily, this can be done with a single git command: git cat-file -t $foo, where $foo is the commit hash we want to check. If the hash represents an actual commit, that command will print the exact string 'commit', which we can also easily check for.
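That check can be wrapped in a small function to keep the surrounding conditionals readable. This is a minimal sketch, and the name `is_commit` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 names a commit object in this repo.
is_commit() {
    # `git cat-file -t` prints the object type, or fails (with nothing on
    # stdout) if the object doesn't exist in the local repository.
    [ "$(git cat-file -t "$1" 2>/dev/null)" = "commit" ]
}
```

A range would then be usable only if both `is_commit` checks pass, one for each end of the range.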

Putting that all together, we get:

CHANGED_FILES=

if [ -z "${TRAVIS_COMMIT_RANGE}" ] ; then
    # This is a new branch.
    : # No-op placeholder: bash rejects a branch that contains only a comment.
else
    # This isn't a new branch.
    if [ "${TRAVIS_PULL_REQUEST}" = "false" ] ; then
        # This isn't a PR build.

        # We need the individual commits to detect force pushes.
        COMMIT1="$(echo "${TRAVIS_COMMIT_RANGE}" | cut -f 1 -d '.')"
        COMMIT2="$(echo "${TRAVIS_COMMIT_RANGE}" | cut -f 4 -d '.')"

        if [ "$(git cat-file -t "${COMMIT1}" 2>/dev/null)" = commit ] && [ "$(git cat-file -t "${COMMIT2}" 2>/dev/null)" = commit ] ; then
            # This is a 'normal' build: both ends of the range still exist.
            CHANGED_FILES="$(git diff --name-only "${COMMIT1}..${COMMIT2}" --)"
        else
            # This was a history rewrite: at least one commit is gone.
            : # No-op placeholder, as above.
        fi
    else
        # This is a PR build.
        CHANGED_FILES="$(git diff --name-only ${TRAVIS_BRANCH}..HEAD --)"
    fi
fi

Once you've got the detection done, there are three approaches you can take to handle the new branch and history rewrite cases:

Just assume that everything you might do in this job needs to be done and move on with things. This is the simplest and most reliable option overall. You can find an example of a full script doing this here.
Assume a specific 'base' branch to compare against and use git merge-base to find the 'starting' commit to work from. This has some reliability issues because git merge-base can be unpredictable under certain circumstances, and won't account reliably for changes that happened in the base branch relative to your current branch.
Assume a specific 'base' branch to compare against, and then use the same logic as was used for PR builds. This is more reliable and predictable than using git merge-base, but it can still miss things on occasion.
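For reference, the git merge-base approach could look roughly like the sketch below. The function name `changed_since_base` is hypothetical, and it assumes the base branch you pass in actually exists in the local clone, which may require fetching it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files changed on HEAD since it diverged from a
# base branch. $1: name of the assumed base branch (must exist locally).
changed_since_base() {
    local base_branch="$1" fork_point
    # Find the most recent common ancestor of the base branch and HEAD.
    fork_point="$(git merge-base "${base_branch}" HEAD)" || return 1
    # Diff from that ancestor so changes made only on the base branch are ignored.
    git diff --name-only "${fork_point}" HEAD --
}
```

The result can be assigned to `CHANGED_FILES` in the new branch and history rewrite cases. If `git merge-base` fails, the function returns non-zero, and falling back to running everything is probably the safest response.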

Which approach you take really depends on what you're trying to do conditionally. If it's just something like unit tests, then the first approach (just assume you need to do everything) is generally fine. However, if you're using this for conditional deployment, you're probably going to want one of the other two approaches so that you don't redeploy things you don't need to.

Top comments (2)

Marc • Jul 24 '20

TRAVIS_COMMIT_RANGE only lists the range of commits for the most recent push. This means that if a user incrementally updates a PR by pushing to the branch it's based on, the above won't give you the full list of files for the PR, just the most recent push. The third is actually the easiest to solve, so we'll cover that first.

As of July 2020 that is not true. I just tested this and for a Pull Request ${TRAVIS_COMMIT_RANGE/.../..} is strictly identical to ${TRAVIS_BRANCH}..HEAD no matter what you (force) push to your submitted branch and how. So it may be possible to simplify that particular if PR/then/else.

Also, you want to keep the triple dots ... in TRAVIS_COMMIT_RANGE for git diff, they're very important and very different from double dots. Replace them with double dots .. only for the git log command which is inconsistent with git diff, see explanation at github.com/travis-ci/travis-ci/iss...

PS: Travis documentation seems to fall very short here.