Build performance optimization (2 Part Series)
Abstract: In this post, I experimented with 6 different approaches for caching Docker builds in GitHub Actions to speed up the build process and compared the results. After trying out every approach, 10 times each, the results show that using GitHub Packages’ Docker registry as a build cache, as opposed to GitHub Actions’ built-in cache, yields the highest performance gain.
Sick of waiting for builds on GitHub Actions. 😖
Unlike self-hosted runners like Jenkins, most cloud-hosted build runners are stateless, providing us with a pristine environment each run. We cannot keep files from the previous runs around; anything that needs to be persisted must be externalized.
GitHub Actions has a built-in cache to help do this. But there are many ways of creating that cache (
docker save and
docker load first comes to mind). Will the performance gains outweight the overhead caused by saving and loading caches? Are there more approaches other than using GitHub Action’s built-in cache? That’s what this research is about.
To conduct this experiment, I first came up with some ideas on how we might cache the built Docker image. I experimented with 8 total approaches.
- No cache 🤷♂️
This is the baseline.
- No cache (BuildKit) 🤷♀️
This is another baseline, but with BuildKit.
Once an image is created, we can use
docker save to export the image to a tarball and cache it with
actions/cache. On subsequent runs, we can use
docker load to import the image from the cached tarball. Then we can build the image with the
Note that this method is not compatible with BuildKit as it only supports external caches located on the registry (not pulled/imported images).
- Cache /var/lib/docker with
Maybe we don’t need to do all that import-export work and can just cache the whole
- Use a local registry with
We can run a filesystem-backed Docker registry on localhost, push the built image to that registry, and cache it with
actions/cache. On subsequent runs, we can pull the latest image from the localhost registry and use the pulled image with
- Use a local registry with
Same as above, but with BuildKit.
There is a minor difference in the setup: With BuildKit we do not have to run
docker pull. BuildKit already expects the cached image to be located on a registry. While this may seem like a limitation, but it also comes with a performance gain: if a cache turns out to be not applicable, BuildKit will skip pulling the image.
- Use GitHub Packages’ Docker registry as a cache 🐙
GitHub provides a package registry already, so we might not need to run our own filesystem-backed registry at all!
- Use GitHub Packages’ Docker registry as a cache (BuildKit) 🦑
Same as above, but with BuildKit.
Approach 🤯, Cache /var/lib/docker with
actions/cache, failed because of a lot of permission errors caused when generating the cache. Even with a
sudo chmoda proper cache could be created.
Approach 🦑, Use GitHub Packages as a cache (BuildKit), failed because as of the time of writing this, GitHub Package Registry does not support requesting a cache manifest. Subsequent builds will cause this message when trying to re-use a cached image:
ERROR: httpReaderSeeker: failed open: could not fetch content descriptor
So, we are left with 4 approaches (📦, 🐳, 🐋, 🐙), excluding 2 baselines (🤷♂️ and 🤷♀️).
When optimizing builds, although we can reduce the time to run
docker build 🤩, but unfortunately some overhead will be introduced 😞.
Overhead time includes:
- restoring a saved cache (approaches 📦, 🐳, 🐋);
- importing image from the tarball loaded from the cache (approach 📦);
- running a local Docker registry (approaches 🐳, 🐋);
- pulling from a Docker registry (approaches 🐳, 🐋, 🐙);
- exporting the built image to a tarball (approach 📦);
- pushing an image to a registry (approaches 🐳, 🐋, 🐙); and
- saving a cache (approaches 📦, 🐳, 🐋).
From my experience,
- The build time grows mostly with project complexity, i.e. the CPU time to build.
- The overhead time grows mostly with the size of things being cached, i.e. the IO time to serialize and transmit stuff.
For this experiment, we want to have an image that’s large enough, such that both the build time and overhead become apparent in our measurements.
This resulted in this Dockerfile which generates a somewhat bloated image 🤪:
FROM node:12 RUN yarn create react-app my-react-app RUN cd my-react-app && yarn build RUN npm install -g @vue/cli && (yes | vue create my-vue-app --default) RUN cd my-vue-app && yarn build RUN mkdir -p my-tests && cd my-tests && yarn add playwright
I created a GitHub Actions workflow which runs all the approaches. I ran them 20 times.
- 10 times, each time changing the Dockerfile, invalidating the cache (
actions/cache). This is the cache MISS scenario.
- 10 times, each time keeping the Dockerfile the same. This is the cache HIT scenario.
After that, the results for each approach and scenario will be averaged together.
There are few exceptions:
- The baselines don't have a cache. So all 20 builds count as cache MISS.
- Approach 🐙 uses an external registry, and hence images are cached and invalidated on a layer-by-layer basis. Due to the way I set up this experiment, all 20 builds ended up being a cache HIT. I am too lazy to perform 10 more builds, so I will assume the cache MISS situation is same as the no cache + the time it takes to push the image the first time.
After all the builds are run, I used the GitHub Actions API to retrieve the timing for each step. I threw the results into Google Sheets, ran some PivotTable, and summarized the results.
The green bar represents the time it takes in cache HIT scenario. The red bar represents the extra time it costs for the cache MISS scenario.
You can see that just using BuildKit without any cache gives some performance gain. One whole minute saved.
The cache size is 846 MB. It turns out that
docker save is very slow. So, uploading that tarball into the cache also takes a long time.
The cache size is 855 MB. It turns out that doing a
docker push to a filesystem-backed registry, running in localhost, inside Docker, turned out to be faster than
docker save. The same is true for
docker pull vs
The cache size is 855 MB. When you run
docker build, BuildKit pulls the image from the local registry, so there is no separate
docker pull step, compared to the approach 🐳.
Although we cannot take advantage of BuildKit, using an external registry means that builds can be cached on a layer-by-layer basis. Out of all approaches I’ve tried, this approach gives the best result. 🏆
However, you also get a “build-cache” package listed in your repository’s published packages. 😂
To test out the potential gain, I tried running
docker-compose build on DEV Community’s repository. Without any caching, building the web image took 9 minutes and 5 seconds. Using GitHub Package Registry as a cache, the time to build the image has reduced to 37 seconds.
You can find the code implementing each of these approaches here:
When it comes to build performance improvement, there are many articles, each recommending different approaches using different examples. It’s hard to see how each approach fares against one another.
While working on this, I also noticed trying other different approaches together gives way to more novel approaches being discovered. For example, had I not tried out Approach 🐳 (local Docker registry), I wouldn’t have thought “Oh! GitHub Package Registry is a thing! Can I use that and skip Action’s cache entirely? Let’s try!” and came up with Approach 🐙.
So, I’d say this is the kind of article I wish existed when I wanted to set up a CI workflow with Docker. I wish we have more showdowns for more approaches and more tools.
Thanks for reading!