Working on a GitOps framework around Kubernetes, I naturally run everything in containers. The two Dockerfiles that matter most to me both have to download a lot of dependencies that are not included in the repository at build time, which makes the layer cache crucial. Unfortunately, ephemeral CI/CD runners like GitHub Actions start each run with an empty cache.
The first of the two Dockerfiles builds the image for the framework itself. This image is used for bootstrapping, automation runs and disaster recovery. As such, it’s not your run-of-the-mill Dockerfile. It includes a number of dependencies installed from Debian packages, various Go binaries and, last but not least, the Python-based CLIs of AWS, Azure and Google Cloud. It makes heavy use of multi-stage builds and has separate build stages for common dependencies and for each cloud provider’s specific dependencies. The layers of the final image also mirror the build stage logic.
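As a heavily simplified, hypothetical sketch of that stage layout (base image, stage names and the virtualenv pattern are my own choices here, and the Google Cloud and Go tooling stages are omitted), the structure could look something like this:

# hypothetical sketch only, not the actual Kubestack Dockerfile
FROM python:3.11-slim AS common
# Debian packages shared by all later stages
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl git unzip \
    && rm -rf /var/lib/apt/lists/*

FROM common AS aws
# AWS CLI in its own virtualenv, so the final stage can copy just this directory
RUN python -m venv /opt/aws && /opt/aws/bin/pip install --no-cache-dir awscli

FROM common AS azure
# Azure CLI, same pattern; the Google Cloud stage would follow the same idea
RUN python -m venv /opt/azure && /opt/azure/bin/pip install --no-cache-dir azure-cli

FROM common AS final
# one COPY per provider stage, so the final image's layers mirror the build stages
COPY --from=aws /opt/aws /opt/aws
COPY --from=azure /opt/azure /opt/azure
ENV PATH="/opt/aws/bin:/opt/azure/bin:${PATH}"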
Dockerfile number two is for the Kubestack website itself. The site is built using Gatsby and has to download a lot of node modules during the build. The Dockerfile is optimized for cacheability and uses multi-stage builds to have a build environment based on Node.js and a final image based on Nginx that serves the static build.
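Roughly sketched (again hypothetical; the Node and Nginx versions and the build script are placeholders), the two stages look like this:

# hypothetical sketch of the two-stage layout
FROM node:18 AS dev
WORKDIR /app
# copy the manifests first, so the dependency layer is only invalidated
# when dependencies change, not on every source change
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# assumes the package.json build script runs gatsby build
RUN npm run build

FROM nginx:alpine
# Gatsby writes the static site to public/, which Nginx serves from its default root
COPY --from=dev /app/public /usr/share/nginx/html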
Build times for both the framework image and the website image benefit heavily from having a layer cache.
Docker has had the ability to use an image as the build cache via the --cache-from parameter for some time. This was my preferred option because I need to build and push images anyway. Storing the cache alongside the image is a no-brainer in my opinion.
For the website image, the first step of my CI/CD pipeline is to pull the cache image. Note the || true at the end, which ensures a missing cache doesn’t prevent the build from running.
docker pull gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache || true
Step two runs a build targeting the dev stage of my multi-stage Dockerfile and tags the result as the new build-cache.
docker build \
--cache-from gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
--target dev \
-t gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
.
The next step runs the actual build that produces the final image and tags it as well.
docker build \
--cache-from gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
-t gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA \
.
Finally, the pipeline pushes both images.
docker push gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache
docker push gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA
For a simple multi-stage build with only two stages, like my Gatsby website’s Dockerfile, this works pretty well.
But when I tried this for a project with multiple build stages, one for Python and one for JS, specifying two images under --cache-from never seemed to work reliably. That is doubly unfortunate, because a layer cache here would save the time spent downloading Python and JS dependencies on every run.
Having cache pull and cache build steps for every stage also makes the pipeline file increasingly verbose the more stages you have.
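To illustrate, a per-stage version of the steps above for a hypothetical Dockerfile with python-deps and js-deps stages would look roughly like this:

# hypothetical per-stage caching; IMAGE, stage and tag names are placeholders
docker pull $IMAGE:cache-python-deps || true
docker pull $IMAGE:cache-js-deps || true

docker build --target python-deps \
  --cache-from $IMAGE:cache-python-deps \
  -t $IMAGE:cache-python-deps .
docker build --target js-deps \
  --cache-from $IMAGE:cache-js-deps \
  -t $IMAGE:cache-js-deps .

docker build \
  --cache-from $IMAGE:cache-python-deps \
  --cache-from $IMAGE:cache-js-deps \
  -t $IMAGE:$COMMIT_SHA .

docker push $IMAGE:cache-python-deps
docker push $IMAGE:cache-js-deps
docker push $IMAGE:$COMMIT_SHA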
So for the framework Dockerfile, I need something better.
Enter BuildKit. BuildKit brings a number of improvements to container image building. The ones that won me over are:
- Running build stages concurrently.
- Increasing cache-efficiency.
- Handling secrets during builds.
Apart from generally increasing cache efficiency, BuildKit also allows more control over caches when building with buildctl. This is what I needed. BuildKit has three options for exporting the cache, called inline, registry and local. Local is not particularly interesting in my case, but would allow writing the cache to a directory. Inline includes the cache in the final image and pushes cache and image layers to the registry together. But this only covers the cache for the final stage of multi-stage builds. Finally, the registry option allows pushing the cached layers of all stages to a separate image. This is what I needed for my framework Dockerfile.
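For comparison, this is how I understand the inline variant looks with buildctl (placeholder image name; the registry variant is what my pipeline below uses):

# inline: the cache is embedded into the image that gets pushed,
# so it only covers the final stage
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=docker.io/example/image:latest,push=true \
  --export-cache type=inline \
  --import-cache type=registry,ref=docker.io/example/image:latest
# the local type would use --export-cache type=local,dest=<dir> and
# --import-cache type=local,src=<dir> instead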
Let’s take a look at how I’m using this in my pipeline. Having the cache export and import built into BuildKit means I can collapse the separate pull, build and push steps into one. And it stays one step, no matter how many stages my Dockerfile has.
docker run \
--rm \
--privileged \
-v `pwd`/oci:/tmp/work \
-v $HOME/.docker:/root/.docker \
--entrypoint buildctl-daemonless.sh \
moby/buildkit:master \
build \
--frontend dockerfile.v0 \
--local context=/tmp/work \
--local dockerfile=/tmp/work \
--output type=image,name=kubestack/framework-dev:test-${{ github.sha }},push=true \
--export-cache type=registry,ref=kubestack/framework-dev:buildcache,push=true \
--import-cache type=registry,ref=kubestack/framework-dev:buildcache
This one command handles pulling and importing the cache, building the image, exporting the cache and pushing both the image and the cache. By running the build inside a container, I also don’t have to worry about installing the BuildKit daemon and CLI. The only thing I needed to do was provide the .docker/config to the build inside the container, so it can push the image and the cache to the registry.
For a working example, take a look at the Kubestack release automation pipeline on GitHub.
Using the cache, the framework image builds in less than one minute, down from about three minutes without the cache export and import.
Top comments (5)
Great article, really helped.
Just wanted to point out that the mode=max parameter is necessary on the --export-cache line in order for the intermediate stages to be cached correctly:
--export-cache mode=max,type=registry,ref=kubestack/framework-dev:buildcache,push=true \
Philipp,
My build runs for about ten minutes and then I get the following error during the export.
I'm using the following for the build:
Any ideas on what is wrong? I should mention that the registry I'm using is AWS ECR. "############" means it was redacted.
It's looking like I'm running into AWS ECR not supporting cache manifest lists.
Switching to using Github's registry for the cache manifest and then pushing the final build images to AWS ECR worked.
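In case it helps anyone, the split boils down to pointing the output and the cache at different registries. Something like this (placeholder names, rest of the buildctl command unchanged from the article):

--output type=image,name=<aws_account_id>.dkr.ecr.<region>.amazonaws.com/framework-dev:$COMMIT_SHA,push=true \
--export-cache type=registry,ref=ghcr.io/<owner>/framework-dev:buildcache,push=true \
--import-cache type=registry,ref=ghcr.io/<owner>/framework-dev:buildcache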
How do you combine this with RUN --mount=type=cache?