Kyle Galbraith for Depot

Posted on Nov 22, 2022 • Originally published at depot.dev

Faster Docker image builds in Cloud Build with layer caching

#docker #gcp #devops

The key to building Docker images quickly across CI providers like Google Cloud Build is to make use of the previous build's layer cache. There is plenty of theory and best practice for writing a Dockerfile that gets as many cache hits as possible during a build. But in a CI environment, the layer cache has to be available to the build for that work to pay off.

In this post, we are going to focus on how to build a Docker image as quickly as possible in Cloud Build by leveraging layer caching. We will benchmark build performance with caching using the docker builder, the kaniko executor, and our own Depot service.

Building Docker images in Cloud Build

Getting an image built inside of Cloud Build can be done with a single step inside a cloudbuild.yml file. Here is an example where we are building a Node application that has the following Dockerfile:

FROM node:16 AS build

WORKDIR /app
COPY package.json yarn.lock tsconfig.json ./
COPY src/ ./src/
RUN yarn install --immutable
RUN yarn build

FROM node:16
WORKDIR /app
COPY --from=build /app/node_modules /app/node_modules
COPY --from=build /app/dist /app/dist
ENV NODE_ENV production
CMD ["node", "--enable-source-maps", "./dist/index.js"]

To build this image we add a cloudbuild.yml file to the root of the repository with the following contents:

steps:
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
      - .

images:
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest

This configuration tells Cloud Build to build an image using the docker builder image and apply two tags, one for the commit SHA and one for latest. The images block then tells Cloud Build to push the resulting images to Artifact Registry. Running the build, we see that the total build takes 1 minute and 40 seconds, with the image build portion taking ~78 seconds.

If you run the build a second time, you will notice that the image build again takes approximately 78 seconds. Why? Because we aren't doing anything to make use of the previous build's cache. We can add that by updating our cloudbuild.yml file to the following:

steps:
  - name: gcr.io/cloud-builders/docker
    entrypoint: bash
    args:
      - -c
      - docker pull us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest || exit 0

  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
      - --cache-from
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
      - .

images:
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest

This is the easiest way to leverage the build cache of a previous build in Cloud Build. Here we are pulling down the latest tag for the image we are building in the first step. We pull that tag down so that we can make use of it in the --cache-from flag in the second step. This is what allows us to utilize the build cache from the previous build because we tag all new images with the latest tag.

So, what are the results now? The first build took about 78 seconds to build the image and the second build takes ~38 seconds. A nice improvement, but if you look closely, something doesn't look right.

Did you spot it? We got the docker build portion down to 38 seconds, but the entire build still took ~78 seconds. Why is that? Pulling the latest tag for --cache-from means transferring that image from the registry to the build before it can be used for caching. In this case, the pull took 25 seconds, which ate up most of the time the layer cache saved in the image build.
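One possible refinement, which was not part of the benchmark in this post: a --cache-from image can only contribute the layers it actually contains, and the final image of a multi-stage Dockerfile generally does not carry the layers of the intermediate build stage. A common workaround is to build and push that stage under its own tag and list it as a second cache source. The sketch below assumes the default (non-BuildKit) docker builder. The build-cache tag is a hypothetical name, and the extra pull adds to the same network cost described above, so it may or may not pay off for a given image.

```yaml
steps:
  # Pull both cache sources; "|| true" keeps the very first build from failing
  # when neither tag exists yet.
  - name: gcr.io/cloud-builders/docker
    entrypoint: bash
    args:
      - -c
      - |
        docker pull us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:build-cache || true
        docker pull us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest || true

  # Build only the "build" stage and tag it (hypothetical tag name) so that its
  # layers can be pushed and reused by later builds.
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - --target
      - build
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:build-cache
      - --cache-from
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:build-cache
      - .

  # Build the final image, using both the build stage and the previous final
  # image as cache sources.
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
      - --cache-from
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:build-cache
      - --cache-from
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
      - .

# Push the stage image alongside the final tags so the next build can pull it.
images:
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:latest
  - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:build-cache
```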

Building Docker images in Cloud Build with Kaniko

kaniko is a tool that allows you to build container images inside Kubernetes without the need for the Docker daemon. Effectively, it allows you to build Docker images without docker build.

We can actually change our cloudbuild.yml file to use kaniko instead of the docker builder image. With the Kaniko executor in Cloud Build, we can specify a --cache flag that allows us to store our Docker layer cache in Container Registry. Here is the updated cloudbuild.yml file:

steps:
  - name: gcr.io/kaniko-project/executor:latest
    args:
      - --destination=gcr.io/$PROJECT_ID/depot-demo/demo
      - --cache=true
      - --cache-ttl=24h

If we run a build with this configuration, we see the following results:

The entire build took 2 minutes and 30 seconds, and the image build portion took 2 minutes and 19 seconds of that. That's not ideal, but maybe build performance will be better for the next build because we can make use of the layer cache via Kaniko. Let's run the build again and see what happens:

On the second run, the image build is now ~69 seconds and the entire build is 79 seconds. An improvement over the previous run because we get to make use of caching, but we aren't seeing any improvement over our Docker builder approach. In fact, the total time is effectively the same and the image build is slower. To recap, here are the results we have seen so far:

| Builder | Total time | Image build time |
| --- | --- | --- |
| `docker` builder (no cache) | 100s | 78s |
| `docker` builder (with cache) | 78s | 38s |
| `kaniko` builder (no cache) | 150s | 139s |
| `kaniko` builder (with cache) | 79s | 69s |
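For reference, the kaniko step shown earlier relies on defaults for where the cache lives: as far as we know, when --cache-repo is not set, kaniko infers a cache repository from --destination. Here is a minimal sketch of the same step with the cache location spelled out, assuming an Artifact Registry repository like the one used in the docker examples. The demo-cache image name is hypothetical, and this variant was not benchmarked in this post.

```yaml
steps:
  - name: gcr.io/kaniko-project/executor:latest
    args:
      # Where the final image is pushed.
      - --destination=us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      # Enable layer caching and store cached layers in a dedicated image path
      # (hypothetical name) instead of one inferred from --destination.
      - --cache=true
      - --cache-repo=us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo-cache
      # Cached layers older than this are ignored and rebuilt.
      - --cache-ttl=24h
```

Because kaniko pushes the image itself, no images block is needed in this configuration.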

Faster Docker image builds in Cloud Build with Depot

We've observed that using the Docker layer cache across builds speeds up the image build significantly. But, as we saw, the current approach for doing that in Cloud Build gives much of that gain back to the network: the image build might take 38 seconds, but the entire build still takes 78 seconds, 25 of which are spent pulling down the latest tag to use for caching.

What if we could make use of the layer cache without the network penalty? That is where Depot comes in. Depot provides remote container builders on cloud VMs. They come with more resources, 4 CPUs and 8 GB memory, as well as a 50 GB persistent SSD cache. A large, fast, and persistent disk allows us to share layer cache across builds automatically, without spending any time transferring the cache for the build.

We can use Depot to build our image in Cloud Build by using the depot builder image. Here is the updated cloudbuild.yml file:

steps:
  - id: Build with Depot
    name: ghcr.io/depot/cli:latest
    args:
      - build
      - --project
      - <your-depot-project-id>
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      - .
      - --push
    env:
      - DEPOT_TOKEN=${_DEPOT_TOKEN}

This configuration uses the depot builder image to build the image. The --project flag routes your build to your Depot project and the remote builders that back it, using the DEPOT_TOKEN environment variable to authenticate the build to your project. Note, the token used here is a project token that can be created under your project settings.
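The substitution above works, but substitution values are recorded with the build. An alternative sketch, not used for the benchmarks in this post, reads the token from Secret Manager through Cloud Build's availableSecrets and secretEnv fields. It assumes a secret named depot-token (a hypothetical name) exists in the same project and that the Cloud Build service account is allowed to access it.

```yaml
steps:
  - id: Build with Depot
    name: ghcr.io/depot/cli:latest
    args:
      - build
      - --project
      - <your-depot-project-id>
      - -t
      - us-west1-docker.pkg.dev/$PROJECT_ID/depot-demo/demo:$COMMIT_SHA
      - --push
      - .
    # Expose the secret to this step as the DEPOT_TOKEN environment variable.
    secretEnv:
      - DEPOT_TOKEN

availableSecrets:
  secretManager:
    # "depot-token" is a hypothetical secret name; adjust it to your own.
    - versionName: projects/$PROJECT_ID/secrets/depot-token/versions/latest
      env: DEPOT_TOKEN
```

The rest of this post keeps using the substitution-based configuration shown earlier.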

If we run our first build with the Depot step configured as shown earlier, we see output like the following:

We can see that the first build is uncached and takes a total of 73 seconds to complete, with 64 seconds of that being the image build. Total build time is already lower than with any of the previous options. Let's run a second build that leverages the persistent SSD cache on the remote builders. We don't have to make any changes to leverage the layer cache; it's already on the remote builder.

The entire build took 28 seconds, and the image build portion was 15 seconds. Depot is over 2x faster at building Docker images inside of Google Cloud Build than any of the previous options we looked at.

| Builder | Total time | Image build time |
| --- | --- | --- |
| `docker` builder (no cache) | 100s | 78s |
| `docker` builder (with cache) | 78s | 38s |
| `kaniko` builder (no cache) | 150s | 139s |
| `kaniko` builder (with cache) | 79s | 69s |
| `depot` builder (no cache) | 73s | 64s |
| `depot` builder (with cache) | 28s | 15s |

Conclusion

In this post, we have seen several ways to build Docker images in Google Cloud Build. A fundamental component of making image builds fast is making use of the layer cache from previous builds.

As we saw, with docker we can use the layer cache by pulling down the previous image and using it as a cache source. This produced an image build that was twice as fast (78 seconds down to 38), but the total build time only dropped from 100 to 78 seconds because of the network penalty of pulling down the previous image.

With kaniko we got the ability to use the layer cache by persisting it to Container Registry. But the image build, for cached and uncached, wasn't any faster than using the docker builder image. Kaniko caching is slower as it snapshots the filesystem after each layer and there is still a network penalty being paid to transfer the layer cache from Container Registry.

With depot we get the ability to use the native Docker layer cache without the network penalty. The image build is over 2x faster than using the docker approach and almost 3x faster than using the kaniko approach. The layer cache is persisted to a fast SSD on the remote builders, allowing subsequent builds to be faster by using it automatically.

Are you interested in trying Depot for your own projects? We offer a 14-day trial! Get started for free 🎉

Top comments (3)

cloutierjo • Nov 24 '22

Using cache is great to improve build time, but then it won't use the last updated base image. Do you have any solution to ensure we get those updates when available/needed?

Kyle Galbraith Depot • Nov 29 '22 • Edited

Awesome question! It sounds like you're referring to things like FROM ubuntu:latest, where you want to make sure you get the latest base image. The latest image will still be pulled in a build if it has changed, and then the layer cache will go to work for subsequent steps. Here is an example Dockerfile.

FROM node:16 AS build

WORKDIR /app
COPY package.json yarn.lock tsconfig.json ./
COPY src/ ./src/
RUN yarn install --immutable
RUN yarn build

FROM node:16
WORKDIR /app
COPY --from=build /app/node_modules /app/node_modules
COPY --from=build /app/dist /app/dist
COPY gifs-to-upload/ dist/gifs-to-upload/
ENV NODE_ENV production
CMD ["node", "--enable-source-maps", "./dist/index.js"]

If I run that build with depot, I see something like this at the start of the build.

[+] Building 3.1s (15/15) FINISHED
 => [depot] launching arm64 builder                                                                                                                                 0.5s
 => [depot] connecting to arm64 builder                                                                                                                             0.4s
 => [internal] load .dockerignore                                                                                                                                   0.4s
 => => transferring context: 116B                                                                                                                                   0.3s
 => [internal] load build definition from Dockerfile                                                                                                                0.4s
 => => transferring dockerfile: 422B                                                                                                                                0.3s
 => [internal] load metadata for docker.io/library/node:16                                                                                                          0.3s
 => [stage-1 1/5] FROM docker.io/library/node:16@sha256:68fc9f749931453d5c8545521b021dd97267e0692471ce15bdec0814ed1f8fc3                                            0.0s
 => => resolve docker.io/library/node:16@sha256:68fc9f749931453d5c8545521b021dd97267e0692471ce15bdec0814ed1f8fc3                                                    0.0s
 => [internal] load build context                                                                                                                                   0.2s
 => => transferring context: 421B                                                                                                                                   0.2s
 => CACHED [stage-1 2/5] WORKDIR /app                                                                                                                               0.0s

The key bit is the load metadata for docker.io/library/node:16 step, which is followed by a FROM with the sha256 digest of that image tag. If we change the FROM from node:16 to node:16.18.1, you can see that the new base image gets resolved but the layer cache still takes effect.

[+] Building 3.0s (15/15) FINISHED
 => [depot] launching arm64 builder                                                                                                                                 0.4s
 => [depot] connecting to arm64 builder                                                                                                                             0.4s
 => [internal] load build definition from Dockerfile                                                                                                                0.3s
 => => transferring dockerfile: 432B                                                                                                                                0.3s
 => [internal] load .dockerignore                                                                                                                                   0.3s
 => => transferring context: 116B                                                                                                                                   0.3s
 => [internal] load metadata for docker.io/library/node:16.18.1                                                                                                     0.2s
 => [internal] load build context                                                                                                                                   0.2s
 => => transferring context: 421B                                                                                                                                   0.2s
 => [build 1/6] FROM docker.io/library/node:16.18.1@sha256:68fc9f749931453d5c8545521b021dd97267e0692471ce15bdec0814ed1f8fc3                                         0.0s
 => => resolve docker.io/library/node:16.18.1@sha256:68fc9f749931453d5c8545521b021dd97267e0692471ce15bdec0814ed1f8fc3                                               0.0s
 => CACHED [build 2/6] WORKDIR /app                                                                                                                                 0.0s

Essentially the Docker layer cache is formed by the ADD, RUN, and COPY statements. So the latest base images are always pulled if they have changed.

Kamal Mustafa • Dec 9 '22

This link seems dead - dev.to/blog/fast-dockerfiles-theor..., or there's typos?