One Dockerfile is all it takes, falling in love with bake

#docker #monorepo #programming #tutorial

This was originally posted on my newsletter. Please subscribe!

I love docker buildx bake. Also, I was really proud of the newsletter post title and the reference to the Dua Lipa song.

I like experimenting with a lot of different ideas, and for the last several years, I've been trying to figure out how to maximize my ability to try new ideas with as little work as possible. In this post, I want to go over what I've settled on these days, which is to use a monolithic git repository, and to leverage Docker's bake functionality. We'll talk about how I structure my repository, how I build the containers, etc. If you'd just like to look through the repo, check it out at abatilo/newsletter-bake-monorepo.

For the backend services, I write web services in Go. I define a single Go module at the root of my repository. Go workspaces have been a thing since Go 1.18, but their value is mostly in making it easy to work on changes that span modules in different repos.

For frontend applications, I use React apps generated with create-react-app. I'm not much of a frontend developer, and even though I know there are benefits to some of the newer frameworks, this one is comfortable to me, which lets me focus on building. With the React applications, I do leverage npm workspaces.

I serve the static assets directly from my Go application, which means that there's only one application container that I need to worry about. It also means that I can serve the frontend and backend on the same domain, so I get to avoid any kind of CORS configuration. I serve the applications behind CloudFront as my CDN, so there's still static asset caching, and the backend gets API acceleration, DDoS protection, and more thanks to the CDN.

I organize building the containers using a docker-bake file. Bake is a high-level build command available to people using docker buildx. In its simplest form, you write a file in HCL, the same configuration language that Terraform uses, and running bake builds all of the targets, shares common image stages when it can, and traverses the dependency tree in parallel as much as possible. You can easily define several docker containers and build all of them with a single command, much like having a Makefile.

With this high level overview, let's break it down.

Thanks for reading A slice of experiments! New posts every other week.

One Dockerfile is all it takes, falling in love with bake

Here’s a link to the whole Dockerfile. How’s this work? It’s a multi-stage build where we build the static assets in one stage, build the go binaries in another stage, and put everything together in the last stage.

The frontend stages:

FROM node:18.10.0-alpine as node-modules

WORKDIR /app/web
COPY ./web/package.json ./web/package-lock.json ./
COPY ./web/app1/package.json ./app1/
COPY ./web/app2/package.json ./app2/
RUN npm ci --no-audit

FROM node-modules as node-builder
COPY ./web ./

ARG app
RUN npm run build --workspace ${app}

We start with a stage that exists only to install dependencies. With npm, if you want deterministic builds, use npm ci in your build environments. With npm workspaces, you have to copy in the package.json for each workspace so that npm ci can install everything that’s needed. There are ways to install the dependencies for a single workspace, but since we’ll cache this entire stage, downloading the cache once ends up being significantly faster than downloading the dependencies for each workspace in parallel, especially since the websites share a lot of dependencies. We keep node-modules as an independent stage so that its cache doesn’t depend on the ARG for the app we actually want to build. That’ll make more sense when we look at the contents of the docker-bake.hcl.

The backend stages:

FROM golang:1.19.0 as go-modules

# Install dependencies
WORKDIR /app
COPY ./go.mod ./go.sum ./
RUN go mod download

FROM go-modules as go-builder

COPY ./internal ./internal
COPY ./cmd ./cmd
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go install -ldflags="-w -s" ./cmd/...

Same as the frontend stages, we have a stage for Go dependencies. We download them all at once and cache the entire stage. Unlike the frontend build, we just compile the entirety of the repo. The Go compiler is extremely fast, and in a future newsletter post that I have planned, we’ll talk about how to leverage the Go compiler build cache to make this even faster. Consider subscribing to find out how I do that in a Dockerfile!

The final stage:

FROM gcr.io/distroless/static-debian11:nonroot
ENV STATIC_ASSETS_PATH=/static

ARG app
COPY --from=node-builder /app/web/${app}/build /static
COPY --from=go-builder /go/bin/${app} /usr/local/bin/entrypoint
ENTRYPOINT ["entrypoint"]

Just like I mentioned, we take the static assets for a specific app and the statically linked binary and put those into the same final container.
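To make the "one container, one domain" idea concrete, here is a minimal sketch of how a Go binary could serve both an API and the static assets from the directory named by STATIC_ASSETS_PATH. The /api/healthz route, the port, the local fallback path, and the routeFor helper are all assumptions for illustration, not code from the example repo.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// newMux puts a hypothetical API route and the static assets on one mux,
// so the frontend and backend share a single origin and need no CORS setup.
func newMux(staticDir string) *http.ServeMux {
	mux := http.NewServeMux()

	// Hypothetical API route; a real service registers its own handlers.
	mux.HandleFunc("/api/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"status":"ok"}`))
	})

	// Everything else falls through to the files produced by the frontend build.
	mux.Handle("/", http.FileServer(http.Dir(staticDir)))
	return mux
}

// routeFor reports which registered pattern would handle a GET for path.
// It is only a helper for sanity-checking the routing.
func routeFor(mux *http.ServeMux, path string) string {
	req, err := http.NewRequest(http.MethodGet, path, nil)
	if err != nil {
		return ""
	}
	_, pattern := mux.Handler(req)
	return pattern
}

func main() {
	// The final Dockerfile stage sets STATIC_ASSETS_PATH to /static.
	staticDir := os.Getenv("STATIC_ASSETS_PATH")
	if staticDir == "" {
		staticDir = "./web/app1/build" // assumed fallback for local runs
	}

	mux := newMux(staticDir)
	log.Printf("/index.html is handled by pattern %q", routeFor(mux, "/index.html"))
	log.Fatal(http.ListenAndServe(":8080", mux)) // port is an assumption
}
```

A single-page app with client-side routing would also need a fallback to index.html for unknown paths, which this sketch leaves out.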

Writing your docker-bake.hcl

The common config:

variable "GITHUB_SHA" {
  default = "latest"
}

variable "REGISTRY" {
  default = "ghcr.io/abatilo/newsletter-bake-monorepo"
}

group "default" {
  targets = [
    "app1",
    "app2",
  ]
}

I use the GitHub SHA for my container tags, but you can use whatever you want. A variable block in the bake file is automatically populated from an environment variable of the same name, if one is set when docker buildx bake is invoked. In GitHub Actions, GITHUB_SHA is always populated. Setting a default means that I can build the containers locally without having to remember to set GITHUB_SHA.

The next variable block is for the container registry prefix to push to, without the image name. This one I don’t actually override anywhere but there isn’t a way to set local variables like in Terraform with locals blocks. So I define the registry in one place and we’ll use it for each application’s container target.

group “default” is the default target to build if all you do is run docker buildx bake without any additional arguments. I define all of the images in my project here.

Defining the cache targets:

target "node-modules" {
  target = "node-modules"
  cache-from = ["type=gha,scope=node-modules"]
  cache-to = ["type=gha,mode=max,scope=node-modules"]
}

target "go-modules" {
  target = "go-modules"
  cache-from = ["type=gha,scope=go-modules"]
  cache-to = ["type=gha,mode=max,scope=go-modules"]
}

Bake builds a dependency graph for any shared docker stages that get used by later images, and walks that graph to build the different targets. This is awesome for the monorepo and how I structured the Dockerfile. It means that I can define the dependency stages as their own targets with their own cache config. When I run docker buildx bake, bake knows that the apps depend on these two targets, so it builds them first and then reuses them for each application build, and the application builds run in parallel. BuildKit has a cache backend specifically for the GitHub Actions cache API, which is incredibly easy to use. You can read more about this cache backend here.
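If it helps to picture the ordering, here is a toy model in Go of "dependencies first, then apps in parallel". It is only an illustration of the idea, not how bake or BuildKit actually schedules work; the waves function and the hard-coded graph are made up for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// waves groups targets into batches. Every target in a batch depends only on
// targets from earlier batches, so everything in one batch could be built in
// parallel. It returns nil if the graph has a cycle or names a dependency
// that is not itself a key in deps.
func waves(deps map[string][]string) [][]string {
	done := map[string]bool{}
	var out [][]string

	for len(done) < len(deps) {
		var ready []string
		for target, needs := range deps {
			if done[target] {
				continue
			}
			ok := true
			for _, n := range needs {
				if !done[n] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, target)
			}
		}
		if len(ready) == 0 {
			return nil // nothing can make progress
		}
		sort.Strings(ready) // stable output; map iteration order is random
		for _, t := range ready {
			done[t] = true
		}
		out = append(out, ready)
	}
	return out
}

func main() {
	// The same shape as the targets in this post's docker-bake.hcl.
	deps := map[string][]string{
		"node-modules": nil,
		"go-modules":   nil,
		"app1":         {"node-modules", "go-modules"},
		"app2":         {"node-modules", "go-modules"},
	}
	fmt.Println(waves(deps)) // prints [[go-modules node-modules] [app1 app2]]
}
```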

target "app1" {
  contexts = {
    node-modules = "target:node-modules",
    go-modules = "target:go-modules"
  }
  args = {
    app = "app1"
  }
  tags = [
    "${REGISTRY}/app1:${GITHUB_SHA}",
  ]
  cache-from = ["type=gha,scope=app1"]
  cache-to = ["type=gha,mode=max,scope=app1"]
}

target "app2" {
  contexts = {
    node-modules = "target:node-modules",
    go-modules = "target:go-modules"
  }
  args = {
    app = "app2"
  }
  tags = [
    "${REGISTRY}/app2:${GITHUB_SHA}",
  ]
  cache-from = ["type=gha,scope=app2"]
  cache-to = ["type=gha,mode=max,scope=app2"]
}

Now we have the actual target definitions for each application’s docker container. Notice that we set the contexts key here, which references the dependency targets we just defined. You can basically think of this like depends_on if you’ve used Terraform before. The args key lets us populate the ARG variables in the Dockerfile. This is what is ultimately different between each of the containers. In the future, the buildx team might support using for_each loops like in Terraform, but for now each block will have a bit of duplication.
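If the duplication bothers you, one possible workaround is to generate the bake file instead of writing it by hand. Bake also reads JSON files, and as far as I understand the JSON layout mirrors the HCL blocks, so the sketch below assumes that. It is not what the example repo does, and the type names, the generate function, and the hard-coded app list are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// target mirrors the attributes used by the target blocks in this post.
type target struct {
	Stage     string            `json:"target,omitempty"`
	Contexts  map[string]string `json:"contexts,omitempty"`
	Args      map[string]string `json:"args,omitempty"`
	Tags      []string          `json:"tags,omitempty"`
	CacheFrom []string          `json:"cache-from,omitempty"`
	CacheTo   []string          `json:"cache-to,omitempty"`
}

type group struct {
	Targets []string `json:"targets"`
}

type bakeFile struct {
	Group  map[string]group  `json:"group"`
	Target map[string]target `json:"target"`
}

// generate builds the two shared dependency targets plus one target per app.
func generate(registry, sha string, apps []string) bakeFile {
	f := bakeFile{
		Group:  map[string]group{"default": {Targets: apps}},
		Target: map[string]target{},
	}
	for _, stage := range []string{"node-modules", "go-modules"} {
		f.Target[stage] = target{
			Stage:     stage,
			CacheFrom: []string{"type=gha,scope=" + stage},
			CacheTo:   []string{"type=gha,mode=max,scope=" + stage},
		}
	}
	for _, app := range apps {
		f.Target[app] = target{
			Contexts: map[string]string{
				"node-modules": "target:node-modules",
				"go-modules":   "target:go-modules",
			},
			Args:      map[string]string{"app": app},
			Tags:      []string{registry + "/" + app + ":" + sha},
			CacheFrom: []string{"type=gha,scope=" + app},
			CacheTo:   []string{"type=gha,mode=max,scope=" + app},
		}
	}
	return f
}

func main() {
	// Same default as the GITHUB_SHA variable block in the HCL file.
	sha := os.Getenv("GITHUB_SHA")
	if sha == "" {
		sha = "latest"
	}
	f := generate("ghcr.io/abatilo/newsletter-bake-monorepo", sha, []string{"app1", "app2"})
	out, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```

You would redirect the output to a file and point bake at it with the -f flag. The tradeoff is an extra generation step, so for two or three apps the hand-written blocks are probably still simpler.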

What does CI look like?

There’s an amazing docker/bake-action which makes it insanely easy to build all of your containers efficiently. Since we’ve set the group “default” block in the docker-bake.hcl, the config is very minimal. One step in your GitHub Actions workflow file will build all of your images, push all of your cache layers, tag all of your containers, and push all of your final images. You’ll still have to do things like check out the code, and don’t forget that you’ll want to use docker/setup-buildx-action since bake is a buildx feature. There’s one quick gotcha for the actual docker/bake-action: we don’t want to push PR builds, and we don’t want to pollute the cache with PR builds.

      - if: github.event_name == 'push'
        uses: docker/bake-action@v2.3.0
        with:
          push: true

      - if: github.event_name == 'pull_request'
        uses: docker/bake-action@v2.3.0
        with:
          push: false
          # Explicitly remove pushing to cache to not pollute the cache with PR
          # specific layers.
          set: |
            *.cache-to=

The way we set cache-to to an empty string is inspired by int128/docker-build-cache-config-action. The wildcard sets the cache-to property for all targets.

Wrapping up

Reminder that a full-blown example repo is available at abatilo/newsletter-bake-monorepo. This project structure makes it insanely easy to add new projects to CI and makes it easy for me to keep trying new ideas. Maintenance cost is very low since it’s a single Dockerfile, and the Dockerfile itself is very minimal. These docker builds are optimized for small images and parallel builds, and they let me try new ideas without much effort.

I hope you give docker bake a try. I think it’s a great way to maintain many images in the same repo and keep builds optimal and simple.
