This article covers "docker build replace", a script that I use in projects that contain Dockerfiles, and which aims to overcome some of the main drawbacks I encounter when working with them.
The `docker build` command is great for helping to achieve reproducible builds for projects, where in the past developers had to rely on manually setting up the correct environment in order to get a successful build. One big drawback of `docker build`, however, is that it can be very costly in terms of storage when run multiple times, as each run of the command will generally leave unnamed images behind. Cleanup is straightforward, but requires continual pruning.
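For illustration, the dangling images that pile up after repeated builds can be listed and removed with standard Docker commands:

```shell
# List the unnamed ("dangling") images left behind by repeated
# `docker build` runs.
docker images --filter 'dangling=true'

# Remove all dangling images to reclaim the space.
docker image prune --force
```
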
The need to remove unused images is particularly felt when trying to develop and debug Dockerfiles. Coming up with a minimal set of instructions that will let you run your processes the way you want can require several `docker build` runs, even after you've narrowed down the scope with an interactive `docker run` session. Such a sequence may well require a few Docker image purges over the course of a session, as your disk is continually overbooked by old and redundant images. This is compounded further if your Dockerfile uses a command such as `COPY . /src`, where each change in your project root will require a new image build.
This is where `docker build --replace` comes in: Docker automatically removes the old image with the same tag when a new copy is built, and skips the build entirely if it's up-to-date. The only problem is that this flag doesn't currently exist.
So, I wrote `docker_rbuild.sh` ("Docker replace build") to approximate the idea of `docker build --replace` by making use of the build cache:
```bash
# `$0 <img-name> <tag>` builds a docker image that replaces the docker image
# `<img-name>:<tag>`, or creates it if it doesn't already exist.
#
# This script uses `<img-name>:cached` as a temporary tag and so may clobber
# such existing images if present.

if [ $# -lt 2 ] ; then
    echo "usage: $0 <img-name> <tag> ..." >&2
    exit 1
fi

img_name="$1"
shift
tag="$1"
shift

docker tag "$img_name:$tag" "$img_name:cached" &>/dev/null

if docker build --tag="$img_name:$tag" "$@" ; then
    docker rmi "$img_name:cached" &>/dev/null

    # We return a success code in case `rmi` failed.
    true
else
    exit_code=$?

    docker tag "$img_name:cached" "$img_name:$tag" &>/dev/null

    exit $exit_code
fi
```
This tags the current copy of the image so that it can be reused for caching purposes, and then kicks off a new build. If the build succeeds then the "cache" version is removed, meaning that, in theory, only the latest copy of the image you're working on will be present on your system. If the build fails then the old tag is restored. If there are no updates then the cached layers are used to create a "new" image almost instantly to replace the old one.
With this, local images are automatically "pruned" as newer copies are produced, saving time and disk space.
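For example, assuming the script is saved as `scripts/docker_rbuild.sh` (the path used later in this article), an image can be rebuilt in place; note that `dev.Dockerfile` below is just a hypothetical example:

```shell
# Build (or rebuild) `ezanmoto/hello:latest` from the Dockerfile in the
# current directory; any previous image with that tag is replaced rather
# than left dangling.
bash scripts/docker_rbuild.sh ezanmoto/hello latest .

# Arguments after the tag are passed through to `docker build`, so an
# alternative Dockerfile can be selected with `--file`.
bash scripts/docker_rbuild.sh ezanmoto/hello latest --file=dev.Dockerfile .
```
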
One benefit of `docker_rbuild.sh` is that, now that `docker build` isn't leaving redundant images around with each build, it becomes more practical to use it in scripts to rebuild our images before we run them. This is useful when a project defines local images: we can rebuild an image every time it's used, so that we're always running the latest version without having to update it manually.
An example of where this can be convenient is when you want to use an external program that's written in a language your project doesn't otherwise use. For example, the build process for this blog's content uses Node.js, but consider the case where I wanted to use a Markdown linter written in Ruby, such as Markdownlint. One option is to add a Ruby installation directly to the definition of the build environment, but this has a few disadvantages:
- It adds a full installation of a new language to the build environment just to support running one program.
- It isn't clear, at a glance, that Ruby is only being installed to support one tool, and to someone new to the project it can look like the project is a combined Node.js/Ruby project.
- The above point lends itself to using more Ruby gems "just because" Ruby is available, meaning that removing the Ruby installation later becomes more difficult.
One way to work around this is to encapsulate the usage with a Dockerfile, like `markdownlint.Dockerfile`, and a script that runs the tool:
```dockerfile
FROM ruby:3.0.0-alpine3.13

RUN gem install mdl

ENTRYPOINT ["mdl"]
```
```bash
if [ $# -ne 1 ] ; then
    echo "usage: $0 <md-file>" >&2
    exit 1
fi

md_file="$1"

proj='ezanmoto/hello'
sub_img_name="markdownlint"
sub_img="$proj.$sub_img_name"

docker run \
    --rm \
    --volume="$(pwd):/app" \
    --workdir='/app' \
    "$sub_img" \
    "$md_file"
```
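Assuming the script above is saved as, say, `scripts/markdownlint.sh` (a hypothetical location), a Markdown file can then be linted without installing Ruby locally:

```shell
# Lint a Markdown file using the containerised `mdl`.
bash scripts/markdownlint.sh README.md
```
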
This addresses some of the above issues:
- Ruby isn't installed directly into the build environment, meaning that the build environment is kept focused and lean.
- With `markdownlint.Dockerfile`, the Ruby installation is kept with the program that it's used to run, making the association clear.
- The entire Ruby installation can be removed easily by deleting `markdownlint.Dockerfile`. This can be useful if we decide to replace the tool with a different linter, like this one written for Node.js. Another reason why we might remove `markdownlint.Dockerfile` is if the external project starts maintaining its own public Docker image that can be used instead of managing a local version.
Despite the benefits, there are two subtle issues with this setup. The first is that `ezanmoto/hello.markdownlint` will need to be built somehow before `markdownlint.sh` can be run, which may be a manual process, and a script failing because a local image is missing makes for a surprising error.
The second issue is that if one developer builds the local image, and a second developer later updates the image definition, the first developer will need to rebuild their copy of the local image before running `markdownlint.sh` again, or risk running an outdated version of the image.
We can solve both of these issues by running `docker_rbuild.sh` before `docker run` in `markdownlint.sh`:
```bash
bash scripts/docker_rbuild.sh \
    "$sub_img" \
    'latest' \
    --file="$sub_img_name.Dockerfile" \
    .

docker run \
    --rm \
    --volume="$(pwd):/app" \
    --workdir='/app' \
    "$sub_img" \
    "$md_file"
```
This causes the image to always be rebuilt before it's used, meaning that we're always working with the latest version of the image, and this build step will most often be skipped due to caching (though attention should be paid to the commands used in the image build, as the use of commands like `COPY` can limit the effectiveness of the cache).
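As an illustration of that caching point, a Dockerfile for a Node.js project (the file names here are hypothetical) can copy its dependency manifests and install dependencies before copying the rest of the source, so that source-only changes don't invalidate the dependency layers:

```dockerfile
FROM node:14-alpine

WORKDIR /src

# Copy only the dependency manifests first, so that this layer, and the
# `npm install` layer below it, are only rebuilt when the dependencies
# themselves change.
COPY package.json package-lock.json ./

RUN npm install

# Changes to the source only invalidate the cache from this point onwards.
COPY . .
```
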
I also find `docker-compose` particularly useful for modelling deployments. However, as with developing Docker images, getting the `docker-compose` environment right can require continual fine-tuning of Docker images, especially when defining minimal environments. This can again result in lots of wasted space, especially when used with `docker-compose up --build`.
With that in mind, I now remove the `build` property from services defined in `docker-compose.yml`. This then requires the images to be built before `docker-compose` is called, which I normally handle in a script that builds all of the images used in `docker-compose.yml` before invoking it:
```yaml
version: '2.4'

services:

  hello.seankelleher.local:
    image: nginx:1.19.7-alpine
    ports:
      - 8080:8080
    volumes:
      - ./configs/hello.conf:/etc/nginx/conf.d/hello.conf:ro

  hello:
    image: ezanmoto/hello
```
```bash
set -o errexit

proj='ezanmoto/hello'
run_img="$proj"

bash scripts/docker_rbuild.sh \
    "$run_img" \
    'latest' \
    .

docker-compose up "$@"
```
Having an idempotent rebuild for Docker images means that it's more feasible to rebuild before each run, much in the same way that some build tools (e.g. `cargo`) update any changed dependencies before attempting to rebuild the codebase. While Docker doesn't have native support for this at present, a script that takes advantage of the cache can be used to simulate such behaviour.