Docker is a tool we use every day in development, but how much time do you waste waiting for a Docker build to complete? And how do you deal with gigantic images?
What if I told you there's a better way to build your containers?
Your next favorite tool is called Buildkit!
In this tutorial, we'll dive into advanced Docker usage to optimize your development process in terms of both build time and image size. We will do it using Buildkit parallel multistage builds.
Buildkit
Buildkit is a toolkit developed by the Moby project to enhance the build and the packaging of software using containers.
Main features
Among its features, Buildkit offers automatic garbage collection to clean up unneeded resources, concurrent dependency resolution, and efficient instruction caching. Buildkit has been part of docker build since Docker 18.06, initially as an experimental feature.
How to enable Buildkit
If you want to use the Buildkit-powered build engine, set the DOCKER_BUILDKIT environment variable when you invoke the build: DOCKER_BUILDKIT=1 docker build.
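If you build often, prefixing every command gets tedious. As a minimal sketch, assuming a POSIX-compatible shell, you can export the variable once so that every later docker build in that session goes through Buildkit:

```shell
# Enable Buildkit for every docker build run from this shell session.
export DOCKER_BUILDKIT=1

# Later builds then need no prefix, for example:
#   docker build -t prometheus . -f Dockerfile.prometheus
```

Adding the export line to your shell profile keeps the setting for your user only, without touching the daemon configuration.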
It's also possible to enable Buildkit by default:
- Edit the daemon configuration in /etc/docker/daemon.json and add:
  { "features": { "buildkit": true } }
- Restart the daemon with:
  sudo systemctl daemon-reload
  sudo systemctl restart docker
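One caveat is worth a sketch: /etc/docker/daemon.json may already hold other settings, so blindly overwriting it can drop them. The helper below is a hypothetical illustration (the name write_buildkit_config is not part of Docker). It creates the file only when it does not exist yet, and otherwise asks you to merge the key by hand.

```shell
# Hypothetical helper: create a daemon.json that enables Buildkit,
# but refuse to overwrite a configuration that already exists.
write_buildkit_config() {
  config="$1"
  if [ -e "$config" ]; then
    echo "$config already exists: add the \"features\" key to it by hand" >&2
    return 1
  fi
  printf '%s\n' '{ "features": { "buildkit": true } }' > "$config"
}

# Example (needs root on a real host):
#   write_buildkit_config /etc/docker/daemon.json
```

After the file is in place, restart the daemon as described in the steps above.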
Code example
For this tutorial, we are going to prepare an image to deploy an instance of Prometheus in production. We will start from a standard Dockerfile and refactor it to improve performance.
Legacy Dockerfile
We are going to build Prometheus from source code. To do that, we need a Docker image with all its build dependencies: golang, nodejs, yarn, and make.
FROM ubuntu:bionic
ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
RUN apt-get update \
&& apt-get install -y curl git build-essential \
&& curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs \
&& npm install -g yarn \
&& curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
&& tar -xvf go1.15.2.linux-amd64.tar.gz \
&& mv go /usr/local \
&& git clone https://github.com/prometheus/prometheus.git prometheus/ \
&& cd prometheus/ \
&& make build
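# To start Prometheus when the container runs, use CMD (RUN would execute at build time):
# CMD ["/prometheus/prometheus", "--config.file=your_config.yml"]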
Let's build it with:
$ time docker build --no-cache -t prometheus . -f Dockerfile.prometheus
...
Successfully built 54b5d99ef76a
Successfully tagged prometheus:latest
real 19m56,395s
user 0m0,506s
sys 0m0,334s
The image size is:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prometheus latest 54b5d99ef76a 25 minutes ago 2.38GB
Legacy build performance
Looking at the results, we needed almost 20 minutes to build a Prometheus image that weighs 2.38GB.
This will be our starting point.
Multistage build
Now we have an image ready for production, so we are happy, right?
No, we definitely are not.
As you may have noticed, the image we've just created is huuuge. We can definitely do better using an advanced Docker feature called multistage build.
Multistage builds have been available since Docker 17.05 and are the go-to way to optimize image size. You can use the FROM ... AS ... instruction to define a build stage and the COPY --from instruction to share artifacts between stages.
Refactor legacy Dockerfile to use multistage build
Let's apply these concepts to the old Dockerfile.
FROM ubuntu:bionic as base-builder
ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
RUN apt-get update \
&& apt-get install -y curl git build-essential \
&& curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs \
&& npm install -g yarn \
&& curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
&& tar -xvf go1.15.2.linux-amd64.tar.gz \
&& mv go /usr/local \
&& git clone https://github.com/prometheus/prometheus.git prometheus/ \
&& cd prometheus/ \
&& make build
FROM ubuntu:bionic as final
COPY --from=base-builder prometheus/prometheus prometheus
# To start Prometheus when the container runs, use CMD (RUN would execute at build time):
# CMD ["/prometheus", "--config.file=your_config.yml"]
What we need to do is create a tiny final stage that contains only the Prometheus executable. We do it by using COPY --from to pull the binary out of the previous stage, so the build tools never reach the final image.
It's time to build the Docker image.
$ time docker build --no-cache -t prometheus-multistage . -f Dockerfile.prometheus-multistage
...
Successfully built ab2217626102
Successfully tagged prometheus-multistage:latest
real 19m19,570s
user 0m0,418s
sys 0m0,459s
The image size is:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prometheus-multistage latest ab2217626102 31 seconds ago 151MB
Multistage build performance
Looking at the new results, the build still took 19 minutes, but the image shrank from 2.38GB to 151MB, a reduction of roughly 94%.
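The size reduction can be checked directly from the two docker images outputs. This is a small sketch, assuming awk is available and treating 2.38GB as 2380MB; pct_reduction is a helper name made up for this example.

```shell
# Percentage saved when a value shrinks from $1 to $2 (same unit for both).
pct_reduction() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Image size in MB: 2.38GB (single stage) versus 151MB (multistage).
pct_reduction 2380 151   # prints 93.7
```

The same helper works for build times expressed in seconds.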
Parallel multistage build
So we were able to reduce the image size, but the build still takes too long. We can optimize that too by exploiting the Buildkit build engine.
The legacy Docker build engine builds the stages sequentially. Buildkit, on the other hand, computes the dependency graph of the stages and runs independent stages in parallel. With this in mind, we can refactor the Dockerfile to speed up the build.
Refactor Dockerfile to use parallel multistage build
Let's see how this can be done.
FROM ubuntu:bionic as base-builder
ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
RUN apt-get update \
&& apt-get install -y curl git build-essential
FROM base-builder as base-builder-extended
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs \
&& npm install -g yarn
FROM base-builder as golang
RUN curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
&& tar -xvf go1.15.2.linux-amd64.tar.gz
FROM base-builder as source-code
RUN git clone https://github.com/prometheus/prometheus.git prometheus/
FROM base-builder-extended as builder
COPY --from=golang go /usr/local
COPY --from=source-code prometheus/ prometheus/
RUN cd prometheus/ && make build
FROM ubuntu:bionic as final
COPY --from=builder prometheus/prometheus prometheus
# To start Prometheus when the container runs, use CMD (RUN would execute at build time):
# CMD ["/prometheus", "--config.file=your_config.yml"]
We create a first stage called base-builder that contains the basic tools and acts as a base for the next stages.
Inheriting from base-builder we define:
- golang, which contains go;
- source-code, which we use to fetch the Prometheus source code;
- base-builder-extended, an enhancement of base-builder that contains nodejs and yarn.
These three stages don't depend on each other, so Buildkit can build them in parallel.
At this point we are ready to build the code, and we use the builder stage for that. In this stage, we COPY --from the previous stages the artifacts we need to run the build. Then, again, we create a tiny final stage that contains only the Prometheus executable.
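While iterating on a Dockerfile like this one, it can help to build a single stage on its own. The sketch below relies on the --target flag of docker build; build_stage and the prometheus-<stage> tag are names invented for this example. With Buildkit, stages that the target does not depend on are skipped.

```shell
# Hypothetical helper: build only one named stage of the parallel Dockerfile.
build_stage() {
  stage="$1"
  DOCKER_BUILDKIT=1 docker build \
    --target "$stage" \
    -t "prometheus-$stage" \
    -f Dockerfile.prometheus-parallel-multistage .
}

# Example: build just the Go toolchain stage.
#   build_stage golang
```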
We can run the build now.
$ DOCKER_BUILDKIT=1 docker build --no-cache -t prometheus-parallel-multistage . -f Dockerfile.prometheus-parallel-multistage
[+] Building 734.4s (13/13) FINISHED
=> [internal] load build definition from Dockerfile.prometheus-parallel-multistage 1.1s
=> => transferring dockerfile: 963B 0.1s
=> [internal] load .dockerignore 0.8s
=> => transferring context: 2B 0.1s
=> [internal] load metadata for docker.io/library/ubuntu:bionic 0.0s
=> CACHED [final 1/2] FROM docker.io/library/ubuntu:bionic 0.0s
=> [base-builder 2/2] RUN apt-get update && apt-get install -y curl git build-essential 195.6s
=> [source-code 1/1] RUN git clone https://github.com/prometheus/prometheus.git prometheus/ 77.6s
=> [base-builder-extended 1/1] RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - && apt-get install -y nodejs 102.1s
=> [golang 1/1] RUN curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz && tar -xvf go1.15.2.linux 149.8s
=> [builder 1/3] COPY --from=golang go /usr/local 13.6s
=> [builder 2/3] COPY --from=source-code prometheus/ prometheus/ 9.5s
=> [builder 3/3] RUN cd prometheus/ && make build 338.6s
=> [final 2/2] COPY --from=builder prometheus/prometheus prometheus 2.6s
=> exporting to image 1.9s
=> => exporting layers 1.6s
=> => writing image sha256:c0e59c47a790cb2a6b1229a5fec0014aa2b4540fc79c51531185c9466c9d5584 0.1s
=> => naming to docker.io/library/prometheus-parallel-multistage 0.1s
And check the image size.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prometheus-parallel-multistage latest c0e59c47a790 About a minute ago 151MB
prometheus-multistage latest ab2217626102 9 minutes ago 151MB
prometheus latest 54b5d99ef76a 39 minutes ago 2.38GB
Parallel multistage build performance
Looking at the new results, the build took a little over 12 minutes (734.4 s), roughly 37% less than the 19 minutes of the sequential multistage build, while keeping the same image size.
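To see where the saving comes from, the per-step durations in the Buildkit log can be added up twice: once as if every step ran back to back, and once letting the three independent stages overlap. This is a rough model rather than a measurement (it ignores context loading and image export), and estimate_build_times is a name made up for this sketch.

```shell
# Rough model of the build, using step durations (seconds) from the Buildkit log.
estimate_build_times() {
  LC_ALL=C awk '
  BEGIN {
    base = 195.6          # base-builder: apt-get install
    src = 77.6            # source-code: git clone
    node = 102.1          # base-builder-extended: nodejs and yarn
    golang = 149.8        # golang: download and unpack Go
    tail = 13.6 + 9.5 + 338.6 + 2.6   # builder COPYs, make build, final COPY

    # Back to back: every step waits for the previous one.
    sequential = base + src + node + golang + tail

    # Overlapped: the three independent stages cost only as much as the slowest.
    slowest = src
    if (node > slowest) slowest = node
    if (golang > slowest) slowest = golang
    parallel = base + slowest + tail

    printf "sequential %.1f\nparallel %.1f\n", sequential, parallel
  }'
}

estimate_build_times
```

The overlapped estimate of about 710 s is close to the 734.4 s that Buildkit reported. Most of what remains is work that cannot be parallelized in this layout: the base image setup and make build.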
Results recap
The table below summarizes the build time and the image size in the three different examples.
| Dockerfile | Build time | Image size |
|---|---|---|
| prometheus-parallel-multistage | 12.2 m | 151MB |
| prometheus-multistage | 19.3 m | 151MB |
| prometheus | 19.9 m | 2.38GB |
As you can see, the improvement in both build time and image size is substantial. The multistage parallel build approach is especially useful in production, where a smaller Docker image can make a difference. All you have to do is keep in mind how Buildkit works, think about what can be parallelized in your Dockerfile, and structure it accordingly. You can also integrate Buildkit into your Docker build/test/tag/push pipeline.
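As a rough sketch of such a pipeline, the function below chains build, test, tag, and push. REGISTRY is a placeholder for your own registry, the smoke test assumes the binary sits at /prometheus as in the final stage of the Dockerfile, and pipeline is just an illustrative name.

```shell
# Placeholders: override IMAGE, TAG, and REGISTRY for your own project.
IMAGE="${IMAGE:-prometheus-parallel-multistage}"
TAG="${TAG:-latest}"
REGISTRY="${REGISTRY:-registry.example.com/monitoring}"

pipeline() {
  # Build with Buildkit so that independent stages run in parallel.
  DOCKER_BUILDKIT=1 docker build -t "$IMAGE:$TAG" \
    -f Dockerfile.prometheus-parallel-multistage . || return 1
  # Smoke test: the binary copied into the final stage should start.
  docker run --rm "$IMAGE:$TAG" /prometheus --version || return 1
  # Tag the image for the registry, then push it.
  docker tag "$IMAGE:$TAG" "$REGISTRY/$IMAGE:$TAG" || return 1
  docker push "$REGISTRY/$IMAGE:$TAG"
}

# Run it from the directory that holds the Dockerfile:
#   pipeline
```

Each step stops the pipeline on failure, so a broken build or a binary that does not start is never pushed.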
This is it!
I hope this was useful for you, now go and refactor your old Dockerfile!
Reach me on Twitter @gasparevitta and let me know your performance improvements.
You can find the code snippets on Github.
This article was originally published on my blog. Head over there if you like this post and want to read others like it!