One of the advantages of using containers is the ability to have images with our environment preconfigured and ready to go with our application code. These images are what allow us to run a container on our laptop and then run the same image in a container in the cloud. But building these images is not a trivial task, and we tend to add useless tasks or overhead to the images we create. This entry shares some of the best practices that have worked for me in the past when building Docker images.
First of all, we need to define what a Docker image is. A Docker image is a unit of packaging that may include operating system constructs or packages, application dependencies and libraries, and application code. If you come from the virtualization world, think of the image as a VM template and of containers as instances created from that template. Otherwise, if you are a developer, you can think of an image as a class definition and of containers as instances of that class.
Docker images must be stored somewhere, right? That is what image registries are for. A registry is like a repository where you store your images. Some examples are:
- Docker Hub
- AWS ECR
- Oracle Container Registry
- Azure Container Registry
- Google Container Registry
- Red Hat Quay
Note that you will need an account on these registries to push images and, depending on the registry and on whether the image is private, also to pull them. You need to pull an image to your local computer before you can use it.
Note that you can also create a Docker image locally without a registry, but if you want to use that image on another computer, server or cloud, you need to push it to a registry.
Images are made up of multiple layers that are represented as a single object. The kernel is not part of the image, because containers use the kernel of the host.
Example of this:
    ☁ docker [master] ⚡ docker image pull mongo:latest
    latest: Pulling from library/mongo
    7b1a6ab2e44d: Already exists
    90eb44ebc60b: Pull complete
    5085b59f2efb: Pull complete
    c7499923d022: Pull complete
    019496b6c44a: Pull complete
    c0df4f407f69: Pull complete
    351daa315b6c: Pull complete
    5b6df31e95f8: Pull complete
    e82745116109: Pull complete
    98e820b4cad7: Pull complete
    Digest: sha256:cf9f5df5419319390cc3b5d9abfc2d0d0b149b3e9e3e29b579
    Status: Downloaded newer image for mongo:latest
    docker.io/library/mongo:latest
Here we can see another interesting characteristic of Docker images: when you pull an image, Docker only downloads the layers that changed or that are new to your local system, which helps to reduce network traffic.
- The layer 7b1a6ab2e44d (Already exists) was already on my system.
- The layer 90eb44ebc60b (Pull complete) was downloaded.
Images can be really small. Let's see the following example:
    ☁ docker [master] ⚡ docker pull alpine
    Using default tag: latest
    latest: Pulling from library/alpine
    59bf1c3509f3: Pull complete
    Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e0118285c70fa8c9300
    Status: Downloaded newer image for alpine:latest
    docker.io/library/alpine:latest
As you may have noticed, the alpine image has fewer layers. Just to be sure, we are going to list the local images:
    ☁ docker [master] ⚡ docker image ls
    REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
    alpine       latest   c059bfaa849c   8 days ago    5.59MB
    mongo        latest   4253856b2570   2 weeks ago   701MB
That is roughly 5MB versus 700MB, while typical images are around 40-200MB.
Another aspect to consider is image naming. If we look at the previous commands, images are referenced as repository:tag (this is the form for official images), with latest being the default tag if you do not specify a version.
There is a way to check whether an image is an official one:
    ☁ docker [master] ⚡ docker search ubuntu --filter "is-official=true"
    NAME                 DESCRIPTION                                     STARS   OFFICIAL   AUTOMATED
    ubuntu               Ubuntu is a Debian-based Linux operating sys…   13244   [OK]
    websphere-liberty    WebSphere Liberty multi-architecture images …   282     [OK]
    ubuntu-upstart       DEPRECATED, as is Upstart (find other proces…   112     [OK]
    ubuntu-debootstrap   DEPRECATED; use "ubuntu" instead                45      [OK]
NOW, after the images 101, we can move on to some of the best practices for using and creating images. Let's get to it...
I almost forgot to talk about the Dockerfile...
- Text document that contains commands to build an image
- Yeap, by default it should be named Dockerfile, with uppercase at the beginning
- Nop, names like dockerfile, docker-file, docker file... are not the default; to use a different name you have to pass it with the -f option of docker build
This is a small example of a Dockerfile from the official Docker documentation:
    FROM ubuntu:18.04
    COPY . /app
    RUN make /app
    CMD python /app/app.py
AND now, we can move on with the recommendations...
In the previous Dockerfile example, the base image is the one named in the FROM statement:
    FROM ubuntu:18.04
    COPY . /app
    RUN make /app
    CMD python /app/app.py
There is a caveat: rather than starting from a base OS image and adding RUN statements to install the dependencies, as in the following Dockerfile, use the official image that already has the dependencies installed.
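    FROM ubuntu:18.04
    # wget is needed to download the MongoDB signing key
    RUN apt-get update && apt-get install -y gnupg wget
    # No sudo here: RUN instructions already execute as root by default
    RUN wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | apt-key add -
    COPY ./mongodb-org-5.0.list /etc/apt/sources.list.d/mongodb-org-5.0.list
    # Refresh the package index so the new MongoDB repository is visible
    RUN apt-get update && apt-get install -y mongodb-org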
Using the official image will simplify your Dockerfiles and make your life easier...
As I mentioned before, Docker images are layered, and some of these layers can be cached on your local system. YOU WANT to have cached layers: the more layers are reused from the cache, the less time it takes to build and pull your images.
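As a minimal sketch of the alternative, the instructions in the Dockerfile above collapse into a single FROM line when the official mongo image is used. The 5.0 tag is an assumption, chosen to match the 5.0 packages installed in that example; check the registry for the tags that are actually available to you.

```dockerfile
# Official MongoDB image: the server and its dependencies are already installed.
# The 5.0 tag is an assumption that mirrors the mongodb-org 5.0 packages above.
FROM mongo:5.0
```

The official image already defines the command that starts the database, so no extra RUN or CMD instruction is needed just to get a running MongoDB.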
Take the following Dockerfile:
    FROM ubuntu
    COPY . /app
    CMD ["java", "-jar", "/app/target/app.jar"]
When you make a change to your code, the layer created by the second line, COPY . /app, is invalidated, and so is every layer that comes after it. All of those layers are rebuilt, and only the FROM ubuntu layer is taken from the cache.
So put the COPY of your code at the end, after the instructions that change less often.
Also, as a bonus, copy just the jar file into the image; you do not need all the other files. For example, you do not need the README file:
    FROM ubuntu
    RUN apt-get update && apt-get install -y --no-install-recommends \
        openjdk-8-jdk ssh vim
    # Copy only the jar, to the same path that CMD runs
    COPY target/app.jar /app/app.jar
    CMD ["java", "-jar", "/app/app.jar"]
And if we remember the first recommendation, someone has already worked on an image with openjdk installed:
    FROM openjdk
    # Copy only the jar, to the same path that CMD runs
    COPY target/app.jar /app/app.jar
    CMD ["java", "-jar", "/app/app.jar"]
And it is an official one:
    ☁ docker [master] ⚡ docker search openjdk --filter "is-official=true"
    NAME      DESCRIPTION                                     STARS   OFFICIAL   AUTOMATED
    openjdk   OpenJDK is an open-source implementation of …   3046    [OK]
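The same ordering idea applies when dependencies are installed from a manifest file. The following is only a sketch for a hypothetical Python application: requirements.txt and app.py are assumed file names that do not come from this entry, and the base image tag is the python one that appears in the image listing further down.

```dockerfile
FROM python:3.7.12-alpine3.15

WORKDIR /app

# Changes rarely: copy only the dependency manifest first, so the
# install layer below stays cached while only the code changes.
# (requirements.txt is an assumed file name)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Changes often: copy the application code last.
# (app.py is an assumed file name)
COPY app.py .

CMD ["python", "app.py"]
```

With this order, editing app.py should only rebuild the last COPY layer, while the dependency installation is reused from the cache until requirements.txt itself changes.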
If you remember, at the beginning we saw that if you do not specify a version, latest will be used...
OK, you do not want to use latest...
- latest is unpredictable
- latest can change between pulls
- latest can break your code
- latest is not love!!
Use a specific image version:
    FROM openjdk:slim
    # Copy only the jar, to the same path that CMD runs
    COPY target/app.jar /app/app.jar
    CMD ["java", "-jar", "/app/app.jar"]
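One thing to keep in mind is that a tag such as slim names a flavor rather than a release, so it can still move to a newer version between pulls. A sketch of a stricter pin could look like the following; the tag is the openjdk one from the image listing further down in this entry, and the target/app.jar layout is the same assumption as in the examples above.

```dockerfile
# Pinned to an explicit version and flavor instead of a floating tag.
# The tag is taken from the image listing in this entry; use whichever
# version your application is actually tested against.
FROM openjdk:18-jdk-alpine3.15

# Assumed layout: the build produces target/app.jar.
COPY target/app.jar /app/app.jar

CMD ["java", "-jar", "/app/app.jar"]
```

The trade-off is that you have to bump the tag yourself to pick up updates, but in exchange a new pull will not silently change the Java version under your application.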
TIP (not relevant for this entry)
You can also pull a repo with all its images:
    ☁ docker [master] ⚡ docker pull --all-tags alpine
    #output too long to be shown
    ☁ docker [master] ⚡ docker image ls | grep alpine
    alpine                                 3          c059bfaa849c   9 days ago      5.59MB
    alpine                                 latest     c059bfaa849c   9 days ago      5.59MB
    ansible-base-lab_managed-host-alpine   latest     77f2f125fa50   6 weeks ago     80.8MB
    alpine                                 20210804   4e873038b87b   4 months ago    5.59MB
    alpine                                 20210730   8fd5af68fdb2   4 months ago    5.59MB
    alpine                                 3.10       e7b300aee9f9   7 months ago    5.58MB
    alpine                                 20210212   b0da5d0678e7   8 months ago    5.62MB
    alpine                                 20201218   430cc6504dbd   11 months ago   5.61MB
    alpine                                 20200917   003bcf045729   14 months ago   5.62MB
    alpine                                 20200626   3c791e92a856   17 months ago   5.57MB
    alpine                                 20200428   5737d7d248e9   19 months ago   5.6MB
This is going to be a short one: a container should have only one concern. A web application may consist of three containers (the web app code, the database and the cache) instead of a single one doing everything.
This helps you scale each part independently and make atomic changes.
Official OS images like ubuntu may contain packages or services that we do not need.
Remember that the idea of a container is to provide just the software your application needs to run as expected. You may not even need to enter the container, which is why some images do not have a shell installed.
Smaller flavors also improve security, because there are fewer services to attack and fewer services to update.
AND smaller flavors are easier to transfer and store.
Let's see an example comparing the openjdk slim flavor with an alpine-based jdk one, followed by two python flavors:
    ☁ docker [master] ⚡ docker image ls | grep openjdk
    openjdk   slim                8b0ead3b8172   33 hours ago   407MB
    openjdk   18-jdk-alpine3.15   c89120dcca4c   3 days ago     329MB

    ☁ docker [master] ⚡ docker image ls | grep python
    python   latest              47ebea899258   20 hours ago   917MB
    python   3.7.12-alpine3.15   a1034fd13493   3 days ago     41.8MB
This is not intended to be an exhaustive list of best practices; these are simply the easiest steps you can start with. In following entries I will write about more advanced topics, like multistage builds and all the beautiful things we can do with them.