Ernesto Lopez

Posted on Dec 3, 2021

You may want to consider this when using docker images and Dockerfiles

#docker #devops #opensource #bestpractices

One of the advantages of using containers is the ability to have images with our environment preconfigured and ready to go with our application code. These images are what allow us to run a container on our laptop and then run the same image in a container in the cloud. But building these images is not a trivial task, and we tend to add useless tasks or overhead to the images we create. This entry shares some of the best practices that have worked for me in the past when building docker images.

First of all we need to define what a docker image is. A docker image is a unit of packaging that may include operating system constructs or packages, application dependencies and libraries, and application code. If you come from the virtualization world, think of the image as a VM template, with containers being the instances created from that template. Otherwise, if you are a developer, you can think of images as class definitions and containers as instances of those classes.

Docker images must be stored somewhere, right? These places are called image registries. A registry is like a repository where you store your images. Docker Hub (docker.io), the one used throughout this entry, is one example.

Note you will need an account on a registry to push images and to pull private ones; public images can usually be pulled without logging in. You need to pull an image to your local computer to use it.

Note you can also create a docker image locally without the need of a registry, but if you want to use that image on another computer, server or cloud, the usual way is to push it to a registry.
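
As a rough sketch of that last step, publishing a local image usually means giving it a name that includes your registry account and then pushing it. The function name publish_image and the image names my-app:1.0 and myuser/my-app:1.0 are made up for this example; docker tag and docker push are the real commands.

```shell
# Sketch: publish a locally built image to a registry.
# publish_image is a hypothetical helper, not a docker command.
publish_image() {
  local_name=$1      # e.g. my-app:1.0, as built on your machine
  remote_name=$2     # e.g. myuser/my-app:1.0 on Docker Hub
  docker tag "$local_name" "$remote_name" &&
  docker push "$remote_name"
}

# Usage (needs a previous `docker login`):
#   publish_image my-app:1.0 myuser/my-app:1.0
```

On the other machine, a docker pull of the same remote name brings the image down.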

Images are made up of multiple layers, presented as a single object. The one thing that is not part of the image is the kernel, because containers use the kernel of the host.

Example of this:

☁  docker [master] ⚡  docker image pull mongo:latest  
latest: Pulling from library/mongo
7b1a6ab2e44d: Already exists 
90eb44ebc60b: Pull complete 
5085b59f2efb: Pull complete 
c7499923d022: Pull complete 
019496b6c44a: Pull complete 
c0df4f407f69: Pull complete 
351daa315b6c: Pull complete 
5b6df31e95f8: Pull complete 
e82745116109: Pull complete 
98e820b4cad7: Pull complete 
Digest: sha256:cf9f5df5419319390cc3b5d9abfc2d0d0b149b3e9e3e29b579
Status: Downloaded newer image for mongo:latest
docker.io/library/mongo:latest

Here we can see another interesting characteristic of docker images: when you pull an image, docker only downloads the layers that are new or changed compared to what is already on your local system, which reduces network traffic.

This layer > 7b1a6ab2e44d: Already exists was already on my system
This one > 90eb44ebc60b: Pull complete was downloaded.

Images can be really small, let's see the following example:

☁  docker [master] ⚡  docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
59bf1c3509f3: Pull complete 
Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e0118285c70fa8c9300
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest

As you may have noticed, there are fewer layers in the alpine image. Just to be sure, we are going to list the local images:

☁  docker [master] ⚡  docker image ls
REPOSITORY                             TAG         IMAGE ID       CREATED        SIZE
alpine                                 latest      c059bfaa849c   8 days ago     5.59MB
mongo                                  latest      4253856b2570   2 weeks ago    701MB

5MB vs 700MB. In my experience, most images land somewhere around 40-200MB.

Another aspect to consider is image naming. If we look at the previous command, images are referred to as repository:tag (with no prefix before the repository name for official images), with latest being the default tag if you do not specify any version.
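
To make the naming rule concrete, here is a small sketch in plain shell that splits an image reference into its repository and tag, falling back to latest when no tag is given. The helpers image_tag and image_repo are hypothetical, written only for this entry, and they are simplified: references pinned by digest (name@sha256:...) are not handled.

```shell
# Hypothetical helpers (not part of docker): split an image reference
# into repository and tag. Digest references are not handled.
image_tag() {
  ref=$1
  last=${ref##*/}                 # part after the final slash, e.g. "mongo:4.4"
  case $last in
    *:*) echo "${last##*:}" ;;    # explicit tag
    *)   echo "latest" ;;         # no tag given, docker assumes latest
  esac
}

image_repo() {
  ref=$1
  last=${ref##*/}
  case $last in
    *:*) echo "${ref%:*}" ;;      # drop the ":tag" suffix
    *)   echo "$ref" ;;
  esac
}

# Usage:
#   image_tag mongo               # prints: latest
#   image_tag mongo:4.4.11-rc0    # prints: 4.4.11-rc0
#   image_repo mongo:4.4.11-rc0   # prints: mongo
```

Looking only at the part after the last slash keeps a registry port such as localhost:5000/app from being mistaken for a tag.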

There is a way to query if an image is an official one, by using:

☁  docker [master] ⚡  docker search ubuntu --filter "is-official=true" 
NAME                 DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
ubuntu               Ubuntu is a Debian-based Linux operating sys…   13244     [OK]       
websphere-liberty    WebSphere Liberty multi-architecture images …   282       [OK]       
ubuntu-upstart       DEPRECATED, as is Upstart (find other proces…   112       [OK]       
ubuntu-debootstrap   DEPRECATED; use "ubuntu" instead                45        [OK]

NOW, after the images 101, we need to move on to some best practices for using and creating images, let's get to it...

OH WAIT!!

I almost forgot to talk about Dockerfile ...
Basically:

A text document that contains the commands to build an image
Yeap, by default docker build looks for a file named Dockerfile, with uppercase at the beginning
Nop, other names (docker-file, docker file...) are not picked up automatically; you would have to pass them with the -f flag

This is a small example of a Dockerfile from the official docker documentation:

FROM ubuntu:18.04
COPY . /app
RUN make /app
CMD python /app/app.py
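
To turn a Dockerfile like this into an image, you run docker build from the directory that contains it. A minimal sketch follows; the helper build_image and the image name my-app:1.0 are assumptions made for this example, while the -t and -f flags are standard docker build options.

```shell
# Sketch: build an image from a Dockerfile in the current directory.
# build_image is a hypothetical helper, not a docker command.
build_image() {
  tag=$1                        # name:tag to give the resulting image
  dockerfile=${2:-Dockerfile}   # optional: a file with a different name
  docker build -t "$tag" -f "$dockerfile" .
}

# Usage, from the directory that holds the Dockerfile and the code:
#   build_image my-app:1.0
#   build_image my-app:1.0 Dockerfile.dev
#   docker run --rm my-app:1.0
```

The final dot is the build context, meaning the directory whose files instructions like COPY can see.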

AND now, we can move on with the recommendations...

1.- Use Official Docker images as base Image

From the previous Dockerfile example, the base image is the one that we use in the FROM statement

FROM ubuntu:18.04
COPY . /app
RUN make /app
CMD python /app/app.py

BUT, BUT, BUT

There is a caveat: instead of using a base OS image and then adding RUN statements to install dependencies, use the official image that already has those dependencies installed.

Example:

this is ok, but it is not the best approach:

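FROM ubuntu:18.04

# gnupg and wget are needed to import the MongoDB signing key
RUN apt-get update && apt-get install -y gnupg wget

# No sudo: build steps already run as root and the base image does not ship sudo
RUN wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | apt-key add -

COPY ./mongodb-org-5.0.list /etc/apt/sources.list.d/mongodb-org-5.0.list

# Refresh the package index so apt sees the MongoDB repository added above
RUN apt-get update && apt-get install -y mongodb-org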

This is a better approach

FROM mongo:4.4.11-rc0

It will simplify your Dockerfiles, and make your life easier...

2.- Avoid adding your code at the beginning

As I mentioned before, docker images are layered, and those layers can be cached on your local system. YOU WANT to have cached layers: builds and pulls take less time when most of the layers are already there.

Take this Dockerfile:

FROM ubuntu
COPY . /app

CMD ["java", "-jar", "/app/target/app.jar"]

When you make a change in your code, the layer created by the second line, COPY . /app, is invalidated, and so is every instruction that comes after it. Only FROM ubuntu is still served from the cache.

Put the COPY of your code at the end.
Also, as a bonus, copy just the jar file into the image, you do not need all the files. For example, you do not need the README file

FROM ubuntu
RUN apt-get update && apt-get install -y --no-install-recommends \
    openjdk-8-jdk ssh vim
# Copy only the built jar; the CMD path must match the destination used here
COPY target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
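
If you really need COPY . for some reason, a .dockerignore file next to the Dockerfile gives a similar effect by keeping unneeded files out of the build context. This is only a sketch: write_dockerignore is a hypothetical helper and the patterns are examples to adapt to your project.

```shell
# Sketch: create a .dockerignore in the current directory.
# The patterns below are examples, not a recommended fixed list.
write_dockerignore() {
  cat > .dockerignore <<'EOF'
.git
README.md
*.log
EOF
}

# Usage, from the directory that holds the Dockerfile:
#   write_dockerignore
```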

And if we remember the first recommendation, someone already worked on an image with openjdk installed

FROM openjdk
# The destination is a full file path, so the CMD below can find the jar
COPY target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]

And it is an official one:

☁  docker [master] ⚡  docker search openjdk --filter "is-official=true"
NAME      DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
openjdk   OpenJDK is an open-source implementation of …   3046      [OK]

3.- Add a specific version for your Base Image

If you remember, at the beginning we saw that if you do not specify any version, latest will be used...
Right?
OK, you do not want to use latest...

latest is unpredictable
latest can change between pulls
latest can break your code
latest is not love!!

Use a specific image version

FROM mongo:4.4.11-rc0

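and

FROM openjdk:slim
# The destination is a full file path, so the CMD below can find the jar
COPY target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]

Keep in mind that slim on its own still moves between Java releases. A tag that also carries the version number, like the 18-jdk-alpine3.15 tag that shows up later in this entry, is more predictable.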
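
A quick way to catch unpinned base images is to search your Dockerfiles for FROM lines with no tag or with :latest. The function below is a simplified sketch, and unpinned_from is a name made up for this entry. It ignores digests (@sha256:...), build arguments and registry hosts that include a port.

```shell
# Sketch: print FROM lines that have no tag or that use :latest.
# Hypothetical helper; deliberately simple, see the caveats above.
unpinned_from() {
  grep -E '^FROM[[:space:]]+[^:[:space:]]+(:latest)?([[:space:]]|$)' "$1"
}

# Usage:
#   unpinned_from Dockerfile
```

It prints the offending lines and, like grep, returns a non-zero status when there is nothing to report, so it can be used in a script or a CI check.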

You can check the available versions of an image under Tags on Docker Hub

TIP (not directly relevant to this entry)
You can also pull every tag of a repository at once:

☁  docker [master] ⚡  docker pull --all-tags alpine
     #output too long to be shown

☁  docker [master] ⚡  docker image ls | grep alpine
alpine                                 3           c059bfaa849c   9 days ago      5.59MB
alpine                                 latest      c059bfaa849c   9 days ago      5.59MB
ansible-base-lab_managed-host-alpine   latest      77f2f125fa50   6 weeks ago     80.8MB
alpine                                 20210804    4e873038b87b   4 months ago    5.59MB
alpine                                 20210730    8fd5af68fdb2   4 months ago    5.59MB
alpine                                 3.10        e7b300aee9f9   7 months ago    5.58MB
alpine                                 20210212    b0da5d0678e7   8 months ago    5.62MB
alpine                                 20201218    430cc6504dbd   11 months ago   5.61MB
alpine                                 20200917    003bcf045729   14 months ago   5.62MB
alpine                                 20200626    3c791e92a856   17 months ago   5.57MB
alpine                                 20200428    5737d7d248e9   19 months ago   5.6MB

4.- Decouple your applications into different containers

This is going to be a short one: a container should have only one concern. A web application may consist of 3 containers (the web app code, the database, the cache) instead of a single one doing it all.

This helps you scale each part on its own and make atomic changes.
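
As a rough sketch of what that looks like with plain docker commands, the three pieces can run as separate containers on one user-defined network. Everything named here is an assumption for the example: the helper start_stack, the network shop-net, the image my-web-app:1.0, the redis:6 tag and port 8080.

```shell
# Sketch: one container per concern, all attached to the same network.
# start_stack, shop-net and my-web-app:1.0 are hypothetical names.
start_stack() {
  docker network create shop-net &&
  docker run -d --name db    --network shop-net mongo:4.4.11-rc0 &&
  docker run -d --name cache --network shop-net redis:6 &&
  docker run -d --name web   --network shop-net -p 8080:8080 my-web-app:1.0
}

# Usage:
#   start_stack
```

On a user-defined network the containers can reach each other by name, so the web app would connect to db and cache rather than to localhost.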

5.- Use leaner official images, also known as minimal flavors

Official OS images like ubuntu may contain packages or services that we do not need.

Remember, the idea of a container is to provide just the software your application needs to run as expected. You may not even need to enter the container, which is why some images do not have a shell installed.

Smaller flavors also improve security, because there are fewer packages and services to attack and fewer to update.

AND smaller flavors are easier to transfer and store.

Let's see an example with two openjdk tags, slim vs a jdk build on Alpine

☁  docker [master] ⚡  docker image ls | grep openjdk
openjdk                                slim                8b0ead3b8172   33 hours ago    407MB
openjdk                                18-jdk-alpine3.15   c89120dcca4c   3 days ago      329MB

Another example:

☁  docker [master] ⚡  docker image ls | grep python
python                                 latest              47ebea899258   20 hours ago    917MB
python                                 3.7.12-alpine3.15   a1034fd13493   3 days ago      41.8MB
☁  docker [master] ⚡
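
If you want to compare the sizes of the tags you have pulled without grepping the whole list, docker image ls accepts a repository name and a --format template. A small sketch, with image_sizes as a made-up helper name:

```shell
# Sketch: print name, tag and size for every local tag of one repository.
# image_sizes is a hypothetical helper around docker image ls.
image_sizes() {
  docker image ls --format '{{.Repository}}:{{.Tag}}  {{.Size}}' "$1"
}

# Usage:
#   image_sizes python
#   image_sizes openjdk
```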

Comments

This is not intended to be an exhaustive list of best practices. These are the easiest steps you can start with. In following entries I will write about more advanced topics like multistage builds and all the beautiful things we can make with them.
