Nicola Apicella

Posted on Apr 21, 2020 • Edited on Jan 1, 2021

How are docker images built? A look into the Linux overlay file-systems and the OCI specification

#docker #linux #computerscience #webdev

It's impossible to work with docker containers without docker images. In this post I want to talk about what makes docker images possible: the overlay filesystems.
I'll start with a brief description of overlay filesystems. Then we will see how it applies to docker images and how docker builds an image from a dockerfile. I'll conclude with layers cache and OCI format for container images.

As usual I'll try to make the blog post as practical as possible.

What's an overlay filesystem?

Overlay filesystems (also called union filesystems) allow creating a union of two or more directories: a list of lower directories and an upper directory. The lower directories of the filesystem are read only, whereas the upper directory can be used for both reads and writes.

Let's see what that means in practice by mounting one.

Create an overlay fs

Let's create a few folders and combine them.
First I'll create a folder called "mount" which will contain the union of all the other folders. Then I'll create a bunch of folders called "layer-1", "layer-2", "layer-3", "layer-4". Finally a folder called "workdir" which is needed by the overlay filesystem, well, to work properly.

You can name the folders however you wish, but calling them "layer-1", "layer-2", etc. will make it easier to understand the parallel with docker images, as we shall see.

cd /tmp && mkdir overlay-example && cd overlay-example

[2020-04-19 16:02:35] [ubuntu] [/tmp/overlay-example]  
> mkdir mount layer-1 layer-2 layer-3 layer-4 workdir

[2020-04-19 16:02:38] [ubuntu] [/tmp/overlay-example]  
> ls
layer-1  layer-2  layer-3  layer-4 mount  workdir

Let's also create some files in the layer-1, layer-2 and layer-3 folders.
We will leave layer-4 (our upper folder) empty. Again, that's not necessary, it will just make our parallel with docker images easier.

[2020-04-19 16:02:40] [ubuntu] [/tmp/overlay-example]  
> echo "Layer-1 file" > ./layer-1/some-file-in-layer-1

[2020-04-19 16:03:36] [ubuntu] [/tmp/overlay-example]  
> echo "Layer-2 file" > ./layer-2/some-file-in-layer-2

[2020-04-19 16:03:53] [ubuntu] [/tmp/overlay-example]  
> echo "Layer-3 file" > ./layer-3/some-file-in-layer-3

Finally, let's mount the filesystem:

sudo mount -t overlay overlay-example \
-o lowerdir=/tmp/overlay-example/layer-1:/tmp/overlay-example/layer-2:/tmp/overlay-example/layer-3,upperdir=/tmp/overlay-example/layer-4,workdir=/tmp/overlay-example/workdir \
/tmp/overlay-example/mount

Now let's look inside the mount folder:

[2020-04-19 16:13:28] [ubuntu] [/tmp/overlay-example]  
> cd mount/

[2020-04-19 16:13:31] [ubuntu] [/tmp/overlay-example/mount]  
> ls -la
total 20
drwxr-xr-x 1 napicell domain^users 4096 Apr 19 16:07 .
drwxr-xr-x 8 napicell domain^users 4096 Apr 19 16:07 ..
-rw-r--r-- 1 napicell domain^users   13 Apr 19 16:03 some-file-in-layer-1
-rw-r--r-- 1 napicell domain^users   13 Apr 19 16:03 some-file-in-layer-2
-rw-r--r-- 1 napicell domain^users   13 Apr 19 16:03 some-file-in-layer-3

As expected, the contents of the folders layer-1, layer-2 and layer-3 have been combined in the mount folder.
Sure enough if we look at the content of the files, we'll find what we have written in the previous step.

[2020-04-19 16:13:33] [ubuntu] [/tmp/overlay-example/mount]  
> cat some-file-in-layer-3
Layer-3 file

Let's try to create a file in the mount folder:

[2020-04-19 16:23:31] [ubuntu] [/tmp/overlay-example/mount]  
 > echo "new content" > new-file

[2020-04-19 16:27:33] [ubuntu] [/tmp/overlay-example/mount]  
> ls
new-file  some-file-in-layer-1  some-file-in-layer-2  some-file-in-layer-3

Where should the new file be? In the upper layer, which in our case is the folder called "layer-4":

 [2020-04-19 16:23:49] [ubuntu] [/tmp/overlay-example]  
> tree
.
├── layer-1
│   └── some-file-in-layer-1
├── layer-2
│   └── some-file-in-layer-2
├── layer-3
│   └── some-file-in-layer-3
├── layer-4
│   └── new-file
├── mount
│   ├── new-file
│   ├── some-file-in-layer-1
│   ├── some-file-in-layer-2
│   └── some-file-in-layer-3
└── workdir
    └── work [error opening dir]

7 directories, 8 files

Let's try to delete a file:

[2020-04-19 16:27:33] [ubuntu] [/tmp/overlay-example/mount]  
> rm some-file-in-layer-2

[2020-04-19 16:28:58] [ubuntu] [/tmp/overlay-example/mount]  
> ls
new-file  some-file-in-layer-1  some-file-in-layer-3

What do you think happened to the original file in the "layer-2" folder?

 [2020-04-19 16:29:57] [ubuntu] [/tmp/overlay-example]  
> tree
.
├── layer-1
│   └── some-file-in-layer-1
├── layer-2
│   └── some-file-in-layer-2
├── layer-3
│   └── some-file-in-layer-3
├── layer-4
│   ├── new-file
│   └── some-file-in-layer-2
├── mount
│   ├── new-file
│   ├── some-file-in-layer-1
│   └── some-file-in-layer-3
└── workdir
    └── work [error opening dir]

7 directories, 8 files

A new file called "some-file-in-layer-2" was created in "layer-4". The weird thing is that it is a character device file (note the leading "c" and the 0, 0 device numbers in the listing below). These kinds of files are called "whiteout" files and are how the overlay filesystem represents a deleted file:

 [2020-04-19 16:31:09] [ubuntu] [/tmp/overlay-example/layer-4]  
> ls -la
total 12
drwxr-xr-x 2 napicell domain^users 4096 Apr 19 16:28 .
drwxr-xr-x 8 napicell domain^users 4096 Apr 19 16:07 ..
-rw-r--r-- 1 napicell domain^users   12 Apr 19 16:23 new-file
c--------- 1 root     root         0, 0 Apr 19 16:28 some-file-in-layer-2
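
If you would rather not eyeball the ls output, a small helper can look for whiteouts for you. This is only a sketch: it assumes GNU stat (where %t and %T print the major and minor device numbers in hex), and the function names is_whiteout and list_whiteouts are made up for this example.

```shell
#!/usr/bin/env bash

# is_whiteout PATH: succeeds if PATH looks like an overlayfs whiteout,
# i.e. a character device whose major and minor numbers are both 0.
# Assumes GNU stat; %t and %T are the major/minor device numbers in hex.
is_whiteout() {
  [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# list_whiteouts DIR: print the name of every whiteout directly inside DIR.
list_whiteouts() {
  local entry
  for entry in "$1"/*; do
    if is_whiteout "$entry"; then
      basename "$entry"
    fi
  done
}
```

Run against the upper dir from the walkthrough (list_whiteouts /tmp/overlay-example/layer-4), it should report some-file-in-layer-2 and skip new-file.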

Now that we have finished with it, let's unmount the filesystem and remove the folder we created:

[2020-04-19 16:37:11] [ubuntu] [/tmp/overlay-example]  
> sudo umount /tmp/overlay-example/mount && rm -rf /tmp/overlay-example

Wrapping up the overlay filesystems

As we said at the beginning, the overlay filesystem lets you create a union of directories. In our case the union was created in the "mount" folder and it was the result of combining the "layer-{1, 2, 3, 4}" folders. Changes to files, deletions and creations are stored in the upper dir, which in our case is "layer-4". This is why this layer is also called the "diff" layer.
Files from upper layers shadow the ones in lower layers: a file in "layer-4" hides any file with the same name and relative path in the lower directories. Among the lower directories, overlayfs stacks the lowerdir list from right to left, so the leftmost entry is the topmost one. With the mount command used above, if the same file existed in both layer-1 and layer-2, the layer-1 version would end up in the "mount" folder.
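
To make the lookup order concrete, here is a user-space emulation that needs neither root nor a real mount. It is a sketch of the rule, not of how the kernel implements it: the resolve function and the directory names are invented for this example, it only handles regular files, and it approximates a whiteout as "any character device".

```shell
#!/usr/bin/env bash

# resolve NAME UPPER LOWER...: print the path a merged view would serve for
# NAME. Layers are checked in the order given: the upper dir first, then the
# lower dirs left to right (leftmost lowerdir entry = topmost lower layer).
resolve() {
  local name=$1 layer
  shift
  for layer in "$@"; do
    if [ -c "$layer/$name" ]; then
      return 1            # whiteout: the file was deleted in this layer
    fi
    if [ -e "$layer/$name" ]; then
      printf '%s\n' "$layer/$name"
      return 0
    fi
  done
  return 1                # not present in any layer
}

demo=$(mktemp -d)
mkdir "$demo/upper" "$demo/lower-top" "$demo/lower-bottom"
echo "from bottom" > "$demo/lower-bottom/shared"
echo "from top"    > "$demo/lower-top/shared"
echo "only bottom" > "$demo/lower-bottom/base-only"

# The topmost layer that has the file wins.
cat "$(resolve shared "$demo/upper" "$demo/lower-top" "$demo/lower-bottom")"
cat "$(resolve base-only "$demo/upper" "$demo/lower-top" "$demo/lower-bottom")"
```

The first cat prints the copy from lower-top, the second falls through to lower-bottom. Writing a file called shared into the upper dir would make that copy win instead.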

In the next section we will see how this is used with docker images.

What's a docker Image?

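A docker image is essentially a tar file with a root file system and some metadata. You might have heard the expression image layer, and that every line in a dockerfile creates a new layer. More precisely, instructions that change the filesystem (such as ADD, COPY and RUN) produce a layer tarball, while instructions like CMD only update the image metadata. For example, the following snippet gives us an image with two filesystem layers, one per ADD: "FROM scratch" starts from an empty base and CMD adds no files.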

FROM scratch
ADD my-files /doc
ADD hello /
CMD ["/hello"]

So what happens when you type "docker run"? A lot of things really, but for the purpose of this article we are only interested in the bits concerning the image.
At a high level, docker downloads the tarballs for the image, unpacks each layer into a separate directory, and then tells the overlay filesystem to combine them with an empty upper directory that the container will write its changes to.
When you change, create or delete files in the container, the changes are stored in this initially empty directory. When the container is removed (for example with "docker rm", or automatically with "docker run --rm"), docker cleans up the folder - that is why the changes you make in the container do not persist.
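
The flow above can be sketched with plain tar and a few directories. Everything here is an assumption made for illustration: the store directory layout is invented (it is not docker's real on-disk layout), the two tarballs stand in for layers pulled from a registry, and the final mount command is only printed because running it needs root.

```shell
#!/usr/bin/env bash
set -eu
store=$(mktemp -d)            # hypothetical layer store, not docker's real layout

# Stand-ins for the layer tarballs a registry would serve.
mkdir -p "$store/src/base" "$store/src/app"
echo "base file" > "$store/src/base/os-release"
echo "app file"  > "$store/src/app/hello"
tar -C "$store/src/base" -cf "$store/layer-1.tar" .
tar -C "$store/src/app"  -cf "$store/layer-2.tar" .

# 1. Unpack each layer into its own directory.
lowerdir=""
for n in 1 2; do
  mkdir -p "$store/layers/$n"
  tar -C "$store/layers/$n" -xf "$store/layer-$n.tar"
  # Newer layers go to the left: overlayfs treats the leftmost entry as topmost.
  lowerdir="$store/layers/$n${lowerdir:+:$lowerdir}"
done

# 2. Give the container an empty upper dir, plus the work dir overlayfs needs.
mkdir -p "$store/container/upper" "$store/container/work" "$store/container/merged"

# 3. The mount that combines everything; printed rather than run (needs root).
echo "mount -t overlay overlay -o lowerdir=$lowerdir,upperdir=$store/container/upper,workdir=$store/container/work $store/container/merged"
```

Anything the container writes would land in container/upper, so throwing that one directory away resets the container without touching the image layers.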

Layers cache

Using the overlay filesystem this way allows hosts to cache docker images effectively. For example, if two images are built on the same base layers, they can both use the same copies of those layers on the host. No need to download them multiple times or to keep many copies on disk!
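
Here is a rough sketch of why sharing works, assuming layers are keyed by the SHA-256 digest of their tarball (OCI images reference layers by digest). The cache directory, the fetch_layer function and the two toy images are all invented for this example, and a cp stands in for the download.

```shell
#!/usr/bin/env bash
cache=$(mktemp -d)            # hypothetical local layer cache
downloads=0

# fetch_layer TARBALL: store the layer under its SHA-256 digest, unless a
# layer with the same digest is already cached. cp stands in for a download.
fetch_layer() {
  local digest
  digest=$(sha256sum "$1" | cut -d' ' -f1)
  if [ ! -e "$cache/$digest" ]; then
    cp "$1" "$cache/$digest"
    downloads=$((downloads + 1))
  fi
}

src=$(mktemp -d)
echo "shared base" > "$src/base"; tar -C "$src" -cf "$src/base.tar" base
echo "app one"     > "$src/one";  tar -C "$src" -cf "$src/one.tar"  one
echo "app two"     > "$src/two";  tar -C "$src" -cf "$src/two.tar"  two

# image-a = base + one, image-b = base + two
for layer in base one; do fetch_layer "$src/$layer.tar"; done
for layer in base two; do fetch_layer "$src/$layer.tar"; done

echo "layers referenced: 4, layers stored: $downloads"
```

The two images reference four layers in total, but the shared base is fetched and stored only once.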

OCI-format container images

Running a container can be seen, at a high level, as a two-step process: building the image and running a container from the image. The popularity of docker has convinced people to standardize both steps, allowing the two pieces to evolve separately. The Open Container Initiative (OCI) is the governance body which has been working with the industry to define these standards.

The OCI currently contains two specifications: the Runtime Specification (runtime-spec) and the Image Specification (image-spec). The Runtime Specification outlines how to run a “filesystem bundle” that is unpacked on disk. At a high level, an OCI implementation would download an OCI Image, then unpack that image into an OCI Runtime filesystem bundle. At this point the OCI Runtime Bundle would be run by an OCI Runtime.

The standardization allows other people to develop custom container builders and runtimes. For example, jessfraz/img and buildah let you build container images without using docker, while Skopeo lets you copy and inspect them. Similarly, many tools to run containers (so-called container runtimes) have emerged, for example runc (used by docker) and rkt.

Other overlay filesystems

Overlay is not the only union file system that docker can use. Any file system that offers union-like features and a diff layer could potentially be used. For example, docker can use overlay as we have seen, but also aufs, btrfs, zfs and devicemapper.

What happens when you build an image?

Let's assume we want to build an image from the following dockerfile:

FROM ubuntu
RUN apt-get update
...

At a high level, this is how docker builds an image out of it:

1. Docker downloads the tarball for the image specified in "FROM" and unpacks it. This is the first layer of the image.
2. It mounts a union file system, with the lower dir being the one just downloaded. The upper dir is an empty folder.
3. It starts bash in a chroot and runs the command specified in RUN: chroot . /bin/bash -c "apt-get update"
4. When the command is over, it archives the upper dir into a tarball. This is the new layer of the image we are building.
5. If the dockerfile contains other commands, it repeats the process from the second step, using as lower dirs all the layers we have got so far. Otherwise it exits.

Of course this is a simplified workflow which does not take into account other types of commands like "ENV", "ENTRYPOINT", etc. Those settings are stored in a metadata file which is bundled together with the layers.
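
The build loop described above can be sketched as a shell script. Treat it as an illustration under assumptions rather than what docker actually executes: the build directory, the priv and build_step helpers and the second command (installing curl) are invented, the FROM image is assumed to be already unpacked in layer-0, and by default the script runs in dry-run mode, printing the privileged commands instead of running them.

```shell
#!/usr/bin/env bash
set -eu
DRY_RUN=${DRY_RUN:-1}         # 1 = only print the commands that need root
build=$(mktemp -d)            # hypothetical build directory
lowerdir="$build/layer-0"     # assume the FROM image is already unpacked here
mkdir -p "$build/layer-0" "$build/merged"

# priv CMD...: run a command that needs root, or just print it in dry-run mode.
priv() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else sudo "$@"; fi
}

# build_step N COMMAND: run COMMAND on top of the layers built so far and
# archive whatever it changed as layer N.
build_step() {
  local n=$1 cmd=$2
  local upper="$build/layer-$n" work="$build/work-$n"
  mkdir -p "$upper" "$work"
  priv mount -t overlay overlay \
    -o "lowerdir=$lowerdir,upperdir=$upper,workdir=$work" "$build/merged"
  priv chroot "$build/merged" /bin/sh -c "$cmd"
  priv umount "$build/merged"
  priv tar -C "$upper" -cf "$build/layer-$n.tar" .
  lowerdir="$upper:$lowerdir"   # the new layer sits on top for the next step
}

build_step 1 "apt-get update"
build_step 2 "apt-get install -y curl"   # made-up second RUN step
echo "final lowerdir: $lowerdir"
```

Each step gets a fresh, empty upper dir, so the tarball produced at the end of the step contains only what that command changed. Setting DRY_RUN=0 would run the commands for real through sudo, which only makes sense if layer-0 holds a usable root filesystem.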

Conclusion

The idea of packing a whole root file system into a tar and keeping a tar for each diff layer turned out to be very powerful. It did not just enable docker, it also turned out to be a concept that can be used in other contexts. I guess we will see more tools taking advantage of it in the future.

Follow me on Twitter to get new posts in your feed.
Credit for the cover image to frank mckenna on Unsplash.

Top comments (14)

Thai Pangsakulyanont • Apr 21 '20

Today I learned about OCI stuff (that is totally new to me) and what’s going on behind the scenes when I run docker build. At first I thought it was complex, custom-made, Docker-specific stuff. The way you broke it down into ordinary Linux commands made it much clearer for me what’s going on… it doesn’t look as scary as I thought now.

Thanks for sharing!

Nicola Apicella • Apr 21 '20

Thank you! Glad it helped :)

Nikolay Kolchenko • Jul 8 '20 • Edited

1) CLI examples aren't consistent.

[2020-04-19 16:31:09] [ubuntu] [/tmp/overlay-example/layer-4]  
 pactvm > ls -la

and then:

[2020-04-19 16:37:11] [ubuntu] [/tmp/overlay-example]  
> sudo umount /tmp/overlay-example/mount && rm -rf *

Imagine that a noob reads it and the very first question is "What's the pactvm??"
Another important point is that the simple CTRL+C, CTRL+V of commands isn't working.

the

cd /tmp/overlay-example

is missed. I can almost see how noob does just a

cd

and

rm -rf *

does the trick. Why not to put

rm -rf /tmp/overlay-example/*

?

To sum it up, the article explains a really nice concept.. but in a very inconsistent and dangerous way.
Thank you. :)

Pablo Oliva • Jan 1 '21

I am not a newbie, or at least i do not consider myself one, and I was confused by pactvm... thinking it was some command that I had never seen before, but I was too lazy to do a search.

Nicola Apicella • Jan 1 '21 • Edited

It's my login name. I removed it from most of the commands but forgot to remove it from some. I'm so used to seeing my terminal that way I do not even notice it anymore 😅

I removed it. Thanks for the feedback

Matt Nguyen • Apr 21 '20

I think: sudo umount /tmp/overlay-example/mount && rm -rf .* --> ...&& rm -rf *

Nicola Apicella • Apr 21 '20

Good catch, it was a typo. Thanks

Matt Nguyen • Apr 21 '20 • Edited

No big deal, at least it throws an error. Not sure the feelings if the typo were rm -rf /* :))

Nicola Apicella • Apr 21 '20

Even if it was "/*" it would have thrown an error, unless you were running as root. Note that only the mount is sudo-ed and that privileges are not propagated to the other commands in the && condition. Try:

> sudo whoami && whoami
root
napicella

That being said, the boy scout rule applies: never run as root and never copy/paste random commands (especially the ones which require privilege escalation) without knowing what they do :)

Donald Gillies • May 1 '20 • Edited

One of the very early union file systems is Clearcase by Rational Software (now owned by IBM). A workspace is put together via a series of views (which are versioned software releases) overlaid upon one another. The software required modifying the Solaris Kernel and required very powerful CPUs to work well. You would make all your changes in the top layer and when it was time to commit you only had to specify what branch(es) to add those changes to - clearcase knew what had been changed.
en.wikipedia.org/wiki/Rational_Cle... .