You might know that a container is a standardized unit of software, that is, an application with everything that it requires to work: runtime, libraries, config, etc.
If this is not enough for you and you'd like to actually see with your own eyes what a container is, read on.
A container can be in two states: running or stopped.
In the stopped state a container looks absolutely simple: it's just one file called config.json that contains configuration and a directory called rootfs that is used as the container root.
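To make that layout concrete, here is a minimal sketch that checks whether a directory has the two pieces described above. The helper name looks_like_container and the throwaway demo directory are made up for illustration, and the config.json it writes is only a placeholder, not something runc could actually run.

```shell
#!/bin/sh
# looks_like_container DIR: succeed if DIR holds a config.json file
# and a rootfs directory. (Hypothetical helper, not part of runc.)
looks_like_container() {
    [ -f "$1/config.json" ] && [ -d "$1/rootfs" ]
}

# Build a throwaway example so the check has something to look at.
demo=$(mktemp -d)
mkdir "$demo/rootfs"
echo '{}' > "$demo/config.json"   # placeholder, not a valid runtime config

if looks_like_container "$demo"; then
    echo "$demo looks like a stopped container"
else
    echo "$demo is missing config.json or rootfs"
fi
```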
Can it be that simple?
You might know that a container image is basically just an archived directory that becomes the root directory when a container runs. We can unpack it with tar to a directory called rootfs:
mkdir rootfs && docker export $(docker create alpine) | tar -xf - -C rootfs
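If you don't have Docker at hand, the "image is an archived directory" idea can be tried with tar alone. This is only a sketch: the directory contents are invented, and real image formats add layers and metadata on top of the plain archive used here.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# A tiny stand-in for a container's directory tree (made-up content).
mkdir -p "$work/rootfs/bin" "$work/rootfs/etc"
echo "hello" > "$work/rootfs/etc/motd"

# "Build the image": archive the directory tree.
tar -cf "$work/image.tar" -C "$work/rootfs" .

# "Unpack the image": extract it into a fresh directory that could serve as a root.
mkdir "$work/unpacked"
tar -xf "$work/image.tar" -C "$work/unpacked"

cat "$work/unpacked/etc/motd"   # prints: hello
```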
You can run a container with the runc command after creating a default config file:
runc spec && sudo runc run mycontainerid
To see that you're indeed in a container, run:
and you should see
Moreover, if you start another terminal and create a file in rootfs on the host, you'll see this file in the container:
ls / # bin hello-from-host media
But what is runc and what does it do?
runc (basically what Docker uses under the hood) runs a container in an isolated environment. Kind of.
What you might think is that runc puts an app process in a software equivalent of a solid metal box.
But that's far from the truth. A container process is not enclosed in some kind of jail from which it cannot escape. The process has no idea that it's restricted. It has no idea what the real world looks like.
runc effectively lies to it:
- that the rootfs directory is the root of the filesystem
- that there are no other processes in the system and that it's the init process with PID 1
- about the available network, computation and memory resources
It's more like putting a VR headset on a process without telling it that it's in virtual reality. It can't escape the rootfs directory because in its world view that directory is / and there's no way to go higher than /.
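You can check the "no way to go higher" part in any shell, inside a container or not. The sketch below relies only on the standard Unix rule that the parent of / is / itself; the variable names are arbitrary.

```shell
#!/bin/sh
# The parent of / is / itself, so a process cannot climb above
# whatever directory it has been told is the root.
cd /
cd ..
top=$(pwd)
echo "after 'cd ..' from /: $top"            # prints: after 'cd ..' from /: /

cd ../../..
still_top=$(pwd)
echo "after 'cd ../../..': $still_top"       # prints: after 'cd ../../..': /
```

Inside the container the same commands behave identically, except that the / they end up in is the host's rootfs directory.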
This is possible mostly thanks to two kernel features: namespaces, which virtualize system resources, and cgroups, which provide a way to limit resources like CPU and memory.
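Both features are visible through /proc for every process, contained or not. The following sketch is Linux-only and assumes /proc is mounted; it prints the namespaces and the cgroup of the current shell, and falls back to "unavailable" where the information can't be read. Running it once on the host and once inside the container should show different identifiers for the namespaces the container got for itself.

```shell
#!/bin/sh
# Namespaces this shell belongs to: one symlink per namespace type,
# each pointing at something like "pid:[4026531836]".
for ns in /proc/$$/ns/*; do
    [ -L "$ns" ] || continue              # skip if /proc has no namespace entries
    printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
done

# Keep the PID namespace identifier for comparison between host and container.
pid_ns=$(readlink "/proc/$$/ns/pid" 2>/dev/null || echo "unavailable")
echo "pid namespace: $pid_ns"

# The cgroup(s) this shell is accounted in.
cgroup_info=$(cat "/proc/$$/cgroup" 2>/dev/null || echo "unavailable")
echo "cgroup: $cgroup_info"
```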
As you can see, containers do not contain. The isolation is only one-way. A contained process can't see the world outside of the container, but the host has all the information about the process, can access its filesystem, and can interact with it as if it were a normal process (it practically is a normal process). Isolation is based on providing fake information about the state of the system to the contained process.
That makes containers really lightweight but a little less secure than VMs. Moving a normal app to a container is a breeze: from the app's perspective there's no difference between a normal environment and the virtual environment presented to it when it runs in a container, so no special modifications are needed.
Summing up, a container is an app's code and all its dependencies kept in a single directory tree, plus a config file. When run, a container is restricted to that directory by making it think that the directory is the filesystem root and by providing it with manufactured information about system resources. To make the container easy to transport, it's packed into an image format that's basically an archive of the container directory.