How Docker Container Networking Works - Mimic It Using Linux Network Namespaces

Safak Ulusoy ・ 11 min read

Docker (and probably every other container technology) uses Linux network namespaces to isolate the container network from the host network. When Docker creates and runs a container, it creates a separate network namespace (the container network) and puts the container into it.

Then Docker connects the new container network to the Linux bridge docker0 using a veth pair. This also connects the container to the host network and to the other container networks attached to the same bridge.

So let's try to define network namespace, veth pair and Linux bridge in one sentence each:

A "Linux network namespace" is a virtual network barrier that encapsulates a process to isolate its network connectivity (in/out) and resources (i.e. network interfaces, route tables and rules) from the host and from other processes.

A "veth pair" is basically a virtual network cable that has a virtual network interface (NIC) on each end.

A "Linux bridge" is a switch-like virtual device that enables communication between the network devices connected to it, creating something like a LAN.
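To make the namespace definition a bit more concrete: on Linux, every process exposes the network namespace it lives in as the link /proc/<pid>/ns/net. The sketch below is only an illustration; the helper names netns_of and same_netns are made up here and are not part of iproute2 or Docker.

```shell
#!/bin/sh
# netns_of is a hypothetical helper, not part of iproute2 or Docker.
# It prints the network namespace identifier of a process,
# in the form "net:[<inode number>]".
netns_of() {
    readlink "/proc/$1/ns/net"
}

# Two processes share a network namespace when their identifiers are equal.
same_netns() {
    [ "$(netns_of "$1")" = "$(netns_of "$2")" ]
}

# Compare this shell with a child process started from it.
if same_netns "$$" self; then
    echo "shell and child process share one network namespace"
fi
```

Reading the link of another user's process (for example a container's main process) typically requires root, which is why the commands later in this post use sudo.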

The diagram below may help you visualize and understand container networking better. Please refer to this diagram frequently while reading the rest of the post.
[Diagram: Docker Container Networking]

The plan is to first investigate the network structure of a Docker container.
After that, we will try to mimic a Docker container network by manually creating a network namespace and a veth pair, and then applying the required configuration.

1- Let's Demystify Docker Container Networking

Let's start by investigating Docker's container networking.

I have three Ubuntu containers running locally, named "con1", "con2" and "con3". First, let's have a look at them. Note that con3 is not on the default docker0 bridge; I run it in a custom Docker bridge network named 'testnet'.

$ filter='Name={{.Name}} Hostname={{.Config.Hostname}} ' &&
> filter+='IP={{or .NetworkSettings.IPAddress .NetworkSettings.Networks.testnet.IPAddress}} ' &&
> filter+='Mac={{or .NetworkSettings.MacAddress .NetworkSettings.Networks.testnet.MacAddress}} ' &&
> filter+='Bridge={{if .NetworkSettings.IPAddress}} docker0 {{else}} testnet {{end}}' &&
> docker inspect con1 con2 con3 --format "$filter" | sed 's/=\//=/g'
Name=con1 Hostname=f267c8cc5b55 IP=172.18.0.2 Mac=02:42:ac:12:00:02 Bridge= docker0
Name=con2 Hostname=fe0bb3dcedd8 IP=172.18.0.3 Mac=02:42:ac:12:00:03 Bridge= docker0
Name=con3 Hostname=bd3c200eca8b IP=172.19.0.2 Mac=02:42:ac:13:00:02 Bridge= testnet

Now we will have a look at the existing containers' network namespaces. However, Docker does not make container namespaces visible to the ip netns command by default, so we first need to expose them on the host before we can list them with ip netns list. Since this part is not critical to understand, I will just dump all the commands here.

# Try to list docker network namespaces. But the result will be empty.
$ sudo ip netns list
<no result>

# Make docker network namespaces visible.
$ sudo mkdir -p /var/run/netns
$ pid1="$(docker inspect con1 -f '{{.State.Pid}}')"
$ pid2="$(docker inspect con2 -f '{{.State.Pid}}')"
$ pid3="$(docker inspect con3 -f '{{.State.Pid}}')"
$ sudo ln -sf /proc/$pid1/ns/net /var/run/netns/con1
$ sudo ln -sf /proc/$pid2/ns/net /var/run/netns/con2
$ sudo ln -sf /proc/$pid3/ns/net /var/run/netns/con3

# Now we can see the container network namespaces.
$ sudo ip netns list
con3 (id: 3)
con2 (id: 2)
con1 (id: 1)

As you can see, we now have three container namespaces, con1, con2 and con3, each named after its corresponding container.
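The repeated ln -sf lines can be folded into one small function. This is only a sketch: expose_netns is a made-up helper name, and the optional third argument (the target directory) exists only so that the demo can run without root in a throwaway directory.

```shell
#!/bin/sh
# expose_netns is a hypothetical helper that generalizes the ln -sf lines above.
# Arguments: <name> <pid> [netns directory, default /var/run/netns]
expose_netns() {
    name="$1"
    pid="$2"
    dir="${3:-/var/run/netns}"
    mkdir -p "$dir" && ln -sf "/proc/$pid/ns/net" "$dir/$name"
}

# Harmless demo: link this shell's own namespace into a throwaway directory.
demo_dir="$(mktemp -d)"
expose_netns demo "$$" "$demo_dir"
ls -l "$demo_dir"
rm -rf "$demo_dir"

# For a real container, run as root and leave out the third argument, e.g.:
#   expose_netns con1 "$(docker inspect con1 -f '{{.State.Pid}}')"
```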

Let's look at the network interfaces (NICs) on both ends of each container's veth pair. We will interpret the results afterwards.

Remember, a veth pair is like a network cable with a NIC on each end; it connects a container network namespace to the other devices attached to the same Linux bridge.


# See 'con1' virtual network interface and ipv4 address.
$ sudo ip netns exec con1 ip a | grep -e 'inet.*eth0' -e 'eth0@'
22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0

# See 'con2' virtual network interface and ipv4 address.
$ sudo ip netns exec con2 ip a | grep -e 'inet.*eth0' -e 'eth0@'
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0

# See 'con3' virtual network interface and ipv4 address.
$ sudo ip netns exec con3 ip a | grep -e 'inet.*eth0' -e 'eth0@'
34: eth0@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.19.0.2/16 brd 172.19.255.255 scope global eth0

# See docker bridges' and containers' NICs on the host side.
...
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:7c:72:8a:f2 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:7cff:fe72:8af2/64 scope link
       valid_lft forever preferred_lft forever
23: vethd1d3c7f@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether d6:92:c7:41:c6:b9 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::d492:c7ff:fe41:c6b9/64 scope link
       valid_lft forever preferred_lft forever
25: veth622799a@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether fa:47:38:e1:eb:04 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::f847:38ff:fee1:eb04/64 scope link
       valid_lft forever preferred_lft forever
30: br-6c1c92f34df9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:a3:0b:d6:71 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 brd 172.19.255.255 scope global br-6c1c92f34df9
       valid_lft forever preferred_lft forever
    inet6 fe80::42:a3ff:fe0b:d671/64 scope link
       valid_lft forever preferred_lft forever
35: veth05790ff@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-6c1c92f34df9 state UP group default
    link/ether de:d4:74:dd:ea:5c brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::dcd4:74ff:fedd:ea5c/64 scope link
       valid_lft forever preferred_lft forever

# Ping 'con2' from 'con1' network. (SUCCESS)
ubuntu@vm0:~$ sudo ip netns exec con1 ping 172.18.0.3
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.040 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.037 ms
--- 172.18.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1023ms

# Ping 'con3' from 'con1' network. (FAIL)
ubuntu@vm0:~$ sudo ip netns exec con1 ping 172.19.0.2
PING 172.19.0.2 (172.19.0.2) 56(84) bytes of data.
--- 172.19.0.2 ping statistics ---
73 packets transmitted, 0 received, 100% packet loss, time 73728ms
...

Let's interpret the results and sum up.

Container  veth NIC on Container  IPv4 address  veth NIC on Bridge     Bridge
con1       22: eth0@if23          172.18.0.2    23: vethd1d3c7f@if22   docker0
con2       24: eth0@if25          172.18.0.3    25: veth622799a@if24   docker0
con3       34: eth0@if35          172.19.0.2    35: veth05790ff@if34   br-6c1c92f34df9

  • One end of the veth pair (eth0@if23) is in the container network, and the other end (vethd1d3c7f@if22) is in the host network, attached to the docker0 bridge.
  • Notice that each NIC name ends with the interface index of its peer (the number at the start of the peer's line). This may help you identify a veth pair when you see one.
  • When you run the ip addr list command on the host, you only see the NICs in the host network.
  • In order to execute a command inside a container's network you don't need to log in to the container. Just execute your command like this: ip netns exec <name of namespace> <shell command>
  • The veth NICs attached to the bridges do not have IPv4 addresses; only the bridges themselves do.
  • The ping from con1 to con2 succeeds since both are connected to the default docker0 bridge.
  • The ping from con1 to con3 fails since they are connected to different bridges, and traffic is not forwarded between those bridges. That is basically how containers or container groups are isolated from each other.
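
The index-matching trick from the list above can be automated with a few lines of shell. This is a rough sketch rather than an official tool: veth_peers is a made-up helper that simply pattern-matches the "name@ifN" part of ip a / ip link output.

```shell
#!/bin/sh
# veth_peers is a hypothetical helper: it reads `ip a` or `ip link` output on
# stdin and prints "<name> index=<N> peer_index=<M>" for every "name@ifM" line.
veth_peers() {
    sed -n 's/^\([0-9][0-9]*\): \([^@:]*\)@if\([0-9][0-9]*\):.*/\2 index=\1 peer_index=\3/p'
}

# Demo with two lines copied from the output above.
veth_peers <<'EOF'
22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
23: vethd1d3c7f@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
EOF
# prints:
#   eth0 index=22 peer_index=23
#   vethd1d3c7f index=23 peer_index=22
```

On a live system you could feed it real output, for example sudo ip netns exec con1 ip a | veth_peers, and then look for the interface with the matching index on the host side.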

Also try some commands in the container networks yourself and play around with the "ip netns exec ..." command mentioned above. You can run ip a, traceroute 8.8.8.8, ping 172.18.0.3 or whatever you want.

That's all I want to say for now. I believe this is enough to understand the very basics of Docker container networking. Let's continue with the next part.

2- How to Manually Create A Container Network - Using Linux Network Namespaces and Veth Pair

Now we will create a network namespace and a veth pair; then connect "host network" to "container network" using this veth pair.

Let's start with creating a new empty network namespace:

# Create network namespace.
ubuntu@vm0:~$ sudo ip netns add sample

# List namespaces.
ubuntu@vm0:~$ ip netns list
sample

Then we create a veth pair in default network namespace:

# Create new veth pair.
ubuntu@vm0:~$ sudo ip link add veth1 type veth peer name veth2

# List network interfaces. 
ubuntu@vm0:~$ ip a
20: veth2@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether de:f9:8e:4c:75:60 brd ff:ff:ff:ff:ff:ff
21: veth1@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether de:6f:db:d8:58:cd brd ff:ff:ff:ff:ff:ff

At this point we have a container network (network namespace) named "sample" and a veth pair with the network interfaces "veth1" and "veth2" on its two ends. Now we can connect the host network to the sample network by moving one end of the veth pair, veth1, into the sample network. Let's do it:

# Move veth1 to 'sample' namespace.
ubuntu@vm0:~$ sudo ip link set veth1 netns sample

# List interfaces in 'default' namespace.
ubuntu@vm0:~$ ip a
20: veth2@if21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

Now we have connected the host and container network namespaces. But we are not there yet; there is still a lot to do to make it work.

If you look at the last output, you can see that veth1 is no longer visible in the default network namespace. That is because we moved it to the sample network namespace, while the ip a command was executed in the default network namespace.

In order to see veth1 again, we need to run the ip link list command in the sample network namespace. For this we will use the ip netns exec <namespace> <command> command, which executes the given command in the specified network namespace. Let's do it:

# List interface in 'sample' namespace.
ubuntu@vm0:~$ sudo ip netns exec sample ip link list
21: veth1@if20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

There is one thing to note. Previously, veth2 was listed as 20: veth2@veth1 ... in response to the ip a command. Now it is listed as 20: veth2@if21 ... Similarly, veth1 is listed as 21: veth1@if20 ... from now on. Once the two ends live in different namespaces, each interface shows the interface index of its peer (the number at the start of the peer's line) in its name instead of the peer's name. Knowing this may help you identify veth pairs across different network namespaces.

Configure Networking Between Host and Container

At this point we have a network namespace named "sample", and this network namespace is connected to the "default" network namespace by a veth pair.
(container network) veth1 <--> veth2 (host network)

From here on I may refer to the sample network namespace as the container network, and to the default network namespace as the host network.

But we are still not there yet. We need to configure several more things before we can reach the container network from the host network by IPv4 address, or reach the internet from the container network. Let's go through the following steps quickly:

  • Create a Linux bridge, samplebr (like the docker0 bridge).
  • Join the veth2 network interface to the samplebr bridge.
  • Assign an IPv4 address to samplebr.
  • Assign an IPv4 address to veth1.
  • Bring up the veth1 device.
  • Bring up the container's loopback interface (localhost).
  • Bring up the veth2 device.
  • Bring up the samplebr device.
  • Add the bridge IP as the default gateway for the container network.

# Create a Linux bridge 'samplebr'.
ubuntu@vm0:~$ sudo ip link add name samplebr type bridge

# Join the 'veth2' network interface to the 'samplebr' bridge.
ubuntu@vm0:~$ sudo ip link set veth2 master samplebr

# Assign an IPv4 address to 'samplebr'.
ubuntu@vm0:~$ sudo ip addr add 10.1.1.1/24 brd + dev samplebr

# Assign an IPv4 address to 'veth1'.
ubuntu@vm0:~$ sudo ip netns exec sample ip addr add 10.1.1.2/24 dev veth1

# Bring up the 'veth1' device.
ubuntu@vm0:~$ sudo ip netns exec sample ip link set veth1 up

# Bring up the container's loopback interface.
ubuntu@vm0:~$ sudo ip netns exec sample ip link set lo up

# Bring up the 'veth2' device.
ubuntu@vm0:~$ sudo ip link set veth2 up

# Bring up the 'samplebr' device.
ubuntu@vm0:~$ sudo ip link set samplebr up

# Add the 'samplebr' bridge address as the default gateway for the container network.
ubuntu@vm0:~$ sudo ip netns exec sample ip route add default via 10.1.1.1
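
If you want to repeat the experiment, the steps so far (namespace, veth pair, bridge, addresses, default route) can be collected into one script. The following is a sketch under some assumptions: the function names, the DRY_RUN switch and the run wrapper are invented for this example, and the interface names veth1/veth2 are hard-coded as in the post. By default it only prints the commands; with DRY_RUN=0 it executes them and therefore needs root.

```shell
#!/bin/sh
# Sketch only: setup_container_net and teardown_container_net are hypothetical helpers.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Arguments: <namespace> <bridge> <bridge address/prefix> <container address/prefix>
setup_container_net() {
    ns="$1"; br="$2"; br_cidr="$3"; ns_cidr="$4"
    gw="${br_cidr%/*}"    # bridge address without the prefix length, e.g. 10.1.1.1
    run ip netns add "$ns"
    run ip link add veth1 type veth peer name veth2
    run ip link set veth1 netns "$ns"
    run ip link add name "$br" type bridge
    run ip link set veth2 master "$br"
    run ip addr add "$br_cidr" brd + dev "$br"
    run ip netns exec "$ns" ip addr add "$ns_cidr" dev veth1
    run ip netns exec "$ns" ip link set veth1 up
    run ip netns exec "$ns" ip link set lo up
    run ip link set veth2 up
    run ip link set "$br" up
    run ip netns exec "$ns" ip route add default via "$gw"
}

# Arguments: <namespace> <bridge>
# Deleting the namespace destroys veth1 and, with it, its peer veth2.
teardown_container_net() {
    run ip netns del "$1"
    run ip link del "$2"
}

setup_container_net sample samplebr 10.1.1.1/24 10.1.1.2/24
```

Printing first and executing later makes it easy to compare the generated commands with the manual steps before touching the host's network configuration.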


That's it! We have now set up networking between the host and the container network. We can test it by pinging the veth1 network interface (the container network) at 10.1.1.2 from the host, or by pinging the host network from the container network.

ubuntu@vm0:~$ ping 10.1.1.2
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.
64 bytes from 10.1.1.2: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 10.1.1.2: icmp_seq=2 ttl=64 time=0.028 ms

ubuntu@vm0:~$ sudo ip netns exec sample ping 172.17.52.174
PING 172.17.52.174 (172.17.52.174) 56(84) bytes of data.
64 bytes from 172.17.52.174: icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from 172.17.52.174: icmp_seq=2 ttl=64 time=0.143 ms

However, we cannot reach external networks (the internet) yet. There are a few more things to configure.

ubuntu@vm0:~$ sudo ip netns exec sample ping 8.8.8.8
connect: Network is unreachable

ubuntu@vm0:~$ sudo ip netns exec sample ping github.com
ping: github.com: Temporary failure in name resolution

Configure Networking From Container To Internet

We don't have a connection from the container network to the internet yet, and the DNS resolver is not working either. Let's fix both.

# Make sure IP forwarding is enabled.
root@test1:~# sysctl -w net.ipv4.ip_forward=1

# Masquerade (source NAT) traffic leaving the container subnet, so that requests reach the internet and replies find their way back (ping 8.8.8.8).
root@test1:~# iptables -t nat -A POSTROUTING -s 10.1.1.0/24 ! -o samplebr -j MASQUERADE

# Find the DNS nameserver of the host network interface (eth0). On newer systemd versions the command is 'resolvectl status'.
root@test1:~# systemd-resolve --status

# Create a resolv.conf file for the container network.
root@test1:~# mkdir -p /etc/netns/sample/
root@test1:~# echo "nameserver 172.17.52.161" > /etc/netns/sample/resolv.conf

# Test DNS again.
root@test1:~# ip netns exec sample ping github.com
PING github.com (140.82.118.3) 56(84) bytes of data.
64 bytes from lb-140-82-118-3-ams.github.com (140.82.118.3): icmp_seq=1 ttl=49 time=28.5 ms
64 bytes from lb-140-82-118-3-ams.github.com (140.82.118.3): icmp_seq=2 ttl=49 time=27.4 ms
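
Instead of reading the nameserver off the status output by eye, it can be pulled out of a resolv.conf-style file. This is only a sketch: first_nameserver is a made-up helper, and which file holds the right address depends on your host.

```shell
#!/bin/sh
# first_nameserver is a hypothetical helper: it prints the first "nameserver"
# address found in a resolv.conf-style file.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "$1"
}

# Demo on a throwaway file; the first address is the one used in this post.
demo_file="$(mktemp)"
printf '# sample resolver file\nnameserver 172.17.52.161\nnameserver 8.8.8.8\n' > "$demo_file"
first_nameserver "$demo_file"    # prints 172.17.52.161
rm -f "$demo_file"
```

On hosts that use systemd-resolved, /etc/resolv.conf usually points to a local stub address on the host's loopback interface, which is not reachable from inside the namespace because each namespace has its own loopback. On such hosts the upstream servers are usually listed in /run/systemd/resolve/resolv.conf, so that is the file to pass to the helper.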

Voilà! We made it. Now we have a container network that can connect to the host network, to the internet, and to other container networks (network namespaces) attached to the same Linux bridge (if any).

If you still cannot connect to the internet from the container network, Docker's iptables rules may be blocking the traffic. Run iptables -L as root. If you see "Chain FORWARD (policy DROP)" somewhere in the output, execute the commands below. Note that the first one flushes every rule in the FORWARD chain, including the ones Docker added.

# iptables -F FORWARD
# iptables -P FORWARD ACCEPT
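
Flushing the whole FORWARD chain also removes rules that Docker relies on. A narrower alternative, sketched below, is to add accept rules only for our own bridge. This is an assumption about what fits a typical setup, not something tested in this post; print_forward_rules is a made-up helper that only prints the commands so you can review them first.

```shell
#!/bin/sh
# print_forward_rules is a hypothetical helper. It only prints iptables commands
# for the given bridge; review them, then run them as root.
print_forward_rules() {
    br="$1"
    # Allow packets that enter the host from the bridge to be forwarded.
    echo "iptables -I FORWARD -i $br -j ACCEPT"
    # Allow replies to connections that were started from the bridge side.
    echo "iptables -I FORWARD -o $br -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}

print_forward_rules samplebr
```

The -I flag inserts the rules at the top of the chain, so they are evaluated before the chain's DROP policy applies.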

FINAL WORDS

And we are done. I think I have explained most of the things I learnt over the last few weeks. Of course, for simplicity and ease of understanding, I omitted some parts or did not go deep into them. But I think we got to the point.

Please also see the resources I shared below and go over them if you find the time. They were beneficial to me.

Why All This Information Is Important
  • It makes you happy and satisfies your curiosity. :)
  • Personally, it makes it much easier for me to follow and reason about what someone says or writes about container networking or Kubernetes networking in general.
  • You are better armed if you face a network issue inside a container or across containers. Now you can use various tools inside container namespaces to debug and resolve issues. ('ip a', 'ip netns exec ', 'ping', 'traceroute', 'nslookup', 'tcpdump' ...)

I also want to write two follow-up posts. One will be Basic Networking for Cloud: IP, Subnets, and CIDR; and obviously the next will be about Kubernetes Networking 101. I will also try to keep them shorter this time.

I hope you enjoyed it.


Resources

Posted on Apr 15 by:

Safak Ulusoy

@polarbit

I love learning, coding and software craftsmanship. Nowadays I mostly get into and study kubernetes, docker, azure, devops, linux, serverless things.
