Container Namespaces – Deep Dive into Container Networking

Of late, there have been various open source projects to manage networking for containers; Docker implemented “libnetwork“, and I’ve written in the past about using Calico with Docker containers. Debugging and low-level performance tuning aren’t easy without an in-depth understanding of how the network stack works for a container. This post explains network namespaces and their role in container networking, using the default networking that comes out of the box with docker.


As you probably already know, containers use namespaces to isolate resources (and cgroups to limit their use). Linux network namespaces are what glue a container’s processes to the host networking stack. Docker spawns a container in the container’s own network namespace (by passing the CLONE_NEWNET flag, defined in sched.h, to the clone system call so the child process gets a new network namespace) and then runs a veth pair (a cable with two ends) between the container’s namespace and the host network stack. If you are new to network namespaces, this blog post by Scott gives a quick overview, and it serves as a good 101 refresher if you are already familiar with these concepts but haven’t used them for a while.
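To make this wiring concrete, here is a rough sketch, using plain iproute2 commands, of what docker effectively does for each container. This is an illustration rather than docker’s actual code; the namespace name demo, the interface names and the 172.17.0.10 address are made up, and it assumes the default docker0 bridge with gateway 172.17.0.1:

$ sudo ip netns add demo                                       # create a new network namespace
$ sudo ip link add veth-host type veth peer name veth-demo     # create the veth pair (the "cable")
$ sudo ip link set veth-demo netns demo                        # push one end into the namespace
$ sudo ip link set veth-host master docker0 up                 # attach the other end to the bridge
$ sudo ip netns exec demo ip link set lo up
$ sudo ip netns exec demo ip addr add 172.17.0.10/16 dev veth-demo
$ sudo ip netns exec demo ip link set veth-demo up
$ sudo ip netns exec demo ip route add default via 172.17.0.1  # route out via the bridge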


Now let’s see how to access this network namespace for a given container. By default, docker does not add container network namespaces to the Linux runtime data (/var/run, mounted as a tmpfs from /run), which is what the ip netns command lists. This can be done with the following steps:

Step 1: Get the container’s process ID. Either run docker inspect and look for the Pid under the State section, or use the following command to extract the Pid field directly.

$ pid="$(docker inspect -f '{{.State.Pid}}' <container_name_or_uuid>)"

Step 2: Soft link (symlink) the network namespace of the process from the /proc directory into the /var/run directory as shown below. Note that you may need to create the netns directory under /var/run before symlinking the process’s network namespace.

$ sudo mkdir -p /var/run/netns
$ sudo ln -sf /proc/$pid/ns/net /var/run/netns/<container_name_or_uuid>

Step 3: The network namespace can now be listed and accessed using ip netns and ip netns exec <netns_name> <command>.

$ sudo ip netns
$ sudo ip netns exec <container_name_or_uuid> ip a
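If you do this often, the three steps can be rolled into a small shell helper. The function name netns_link below is my own and not something docker or iproute2 provides; it simply automates the symlinking shown above:

netns_link() {
    # expose a container's network namespace to ip netns by symlinking it into /var/run/netns
    local name="$1"
    local pid
    pid="$(docker inspect -f '{{.State.Pid}}' "$name")" || return 1
    sudo mkdir -p /var/run/netns
    sudo ln -sf "/proc/$pid/ns/net" "/var/run/netns/$name"
}

$ netns_link <container_name_or_uuid>
$ sudo ip netns exec <container_name_or_uuid> ip a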

To explain this with an example, let’s start a container. For this example, I will use a busybox image as the container’s base image. Since I’m using docker out of the box, the container will have one network interface sitting on the docker0 bridge, with an IP from the 172.17.0.0/16 CIDR (or whichever CIDR you have configured docker to use).
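If you are not sure which CIDR your setup uses, you can ask docker for the bridge network’s IPAM configuration (this assumes the default network is still named bridge) and compare it with the address on docker0:

$ docker network inspect bridge -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}'   # e.g. 172.17.0.0/16
$ ip addr show docker0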

$ docker run --name box -it busybox
/ #

Now, from another tab, run steps 1, 2 and 3:

$ pid="$(docker inspect -f '{{.State.Pid}}' "box")"
$ echo $pid
2620
$ sudo mkdir -p /var/run/netns
$ sudo ln -s /proc/$pid/ns/net /var/run/netns/box
$ ip netns
box
$ sudo ip netns exec box ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:2/64 scope link
       valid_lft forever preferred_lft forever

Let’s decipher the output and understand how Linux maps all of the interfaces. What you see above (eth0@if8) is one end of the veth pair: this end sits in the container’s network namespace while the other sits on the docker0 bridge. There are multiple ways to identify the interface sitting on docker0. One way is to look at the index of the interface inside the network namespace. In our case eth0 has index 7 (7: eth0@if8). The other end of the veth pair will typically have the next index, since Linux stores all interfaces in a single array whose indexes are global (system-wide). So running ip a and grepping for “8: veth” will give you the other end of the veth pair. You can check which bridge it belongs to by running the brctl show command as shown below:

$ ip a | grep "8: veth"
8: vethb20d133@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024258596b8d no vethb20d133
virbr0 8000.000000000000 yes 

Also note the interface name format “vethXXXX@ifZ” (or “eth0@ifZ” inside the container): the number Z denotes the interface index of the veth peer.
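As an aside, instead of grepping through ip a you can read the peer’s index straight out of sysfs; for veth devices the iflink attribute holds the interface index of the other end (this is generic veth behaviour, not anything docker-specific):

$ sudo ip netns exec box cat /sys/class/net/eth0/iflink
8
$ ip -o link | grep '^8: '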

Packets flowing into the bridge and onto the veth end attached to it will be seen on the other veth end inside the container’s network namespace. Since the container’s network stack is the network namespace itself, the data shows up on the network interface within the container. Packets can be captured using tcpdump or other pcap tools on either end of the veth pair for debugging.
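For example, with the box container from above, you could ping the bridge gateway from inside the container and watch the same packets on the host side of the pair (the veth name comes from the brctl output earlier; docker0 works just as well as a capture point):

/ # ping -c 2 172.17.0.1                 # inside the container
$ sudo tcpdump -ni vethb20d133 icmp      # on the host end of the veth pair
$ sudo tcpdump -ni docker0 icmp          # or on the bridge itself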

Notes:

  1. A small piece of code to create a container with and without a shared network namespace – here (a docker-only equivalent is sketched after these notes).
  2. All docker commands shown above are run as non-root. It is not recommended to run docker as root; I’ve set up the wheel and docker groups so I can run docker commands without having to be root.
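The linked code is not reproduced here, but you can see the same effect with docker alone: a second container can be told to join box’s network namespace instead of getting its own, in which case it sees the very same eth0 and IP address.

$ docker run --rm -it --network container:box busybox ip a    # shares box's namespace, shows the same eth0 / 172.17.0.2
$ docker run --rm -it busybox ip a                            # default, gets its own namespace and a fresh veth pair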

Editorial note: this post was originally published by Arun on his personal blog: http://blog.arunsriraman.com/
