Misha Bragin

Originally published at netbird.io

Using eBPF and XDP to Share Default DNS Port Between Multiple Resolvers

Have you tried configuring a local DNS resolver to use a port different from the default one?
Moving away from port 53 can be tricky, since most DNS clients and system resolvers assume this port.
This is especially true if the software you develop has to support as many operating systems, versions, and platforms as possible.

The NetBird team recently faced this and other challenges while working on the DNS feature that makes managing DNS in private networks easy.
We spent a few days on a solution involving Go, eBPF, and XDP "magic" that allows sharing a single port between multiple DNS resolver processes.
We also created a DNS manager for WireGuard® networks that universally works on macOS, Linux, Windows, Docker, and mobile phones.
Now is the time to share our journey. As with the rest of the NetBird platform, the code is open source.

About NetBird

Before jumping into the details, it is worth mentioning why we've dived into this topic and how NetBird simplifies DNS management in private networks.

NetBird is a zero-configuration overlay network and remote access solution that automatically connects your servers, containers,
cloud resources, and remote teams over an encrypted point-to-point WireGuard tunnel.
There is also something special about NetBird that gave us a "little" headache while working on the DNS support and forced us to use XDP.
I'm talking about kernel WireGuard. If available, NetBird uses the kernel WireGuard module shipped with Linux.
Otherwise, it falls back to the userspace wireguard-go implementation,
a common case for non-Linux operating systems.

To minimize the configuration effort on the administrator side, NetBird allocates network IPs from the CGNAT 100.64.0.0/10
range and distributes them to machines from a central place. Then, it configures a WireGuard interface locally on every machine,
assigning IPs to create direct connections.
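As a small illustration of the address range mentioned above, the following sketch checks whether an address falls inside the CGNAT block 100.64.0.0/10. The helper name inCGNAT is made up for this example and is not part of NetBird; the sample addresses are the ones that appear in this article.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cgnat is the 100.64.0.0/10 range that, per the article, NetBird
// allocates peer addresses from.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// inCGNAT (hypothetical helper) reports whether s parses as an IP
// address that lies inside 100.64.0.0/10.
func inCGNAT(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	return cgnat.Contains(addr)
}

func main() {
	for _, ip := range []string{"100.127.136.241", "100.127.16.22", "172.17.0.1"} {
		fmt.Printf("%-16s in %s: %v\n", ip, cgnat, inCGNAT(ip))
	}
}
```

The range spans 100.64.0.0 through 100.127.255.255, so the peer addresses shown in this article fall inside it, while the Docker bridge address 172.17.0.1 does not.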

[Figure: NetBird WireGuard IP management]

Here is how the WireGuard interface looks on my Ubuntu laptop after NetBird has done its configuration job:

misha@misha-linux:~$ ip address show wt0
5: wt0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 100.109.71.139/16 brd 100.109.255.255 scope global wt0
       valid_lft forever preferred_lft forever

NetBird assigned the private IP 100.127.136.241 to the WireGuard interface wt0 on my machine. Similarly, it assigned private IPs to remote machines that my laptop is allowed to connect to (allowed ips):

misha@misha-linux:~$ sudo wg show
interface: wt0
public key: LAo1oYyliEnvENFRZJSVb93K1kOZFu627/KSxSYjWAw=
private key: (hidden)
listening port: 51820

peer: HzjQ27XQHhm3Ky2jrCDOOF63hgop/GS6zfzYEQmH6VA=
endpoint: 35.191.145.11:62519
allowed ips: 100.127.78.76/32
latest handshake: 24 seconds ago
transfer: 124 B received, 180 B sent
persistent keepalive: every 25 seconds

peer: Fz30t36HImRdt1nwVslVFxy2Q96OEB2SwCFYGeIHdmA=
endpoint: 34.173.121.152:35895
allowed ips: 100.127.58.115/32, 172.17.0.0/16
latest handshake: 25 seconds ago
transfer: 124 B received, 180 B sent
persistent keepalive: every 25 seconds

Need for DNS in private networks

Now, what about DNS? Wouldn't it be much more convenient to have a meaningful and easily memorable name instead of the
100.127.16.22 IP address to access a privately hosted Postgres database hidden behind this IP?
To address this, NetBird automatically assigns a domain name to each peer in the private netbird.cloud space that one can use for remote access.

Running the netbird status -d command shows that my machine has connected to postgres.netbird.cloud and berlin-office.netbird.cloud:

misha@misha-linux:~$ netbird status -d
Peers detail:
postgres.netbird.cloud:
NetBird IP: 100.127.16.22
Public key: 99W1Y7p3QPesQpEYu7i0mXlvo+jBhv4DzvY7apPDDBM=
Status: Connected
-- detail --
Connection type: P2P
Last connection update: 2023-12-13 11:47:56

berlin-office.netbird.cloud:
NetBird IP: 100.127.58.115
Public key: Fz30t36HImRdt1nwVslVFxy2Q96OEB2SwCFYGeIHdmA=
Status: Connected
-- detail --
Connection type: P2P
Last connection update: 2023-12-13 11:43:32

The ping command successfully resolves postgres.netbird.cloud to 100.127.16.22:

misha@misha-linux:~$ ping -c 3 postgres.netbird.cloud
PING postgres.netbird.cloud (100.127.16.22) 56(84) bytes of data.
64 bytes from 100.127.16.22: icmp_seq=1 ttl=64 time=34.6 ms
64 bytes from 100.127.16.22: icmp_seq=2 ttl=64 time=35.6 ms
64 bytes from 100.127.16.22: icmp_seq=3 ttl=64 time=33.0 ms

Local DNS resolution implementation

How does NetBird do it so that DNS resolution works on Linux, macOS, Windows, mobile phones, and even inside Docker? While
working on the feature, we had a few options. One was to set up a centralized DNS server, e.g., dns.netbird.io,
and configure local resolvers on all connected machines to point to it.
However, such an approach brings scalability, performance, and, more importantly, privacy issues.
Like network traffic, we prefer our users' DNS queries not to go through our servers.

Another approach is to modify the local /etc/hosts file on every machine, specifying the remote machines' name-IP pairs.
The hosts file can quickly grow unwieldy in large networks, and who would like it if NetBird modified this file?

We took the locality approach and avoided significant system configuration changes. Every NetBird agent starts an embedded
DNS resolver that holds the NetBird name-IP pairs of accessible machines in memory and resolves the names when requested.
The centralized NetBird management service sends DNS updates (essentially name-IP pairs) through the control channel
when new machines join or leave the network or the administrator changes DNS settings.
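The in-memory table described above can be pictured with a minimal sketch. It is not NetBird's actual resolver: the recordStore type and its methods are hypothetical, and a real resolver would additionally speak the DNS wire protocol. The sketch only shows the idea of replacing the whole set of name-IP pairs on each update and answering lookups from memory.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
	"sync"
)

// recordStore is a hypothetical in-memory table of name-IP pairs.
type recordStore struct {
	mu      sync.RWMutex
	records map[string]netip.Addr
}

func newRecordStore() *recordStore {
	return &recordStore{records: make(map[string]netip.Addr)}
}

// normalize lower-cases a name and strips the trailing dot of an FQDN,
// so "Postgres.netbird.cloud." and "postgres.netbird.cloud" match.
func normalize(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".")
}

// Update replaces the whole table with the pairs from one update.
// Nothing is changed if any address fails to parse.
func (s *recordStore) Update(pairs map[string]string) error {
	next := make(map[string]netip.Addr, len(pairs))
	for name, ip := range pairs {
		addr, err := netip.ParseAddr(ip)
		if err != nil {
			return fmt.Errorf("record %q: %w", name, err)
		}
		next[normalize(name)] = addr
	}
	s.mu.Lock()
	s.records = next
	s.mu.Unlock()
	return nil
}

// Lookup returns the address stored for name, if any.
func (s *recordStore) Lookup(name string) (netip.Addr, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	addr, ok := s.records[normalize(name)]
	return addr, ok
}

func main() {
	store := newRecordStore()
	err := store.Update(map[string]string{
		"postgres.netbird.cloud":      "100.127.16.22",
		"berlin-office.netbird.cloud": "100.127.58.115",
	})
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	addr, ok := store.Lookup("Postgres.netbird.cloud.")
	fmt.Println(addr, ok)
}
```

Swapping the whole map under a lock keeps lookups consistent while an update arrives, at the cost of rebuilding the table each time; for the small tables implied here that trade-off seems reasonable.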

[Figure: NetBird DNS management]

We applied this approach to all supported operating systems. However, there is a caveat.
The NetBird agent must also configure the machine's operating system to point all netbird.cloud queries to the local NetBird resolver.

Configuring DNS management varies from operating system to operating system and even version to version.
Add kernel WireGuard to it, and you will get some interesting implementation differences. Read on, and you will finally get to the port issue!

Configuring DNS on Linux

Linux is the most used OS in NetBird. The great variety of Linux flavors and versions makes the administration,
for lack of a better word, unpleasant. DNS configuration is no exception: Linux offers different ways to do it.
Depending on the system setup, you could use the NetworkManager service,
the systemd-resolved service, the resolvconf command-line tool, or a direct modification of the resolv.conf file.

Which one does NetBird use? As mentioned earlier, we avoid causing significant changes to the system configuration.
Therefore, NetBird checks the /etc/resolv.conf file that usually contains a comment indicating how the system manages DNS:

misha@misha-linux:~$ head -n 2 /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.

My Ubuntu system uses the systemd-resolved service. There is also a clear warning not to edit; we follow the advice.

With a bunch of "if a string contains" conditions, NetBird chooses the right manager.
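A minimal sketch of such "string contains" checks might look like the following. It is an illustration and not NetBird's code: the manager type, the constant names, and the detectManager function are invented here, and the marker strings for NetworkManager and resolvconf are assumptions about what those tools write into the file header.

```go
package main

import (
	"fmt"
	"strings"
)

// manager names the DNS configuration mechanism (hypothetical type).
type manager string

const (
	managerSystemdResolved manager = "systemd-resolved"
	managerNetworkManager  manager = "NetworkManager"
	managerResolvconf      manager = "resolvconf"
	managerFile            manager = "file" // fall back to editing resolv.conf
)

// detectManager scans the comment lines of a resolv.conf body for a hint
// about which service manages DNS. The marker strings are assumptions.
func detectManager(resolvConf string) manager {
	for _, line := range strings.Split(resolvConf, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "#") {
			continue // only comment lines carry the hint
		}
		switch {
		case strings.Contains(line, "systemd-resolved"):
			return managerSystemdResolved
		case strings.Contains(line, "NetworkManager"):
			return managerNetworkManager
		case strings.Contains(line, "resolvconf"):
			return managerResolvconf
		}
	}
	return managerFile
}

func main() {
	header := "# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).\n" +
		"# Do not edit.\n" +
		"nameserver 127.0.0.53\n"
	fmt.Println(detectManager(header))
}
```

When no comment gives a hint, the sketch falls back to direct file modification, which mirrors the last option in the list of mechanisms above.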

After picking the manager, NetBird starts configuring DNS.
It uses D-Bus to communicate with NetworkManager and systemd-resolved, the os/exec package to call resolvconf, and os.WriteFile to modify /etc/resolv.conf directly:

[Figure: DNS configuration tools]

If you are familiar with the Go programming language, look at the open-sourced code calling the systemd-resolved service via D-Bus.

Below is the command-line equivalent of the Go code that uses Ubuntu's default busctl utility to call D-Bus:

busctl call org.freedesktop.resolve1 \
  /org/freedesktop/resolve1/link/_312 \
  org.freedesktop.resolve1.Link \
  SetDNS 'a(iay)' 1 2 \
  4 100 127 136 241

What does the configuration result look like? When I run the resolvectl status command on my machine,
the applied DNS configuration appears as follows:

# wt0 is the WireGuard interface that NetBird created on the machine
misha@misha-linux:~$ resolvectl status wt0
Link 12 (wt0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 100.127.136.241
       DNS Servers: 100.127.136.241
        DNS Domain: netbird.cloud

The agent reused its NetBird private IP 100.127.136.241 and configured the system's DNS to resolve netbird.cloud
with the embedded resolver listening on the default DNS port 53.

Port 53 issue

The setup described above looks straightforward, doesn't it? The NetBird agent started a local resolver,
received a configuration from the management server, and applied it to the system. However, there are always exceptions.
Some of them are very common. Other processes like dnsmasq, Unbound, or Pi-hole might already occupy the default DNS port.
On top of that, some systems won't allow administrators to specify a custom port. Support for custom DNS ports was added
to the systemd-resolved service only in 2020,
while the resolv.conf file doesn't support it to date. What to do in this situation?
NetBird pursues the goal of a networking platform that works anywhere, universally, and without complex configurations.
Therefore, we had to apply some logic to work around the issue and came up with two solutions to most of the corner cases.

Solution: unassigned loopback address

The NetBird-embedded DNS resolver first tries to listen for incoming DNS queries on port 53 of the private NetBird IP or the loopback address,
e.g., 100.127.136.241:53 and 127.0.0.1:53, respectively. As we discovered, this may not be possible because another process already occupies port 53 there.
In that case, NetBird falls back to an unassigned loopback address, 127.0.0.153:53.
Linux accepts this bind because the whole 127.0.0.0/8 range is local to the machine, so the address works even though port 53 is taken on other addresses.

Below is a small Go program that simulates the situation and tries listening on 127.0.0.1:53 and then on 127.0.0.153:53:

package main

import (
    "fmt"
    "net"
    "os"
    "time"
)

func main() {
    // IP:port combinations to try, in order
    addresses := []string{"127.0.0.1:53", "127.0.0.153:53"}

    // create a listener for each address
    for _, address := range addresses {
        time.Sleep(time.Second)
        go func(addr string) {
            packetConn, err := net.ListenPacket("udp", addr)
            if err != nil {
                fmt.Printf("failed creating a UDP listener on %s: %v\n", addr, err)
                os.Exit(1)
            }
            defer packetConn.Close()
            fmt.Printf("successfully created a UDP listener on %s\n", addr)

            // read and discard packets; the buffer is reused across reads
            buffer := make([]byte, 1024)
            for {
                if _, _, err := packetConn.ReadFrom(buffer); err != nil {
                    fmt.Printf("error reading packet: %v\n", err)
                    return
                }
            }
        }(address)
    }

    // keep the main goroutine running
    select {}
}

After compiling and running the code, we see that it worked without issues:

misha@misha-linux:~$ go build -o test && sudo ./test
successfully created a UDP listener on 127.0.0.1:53
successfully created a UDP listener on 127.0.0.153:53

Netstat also shows that the same Go process (PID 24356) holds two UDP sockets on port 53:

misha@misha-linux:~$ sudo netstat -tulpn | grep 53
udp        0      0 127.0.0.153:53          0.0.0.0:*          24356/./test
udp        0      0 127.0.0.1:53            0.0.0.0:*          24356/./test

By the way, the systemd-resolved service has a similar behavior and listens on 127.0.0.53:53.

That was a theory, but what does it look like in real life?

The address 100.127.136.241:53, a private NetBird IP, may be unavailable because the dnsmasq service
listens on all interfaces by default (IPv6 omitted for simplicity):

tcp        0      0 172.17.0.1:53           0.0.0.0:*         LISTEN      26136/dnsmasq
tcp        0      0 127.0.0.1:53            0.0.0.0:*         LISTEN      26136/dnsmasq
tcp        0      0 100.127.136.241:53      0.0.0.0:*         LISTEN      26136/dnsmasq
udp        0      0 127.0.0.1:53            0.0.0.0:*                     26136/dnsmasq
udp        0      0 100.127.136.241:53      0.0.0.0:*                     26136/dnsmasq
udp        0      0 172.17.0.1:53           0.0.0.0:*                     26136/dnsmasq

How about 127.0.0.1:53? Like dnsmasq, another DNS service, Unbound, may occupy this address:

tcp        0      0 127.0.0.1:8953          0.0.0.0:*         LISTEN      27023/unbound
tcp        0      0 127.0.0.1:53            0.0.0.0:*         LISTEN      27023/unbound
udp        0      0 127.0.0.1:53            0.0.0.0:*                     27023/unbound

Switching to 127.0.0.153:53 should do the trick in most cases. But what if some process listens on all interfaces?

When a process binds the wildcard address 0.0.0.0:53, it claims port 53 on every local address,
so Linux rejects binds to unassigned loopback addresses as well.
We can quickly test this scenario by modifying the address parameter in the Go testing code by placing 0.0.0.0:53 first:

addresses := []string{"0.0.0.0:53", "127.0.0.1:53", "127.0.0.153:53"}

The code fails with the address already in use error:

misha@misha-linux:~$ go build -o test && sudo ./test
successfully created a UDP listener on 0.0.0.0:53
failed creating a UDP listener on 127.0.0.1:53: bind: address already in use

That could happen when network administrators apply custom configurations to the system.
NetBird uses eBPF with XDP (eXpress Data Path) to work around this issue and share the port with the other process.

Solution: port forwarding with XDP and eBPF

From the documentation: XDP is a framework that enables high-speed packet processing within eBPF applications.
To enable faster response to network operations,
XDP runs an eBPF program as soon as possible, immediately as the network interface receives a packet. Here is how we use it in NetBird:

[Figure: DNS port forwarding in NetBird with XDP]

In short, the agent starts an embedded NetBird DNS listener on the custom port 5053 and the NetBird IP 100.127.136.241
that the NetBird management service assigned to the peer.
However, due to issues encountered when using custom ports, the agent configures the system to reroute all
netbird.cloud queries to a "fake" resolver address using the peer's NetBird IP 100.127.136.241:53.
This address is referred to as "fake" because there is no active listener on it. Subsequently,
the eBPF program intercepts network packets destined for the fake address and
forwards them to port 5053, where the real embedded NetBird DNS resolver listens.
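Before looking at the eBPF side, the forwarding rule can be modelled in plain Go. This is only a userspace simulation under simplifying assumptions: the udpPacket and forwarder types are hypothetical, addresses are kept as strings, and ports are in host byte order, whereas the real program works on raw headers inside the kernel.

```go
package main

import "fmt"

// udpPacket is a hypothetical, simplified view of a UDP datagram.
type udpPacket struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
}

const generalDNSPort = 53

// forwarder holds the "fake" resolver IP and the real listener port.
type forwarder struct {
	dnsIP   string
	dnsPort uint16
}

// rewrite applies the two rules from the text: requests aimed at
// dnsIP:53 are steered to dnsPort, and responses leaving dnsIP:dnsPort
// appear to come from port 53. Anything else passes through unchanged.
func (f forwarder) rewrite(p udpPacket) udpPacket {
	switch {
	case p.DstPort == generalDNSPort && p.DstIP == f.dnsIP:
		p.DstPort = f.dnsPort
	case p.SrcPort == f.dnsPort && p.SrcIP == f.dnsIP:
		p.SrcPort = generalDNSPort
	}
	return p
}

func main() {
	f := forwarder{dnsIP: "100.127.136.241", dnsPort: 5053}

	request := udpPacket{SrcIP: "100.127.136.241", SrcPort: 59049, DstIP: "100.127.136.241", DstPort: 53}
	response := udpPacket{SrcIP: "100.127.136.241", SrcPort: 5053, DstIP: "100.127.136.241", DstPort: 59049}

	fmt.Printf("request:  %+v\n", f.rewrite(request))
	fmt.Printf("response: %+v\n", f.rewrite(response))
}
```

The client only ever sees port 53 in both directions, which is why it accepts the answer even though the resolver actually listens on 5053.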

We developed the eBPF program in C with a core function xdp_dns_fwd shown below and attached it
to the loopback interface lo with XDP.
This function modifies the destination address of incoming DNS packets when it matches the dns_ip.
Specifically, it transforms the address 100.127.136.241:53 to 100.127.136.241:5053. Additionally,
it handles outgoing DNS packets returning from the NetBird resolver to DNS clients by altering the source address.
In this case, 100.127.136.241:5053 is changed to 100.127.136.241:53.

int xdp_dns_fwd(struct iphdr  *ip, struct udphdr *udp) {
    if (dns_port == 0) {
        if(!read_settings()){
            return XDP_PASS;
        }
    }

    // For DNS request processing:
    // - Incoming DNS packets going to,
    //   for example, 100.127.136.241:53 (a 'fake' address),
    //   are redirected to the NetBird resolver at 100.127.136.241:5053.
    if (udp->dest == GENERAL_DNS_PORT && ip->daddr == dns_ip) {
        udp->dest = dns_port;
        return XDP_PASS;
    }

    // For DNS response processing:
    // - The source port 5053 of the NetBird resolver's response
    //   is replaced with port 53.
    //   E.g., 100.127.136.241:5053 becomes 100.127.136.241:53.
    //   This is necessary for the client to accept the response.
    if (udp->source == dns_port && ip->saddr == dns_ip) {
        udp->source = GENERAL_DNS_PORT;
        return XDP_PASS;
    }

    return XDP_PASS;
}

Why did we attach the XDP program to the loopback interface but not the WireGuard interface wt0?
The reason is simple: the DNS resolution we are performing is local. The system understands this and optimizes the flow by using the loopback interface.
That is also why we can modify the DNS response even though XDP only works on ingress traffic.
Running tcpdump while pinging postgres.netbird.cloud demonstrates this behaviour:

misha@misha-linux:~$ sudo tcpdump -i any -n 'udp and (port 53 or port 5053)'
15:17:53.581324 lo    In  IP 100.127.136.241.59049 > 100.127.136.241.5053: UDP, length 44
15:17:53.581675 lo    In  IP 100.127.136.241.53 > 100.127.136.241.59049: 34021 0/0/0 (44)

You may have also noticed a call to the read_settings() function. We use it to make the function more flexible and pass parameters from the Go code. It helps us configure the "fake" address with the dns_ip and dns_port properties that the eBPF program uses to "catch" DNS packets.
We achieved this by using BPF maps
to pass these parameters from the userspace Go application to the eBPF program.

bool read_settings() {
    __u16 *port_value;
    __u32 *ip_value;

    // read dns ip (configured fake address)
    ip_value = bpf_map_lookup_elem(&nb_map_dns_ip, &map_key_dns_ip);
    if(!ip_value) {
        return false;
    }
    dns_ip = htonl(*ip_value);

    // read dns port (configured real active listener port. e.g. 5053)
    port_value = bpf_map_lookup_elem(&nb_map_dns_port, &map_key_dns_port);
    if (!port_value) {
        return false;
    }
    dns_port = htons(*port_value);
    return true;
}

To put it all together and use the function in Go, we compiled the program with bpf2go,
a tool from the cilium/ebpf library.
The command generates Go helper files that contain the eBPF program's bytecode.

# required packages libbpf-dev, libc6-dev-i386-amd64-cross
go run github.com/cilium/ebpf/cmd/bpf2go \
    -cc clang-14 \
    bpf src/prog.c \
    -- \
    -I /usr/x86_64-linux-gnu/include

The generated files can be used to load eBPF functions and pass parameters via a BPF map:

func (tf *GeneralManager) LoadDNSFwd(ip string, dnsPort int) error {
    log.Debugf("load ebpf DNS forwarder, watching addr: %s:53, redirect to port: %d", ip, dnsPort)
    tf.lock.Lock()
    defer tf.lock.Unlock()

    err := tf.loadXdp()
    if err != nil {
       return err
    }

    err = tf.bpfObjs.NbMapDnsIp.Put(mapKeyDNSIP, ip2int(ip))
    if err != nil {
       return err
    }

    err = tf.bpfObjs.NbMapDnsPort.Put(mapKeyDNSPort, uint16(dnsPort))
    if err != nil {
       return err
    }

    tf.setFeatureFlag(featureFlagDnsForwarder)
    err = tf.bpfObjs.NbFeatures.Put(mapKeyFeatures, tf.featureFlags)
    if err != nil {
       return err
    }
    return nil
}

You can find the complete eBPF code in our GitHub repository, netbirdio/netbird.
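The LoadDNSFwd snippet above calls ip2int, which is not shown in this article. The sketch below is a guess at what such a helper could look like, named ip2intSketch to make clear it is not the repository's function. It assumes the map value is written as a host-order integer whose numeric value follows the dotted-quad order, which would fit read_settings() applying htonl() and htons() before comparing against packet headers. The htons helper here mimics the C macro on a little-endian host only.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// ip2intSketch (hypothetical) converts a dotted-quad IPv4 string into a
// uint32 whose most significant byte is the first octet.
func ip2intSketch(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, fmt.Errorf("%s is not an IPv4 address", s)
	}
	raw := addr.As4()
	return binary.BigEndian.Uint32(raw[:]), nil
}

// htons swaps the two bytes of v, which is what the C htons macro does
// on a little-endian host (on a big-endian host it is a no-op).
func htons(v uint16) uint16 {
	return v<<8 | v>>8
}

func main() {
	ip, err := ip2intSketch("100.127.136.241")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("ip as integer:        0x%08X\n", ip)
	fmt.Printf("port 53 on the wire:   0x%04X\n", htons(53))
	fmt.Printf("port 5053 on the wire: 0x%04X\n", htons(5053))
}
```

The byte swap is the reason the eBPF code cannot compare udp->dest with a plain 53: header fields arrive in network byte order, so both sides of the comparison have to use the same representation.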

Conclusion

We didn't expect to use technologies like eBPF and XDP when we started working on the DNS feature, nor did we think there would be so many edge cases.
The main reason for the complexity was our integration with the kernel WireGuard, where NetBird configures the interface and steps aside.
NetBird doesn't have direct access to network packets in this mode, unlike the userspace wireguard-go implementation,
where the DNS packets can be processed directly in Go. Therefore, the userspace mode doesn't require port forwarding with XDP and eBPF.

We intentionally picked the harder path, not just because we enjoy challenges but also because the kernel WireGuard mode
offers high network performance, security, and efficiency. We are committed to bringing this value to our cloud and open-source users.

As for the userspace implementation, the NetBird DNS feature also works on Windows, macOS, and mobile phones, where we use wireguard-go.
How does it work there? We will cover this in a separate article.
Meanwhile, try NetBird in the cloud with the QuickStart Guide or self-host it on your servers with the Self-hosting Quickstart Guide.
