Toby Chui

I wrote a distributed ZeroTier-like thing in three weeks

Recently, I have been working in my spare time on ArozOS, a project of mine that is almost four years old. If you haven't read my previous post about ArozOS, it is a web desktop system designed for cloud and private server environments. You can read more in my previous post.

In a chat with my coauthor from the US, we discussed the possibility of connecting multiple ArozOS nodes (mostly Raspberry Pis) into a large storage and compute pool. However, ArozOS has design guidelines such as no third-party run-time dependencies and a core system that must be static and fail-safe, which makes it impossible to use existing solutions. This is where this project kicked off.

Why not ZeroTier or VPN?

Many of my dev friends who see this project ask me this question. In fact, ZeroTier, Hamachi, or other SD-WAN solutions could easily cover my use case if it did not have to run under the ArozOS project. The problem with ZeroTier and the others is that they rely on a central server and run on a tree network structure. These structures are not designed to prevent a single point of failure (SPOF); they are designed to increase users' dependency on their services. So I need to implement my own solution that does not rely on a structure containing a SPOF.

A brief description of how a ZeroTier network works

tobychui / go-DDDNS

Go implementation of the Routing ID based Distributed Dynamic Domain Name Service

Introduction

There is no such standard as DDDNS, and this is not a DDNS protocol. I drafted this experiment myself to test whether it is possible to create a cluster that "floats" on the floating IPs assigned by the ISP, or in a network environment that only allows plug-and-play hosting.

The requirements of this project are as follows.

  1. The software cannot get the IP address from its NIC or use any kind of command that asks the OS for IP address information (e.g. no ip a or ifconfig)
  2. No external dependencies other than the connected nodes (e.g. no UPnP to ask the router for the current node IP, no online IP checking API)
  3. No platform dependencies (i.e. it should not work only on Linux)

All of the IP address information has to come from…

What Is This?

This project is code-named go-DDDNS (I should rename it later, as it is technically not a DNS, but please stick with this name for now). It is a distributed implementation, with no single point of failure, of a routing ID based IP resolver for a cluster of nodes "floating on the sea of the internet".

The first thing you need in order to connect a few computers together is to know where to find them. "Where", in terms of the internet, means an IP address. So I need a way for all the nodes within the network to learn their reachable IPs. In day-to-day implementations this is easy: you can just send a GET request to an AWS web service or IPIFY.org and let the third-party server return your public IP address. As mentioned, though, ArozOS does not allow run-time third-party dependencies. So I needed to design a protocol that lets each node learn its public IP address from the other connected nodes. This is what I came up with.

How it Works?

Setup

This part is not actually related to the algorithm. Before the nodes start heartbeating, each node exchanges its TOTP secret over HTTPS to make sure the sender and receiver are both registered with each other. It is just a tiny security feature, but worth mentioning here.
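To make the idea concrete, here is a minimal sketch of how a shared TOTP secret could be used to check a peer. It assumes the secret exchanged during setup is later used to generate standard RFC 6238 codes (HMAC-SHA1, 30 second step, 6 digits); the post does not say how go-DDDNS actually does this, and the names `totpCode` and `verifyTOTP` are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
	"time"
)

// totpCode returns the 6-digit RFC 6238 code (HMAC-SHA1, 30 s step)
// for the given shared secret at the given Unix time.
// Hypothetical helper, not taken from go-DDDNS.
func totpCode(secret []byte, unixSeconds int64) string {
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], uint64(unixSeconds/30))

	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)

	// Dynamic truncation as defined in RFC 4226.
	offset := sum[len(sum)-1] & 0x0f
	bin := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	return fmt.Sprintf("%06d", bin%1000000)
}

// verifyTOTP accepts the code of the current time step and of the previous
// one, so a small clock drift between two nodes is tolerated (an assumption).
func verifyTOTP(secret []byte, code string, nowUnix int64) bool {
	for _, t := range []int64{nowUnix, nowUnix - 30} {
		want := totpCode(secret, t)
		if subtle.ConstantTimeCompare([]byte(want), []byte(code)) == 1 {
			return true
		}
	}
	return false
}

func main() {
	secret := []byte("12345678901234567890") // hypothetical shared secret
	now := time.Now().Unix()
	code := totpCode(secret, now)
	fmt.Println("code:", code, "valid:", verifyTOTP(secret, code, now))
}
```

A sender would attach the code to each request, and the receiver would reject requests whose code does not match the secret registered for that peer.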

Heartbeat

Heartbeats are commonly used in automatic fail-over systems, where two server nodes, usually one primary and one secondary, keep sending heartbeats to each other to make sure the other node is online. If a heartbeat request fails, the other node replaces the first node and continues serving your requests. However, a heartbeat to a remote node also carries routing information about the request, such as the source IP. Hence, by using the source IP field in the packet, we can know where the request was sent from. After the remote node reads the sender's (public) IP address, it returns that address, as seen from the remote node, to the sender, and now we have a public IP address!
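The echo step described above can be sketched with Go's standard `net/http` package. This is only an illustration under my own assumptions: the handler name `heartbeatHandler`, the helper `observedIP`, and the plain-text reply are hypothetical and are not the actual go-DDDNS wire format.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
)

// observedIP extracts the IP part of an address such as "203.0.113.7:51234".
func observedIP(remoteAddr string) (string, error) {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return "", err
	}
	return host, nil
}

// heartbeatHandler replies with the caller's IP as seen by this node.
// Hypothetical handler; the real protocol may carry more fields.
func heartbeatHandler(w http.ResponseWriter, r *http.Request) {
	ip, err := observedIP(r.RemoteAddr)
	if err != nil {
		http.Error(w, "cannot parse remote address", http.StatusBadRequest)
		return
	}
	fmt.Fprint(w, ip)
}

func main() {
	// A local test server stands in for the remote node.
	srv := httptest.NewServer(http.HandlerFunc(heartbeatHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println("remote node sees me as:", string(body))
}
```

Note that `r.RemoteAddr` is the address of the direct TCP peer, so behind a reverse proxy the remote node would see the proxy's address instead of the sender's.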

Oh Wait, Dynamic IP !?

As the world is running low on IPv4 addresses, ISPs now serve their clients with dynamic IP addresses. So, to support nodes that are set up on home networks, the protocol needs to handle IP changes and updates as well. Luckily, since the heartbeat is based purely on the remote node's view of the sender node's IP address, if the sender's IP changes, the remote node gets the updated IP address just by looking at the packet header. So an IP change of a particular node in the network won't be a big problem.
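On the receiving side, this boils down to a table keyed by node ID that is overwritten whenever a heartbeat arrives from a new source IP. The sketch below assumes each heartbeat carries some node identifier; the type `PeerTable` and its methods are hypothetical names, not the go-DDDNS API.

```go
package main

import (
	"fmt"
	"sync"
)

// PeerTable maps a node ID to the IP that node was last seen at.
// Hypothetical structure used for illustration only.
type PeerTable struct {
	mu  sync.Mutex
	ips map[string]string
}

// NewPeerTable returns an empty table.
func NewPeerTable() *PeerTable {
	return &PeerTable{ips: make(map[string]string)}
}

// Observe records the source IP of a heartbeat from nodeID and reports
// whether the stored IP changed (including the first time a node is seen).
func (p *PeerTable) Observe(nodeID, sourceIP string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.ips[nodeID] == sourceIP {
		return false
	}
	p.ips[nodeID] = sourceIP
	return true
}

// Lookup returns the last IP observed for nodeID, if any.
func (p *PeerTable) Lookup(nodeID string) (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	ip, ok := p.ips[nodeID]
	return ip, ok
}

func main() {
	table := NewPeerTable()
	table.Observe("node-a", "203.0.113.7")

	// The next heartbeat from node-a arrives from a new address.
	if table.Observe("node-a", "198.51.100.23") {
		ip, _ := table.Lookup("node-a")
		fmt.Println("node-a moved, heartbeat endpoint is now", ip)
	}
}
```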

Here is a quick demo video showing a node changing its IP at run time; the other node catches the IP update in the next cycle (10 seconds) and updates the heartbeat endpoint.

Oh Wait, Two Nodes Change IP at the Same Time!??

However, when you have more nodes, IP changes might occur on multiple nodes within the same heartbeat cycle. With the heartbeat-based update method above, two nodes that change IP at the same time (within the same heartbeat cycle) will never be able to reach each other again, because each node keeps sending its heartbeat to the other's old address, which is no longer valid.

That is why a synchronization protocol was added to the module. After a fixed number of retries (I might change this to a random range later for better performance), a node will try to get the "lost node" IP address from a randomly picked nearby static node (a node whose IP didn't change recently) or from the most recently online node.

This way, after a few cycles, each lost node should have established an updated IP address and heartbeat connection with the "static node", and the two nodes whose IPs changed can then update each other's "lost node" IP address from the "static node".
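The recovery step can be sketched as a small function. Everything here is an assumption made for illustration: the retry limit `maxRetries`, the `askFunc` callback standing in for the network request, and the choice to try static peers in random order until one answers (the post only says one is picked at random).

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxRetries is a hypothetical fixed retry limit before asking other nodes.
const maxRetries = 3

// askFunc asks peer which IP it currently has on record for nodeID.
// In a real node this would be a network request; here it is a callback.
type askFunc func(peer, nodeID string) (ip string, ok bool)

// recoverLostNode returns a fresh IP for lostID once the heartbeat to it has
// failed maxRetries times. It queries the given static peers in random order
// and returns the first answer it gets.
func recoverLostNode(lostID string, failures int, staticPeers []string, ask askFunc) (string, bool) {
	if failures < maxRetries {
		return "", false // keep retrying the last known IP for now
	}
	for _, i := range rand.Perm(len(staticPeers)) {
		if ip, ok := ask(staticPeers[i], lostID); ok {
			return ip, true
		}
	}
	return "", false
}

func main() {
	// Node C is static and has already seen node B's new address.
	records := map[string]map[string]string{
		"C": {"B": "198.51.100.9"},
	}
	ask := func(peer, nodeID string) (string, bool) {
		ip, ok := records[peer][nodeID]
		return ip, ok
	}

	// Node A has failed to reach node B three times in a row.
	if ip, ok := recoverLostNode("B", 3, []string{"C"}, ask); ok {
		fmt.Println("node A learned node B's new IP:", ip)
	}
}
```

In the three-node case, nodes A and B both re-establish their heartbeat with the static node C first, and each then learns the other's new address from C.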

Here is a short video demonstrating 3 nodes, where the IPs of 2 nodes are manually changed at the same time during run time.

As you can see, both nodes grab the other's IP address after a few retries and successfully continue the heartbeat process with the updated IP address.

What Next?

I guess the next stage for this project is to add NAT node support via WebSocket or other bidirectional communication protocols, as well as to support iterative registration of nodes (i.e. registering node E on node A will also register node E on nodes B, C, and D using iterative logic). Right now it can only track the IP addresses of network nodes, but I hope to keep building on top of this project, create a transport layer, and add clustering features to my ArozOS project :)

Visit my GitHub repo for more details and the source code. Feel free to leave a star if you like it or feel inspired by this project ⭐⭐⭐
