This blog post is the first in a series on deploying a highly available Kubernetes cluster on bare metal.
Kubernetes is often used in the comfort of cloud providers, where we can spin up multiple master nodes without even caring about what goes on behind the scenes. In this blog post, we will step away from the comforts of cloud-provided architecture and into the dangerous unknown environment of a bare-metal infrastructure.
Using Kubernetes is great and comfortable because you don't have to worry about failing services, the discomforts of scaling, downtime, version upgrades and so much more. But as always when working with distributed technologies, you should count on machine failures, and this is where the term high availability comes up.
When using Kubernetes in a playground environment, we spin up a single master node and run our containers. In a production environment, this is not optimal for a number of reasons, but mostly because it is a single point of failure. Hence, we want a Kubernetes cluster to have multiple master nodes to deal with this problem. This is where it gets tricky on bare metal.
Taking a step back, we need to understand what goes on in a Kubernetes master node. Normally containers only run on worker nodes and Kubernetes handles deployments, replications, pod communications, etc. If we look into the belly of a master node it contains pods for orchestration:
- etcd: A key-value storage for cluster data, most importantly the desired state.
- kube-scheduler: A component that watches for newly created pods and decides which node they should run on.
- kube-controller-manager: Manages a lot of Kubernetes functionality like listening for nodes going down, pod creation, etc.
- kube-apiserver: The frontend that handles communications with all the other pods.
These four pods constitute what is commonly known as the control plane of a Kubernetes cluster. As stated, all communication goes through the kube-apiserver. When we execute kubectl commands, what happens behind the scenes is that we send HTTP requests to the REST API of the kube-apiserver.
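To make this concrete, here is a minimal sketch of how a few kubectl verbs translate into HTTP requests. It only covers namespaced resources in the core API group (such as pods), and the function name kubectl_to_http and the pod name web-0 are made up for illustration. The real client does a lot more, such as authentication, API discovery and query parameters.

```python
# Sketch: map a few kubectl verbs to the HTTP request the kube-apiserver receives.
# Only namespaced core-group resources (pods, services, ...) under /api/v1 are covered.

VERB_TO_METHOD = {
    "get": "GET",
    "create": "POST",
    "delete": "DELETE",
}


def kubectl_to_http(verb, resource, namespace="default", name=None):
    """Return (method, path) for a namespaced core-group resource."""
    method = VERB_TO_METHOD[verb]
    path = f"/api/v1/namespaces/{namespace}/{resource}"
    # Reading or deleting a single object addresses it by name;
    # creating always posts to the collection.
    if name is not None and verb != "create":
        path += f"/{name}"
    return method, path


if __name__ == "__main__":
    print(kubectl_to_http("get", "pods"))
    print(kubectl_to_http("delete", "pods", name="web-0"))  # hypothetical pod name
    print(kubectl_to_http("create", "pods"))
```

If you want to see the real requests, raising kubectl's log verbosity (for example kubectl get pods -v=8) should print the HTTP calls it makes.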
The kube-apiserver is also where our problem arises when introducing multiple master nodes. When there are multiple master nodes, which kube-apiserver should I contact? If I just pick my favorite and it dies, what happens?
The Kubernetes control plane is an abstraction, and users should not have to worry about contacting different kube-apiservers or dealing with the fact that their favorite kube-apiserver might disappear and another one might spin up. This is why Kubernetes will not even let you join multiple master nodes to the same cluster without handling this problem first.
Users of the cluster should only know one IP address to contact, and this IP address should be stable.
A cloud provider will simply hand you a load balancer with a stable IP address and you are good to go, but this is a manual process in a bare-metal setup.
So we need to set up a stable IP for the Kubernetes control plane, a job that can be done in a few ways.
We set up a virtual IP for the control plane by installing HAProxy and Keepalived. In short, Keepalived assigns the virtual IP address to one machine and uses ARP to broadcast that this address should be translated to that physical machine's MAC address. This means that when anyone wants to contact the IP address we set up for the control plane, the traffic will be directed to this physical machine, where HAProxy load-balances it across the kube-apiservers on the master nodes. Keepalived also keeps checking that this machine can be contacted, and if it can't, the virtual IP switches to another machine. This makes the virtual IP stable, so it can be used as the IP address for the control plane.
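The failover behaviour described above can be sketched as a toy election: the virtual IP belongs to the healthy machine with the highest priority, which is roughly how VRRP, the protocol Keepalived implements, picks its master. The node names and priorities below are invented for illustration, and the sketch leaves out everything else VRRP does, such as advertisements, timers and preemption settings.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int          # higher priority wins the election, as in VRRP
    healthy: bool = True   # result of the node's health check


def vip_holder(nodes):
    """Return the healthy node with the highest priority, or None if all are down."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)


if __name__ == "__main__":
    # Hypothetical three-master setup.
    nodes = [Node("master-1", 150), Node("master-2", 100), Node("master-3", 50)]
    print(vip_holder(nodes).name)
    nodes[0].healthy = False  # master-1 fails its health check
    print(vip_holder(nodes).name)
```

Clients keep talking to the same virtual IP throughout; only the machine answering for it changes.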
This approach is great and simple, but depending on the implementation we might get into trouble. If we install the services alongside Kubernetes directly on our machines, what happens if the HAProxy or Keepalived service fails? The whole master node will then be considered down, and because we need to go in and restart the service manually, we lose the orchestration benefits of Kubernetes. Well, then let us install them as pods inside Kubernetes. Then if one of the services fails, Kubernetes will just bring it back up again.
Sadly, this introduces a chicken-and-egg situation:
- Set up a stable IP with HAProxy and Keepalived to initiate a highly available Kubernetes cluster.
- Set up a Kubernetes cluster so that you can host HAProxy and Keepalived in pods.
Fortunately, I lied when I said that all communication to the cluster goes through the kube-apiserver. The pods in the control plane cannot go through the normal process of contacting the kube-apiserver to get deployed, because the kube-apiserver is not deployed yet. They are what is known as static pods: the kubelet on each node deploys them directly from the YAML files it finds in the right folder. On Linux this is /etc/kubernetes/manifests/ by default. All YAML files in this folder will get deployed with the control plane when we initialize our cluster. This seems to solve our chicken-and-egg problem, since the manifests for HAProxy and Keepalived can be placed in that folder too.
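As a rough illustration of what "putting a file in the right folder" means, the sketch below builds a minimal pod definition and writes it to a directory. Everything specific in it is an assumption: the image name is a placeholder, a temporary directory stands in for /etc/kubernetes/manifests/, and the manifest is written as JSON, which the kubelet is documented to accept alongside YAML. A real HAProxy or Keepalived pod would also need its configuration mounted and, for Keepalived, extra network privileges.

```python
import json
import tempfile
from pathlib import Path


def static_pod_manifest(name, image):
    """Build a minimal Pod definition as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            # Use the host's network namespace so the service listens
            # on the node's own addresses.
            "hostNetwork": True,
            "containers": [{"name": name, "image": image}],
        },
    }


def write_manifest(manifest_dir, manifest):
    """Write the manifest as JSON into manifest_dir and return the file path."""
    path = Path(manifest_dir) / f"{manifest['metadata']['name']}.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real manifests folder.
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder image name, not a real registry path.
        pod = static_pod_manifest("haproxy", "example.invalid/haproxy:placeholder")
        print(write_manifest(tmp, pod).read_text())
```

No request to the kube-apiserver is involved here, which is exactly why this works before the control plane exists.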
Well, now we have gone through the theory needed to deploy a highly available Kubernetes cluster on bare metal. This means that we can spin up a cluster on a few of our old laptops or Raspberry Pis.
If this piqued your interest, follow along next week when I actually show how easy it is to do this in practice.