
jacobcrawford for IT Minds


High availability Kubernetes cluster on bare metal - part 2

Last week we covered the theory of high availability in a bare-metal Kubernetes cluster, which means that this week is where the magic happens.

First of all, there are a few dependencies you need to have installed before you can initialize a Kubernetes cluster. Since this is not a guide on how to set up Kubernetes, I will assume you have done this before; if not, you can use the same guide I used when installing Kubernetes for the first time: guide.

Whether you followed that guide or installed Kubernetes and Docker (or your favorite container runtime) some other way, you will also have kubeadm, the key Kubernetes toolbox we will use to initialize the cluster. But first, we need to deal with the problems of high availability that we discussed last week.

The stable control plane IP

As mentioned, we will use a self-hosted solution where we set up a stable IP with HAProxy and Keepalived as pods inside the Kubernetes cluster. To achieve this, we will need to configure a few files for each master node:

  1. A keepalived configuration.
  2. A keepalived health check script.
  3. A manifest file for the keepalived static pod.
  4. An HAProxy configuration file.
  5. A manifest file for the HAProxy static pod.


! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}

vrrp_instance VI_1 {
    state ${STATE}
    interface ${INTERFACE}
    virtual_router_id ${ROUTER_ID}
    priority ${PRIORITY}
    authentication {
        auth_type PASS
        auth_pass ${AUTH_PASS}
    }
    virtual_ipaddress {
        ${APISERVER_VIP}
    }
    track_script {
        check_apiserver
    }
}

The configuration contains bash-style placeholders that we need to fill in, either manually or through scripting:

  1. STATE will be MASTER for the node initializing the cluster, because it will also be the first one to host the virtual IP address of the control plane; the other nodes use BACKUP.
  2. INTERFACE is the network interface of the network where the nodes will communicate. For Ethernet connections this is often eth0, and it can be found with the command ifconfig on most Linux operating systems.
  3. ROUTER_ID needs to be the same for all the hosts. Often set to 51.
  4. PRIORITY is a unique number that decides which node should host the virtual IP of the control plane in case the first MASTER node goes down. Often set to 100 for the node initializing the cluster, with decreasing values for the rest.
  5. AUTH_PASS should be the same for all nodes. Often set to 42.
  6. APISERVER_VIP is the virtual IP for the control plane. It does not need to exist beforehand; it will be created.
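If you prefer scripting over editing by hand, the placeholders can be filled with sed. Here is a minimal sketch against a shortened template; the values are example assumptions for the first (MASTER) node, not prescribed ones:

```shell
# Example values for the initializing node -- adjust per node.
STATE=MASTER
INTERFACE=eth0
ROUTER_ID=51
PRIORITY=100

# A shortened stand-in for the keepalived.conf template above.
TMPL=$(mktemp)
cat > "$TMPL" <<'EOF'
vrrp_instance VI_1 {
    state ${STATE}
    interface ${INTERFACE}
    virtual_router_id ${ROUTER_ID}
    priority ${PRIORITY}
}
EOF

# Substitute each ${VAR} placeholder with its value.
CONF=$(mktemp)
sed -e "s/\${STATE}/$STATE/" \
    -e "s/\${INTERFACE}/$INTERFACE/" \
    -e "s/\${ROUTER_ID}/$ROUTER_ID/" \
    -e "s/\${PRIORITY}/$PRIORITY/" \
    "$TMPL" > "$CONF"
cat "$CONF"
```

On a real node you would run this against the full /etc/keepalived/keepalived.conf with per-node values for STATE and PRIORITY.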

For the health check script we have the following:


errorExit() {
    echo "*** $*" 1>&2
    exit 1

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"

We see the APISERVER_VIP placeholder again, with the same meaning as before. I will not repeat explanations for repeated variables, which leaves only one new variable:

APISERVER_DEST_PORT, the frontend port on the virtual IP for the API server. This can be any unused port, e.g. 4200.
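The contract between the check script and Keepalived is the exit code: if the script exits nonzero `fall` times in a row, the node's priority drops by `weight` (-2 here), so the virtual IP moves to a healthier master. A small sketch of that contract, with `false` standing in for a failed curl probe:

```shell
# Same error helper as the health check script.
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

# Run the failing probe in a subshell so this demo survives the `exit 1`.
# `false` stands in for curl failing to reach the API server.
( false || errorExit "Error GET https://localhost:4200/" ) 2>/dev/null
RC=$?
echo "failed probe exit code: $RC"
```

A healthy probe would exit 0 and leave the node's priority untouched.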

Last, the manifest file for Keepalived:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:1.3.5-1
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/
    name: check
status: {}

This creates a pod that uses the two configuration files.


We have one configuration file for HAProxy:

# /etc/haproxy/haproxy.cfg
# Global settings
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s

# apiserver frontend which proxies to the masters
frontend apiserver
    bind *:${APISERVER_DEST_PORT}
    mode tcp
    option tcplog
    default_backend apiserver

# round robin balancing for apiserver
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance     roundrobin
        server ${HOST1_ID} ${HOST1_ADDRESS}:${APISERVER_SRC_PORT} check
        server ${HOST2_ID} ${HOST2_ADDRESS}:${APISERVER_SRC_PORT} check
        server ${HOST3_ID} ${HOST3_ADDRESS}:${APISERVER_SRC_PORT} check

Here, we plug in the control plane IPs. Assuming a 3-node cluster, we input a symbolic HOST_ID (just a unique name) for each node, as well as its HOST_ADDRESS. APISERVER_SRC_PORT is the port where the API server listens for traffic, by default 6443.
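To make `balance roundrobin` concrete: HAProxy hands each new connection to the next healthy server in the list, cycling back to the first after the last. A toy shell illustration of that rotation, with hypothetical host IDs:

```shell
# Three backend servers, as in the haproxy.cfg above (names are made up).
SERVERS="master-1 master-2 master-3"

# Six incoming "requests" cycle through the servers in order.
for r in 1 2 3 4 5 6; do
    idx=$(( (r - 1) % 3 + 1 ))
    target=$(printf '%s\n' $SERVERS | sed -n "${idx}p")
    echo "request $r -> $target"
done
```

Servers that fail their `check` are skipped, which is how the `httpchk GET /healthz` probe keeps traffic away from unhealthy API servers.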

The last file is the HAProxy manifest file:

apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: ${APISERVER_DEST_PORT}
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}

This is all we actually need to configure to get a cluster up and running. Some of these values are constants that must be the same on all three master nodes, some must vary between nodes, some you simply have to look up, and for some you have to make a decision.

Values sanity check

Let us take a quick sanity check over the variables and their typical values for each node.

Variables to input:

  1. STATE: MASTER for the node that initializes the cluster, BACKUP for the two others.
  2. PRIORITY: 100 for the node that initializes the cluster, 99 and 98 for the two others.

Variables to retrieve:

  1. APISERVER_VIP: An unused IP within your network subnet.
  2. APISERVER_DEST_PORT: A port of your choosing. Must not conflict with other service ports.
  3. INTERFACE: The network interface. Use ifconfig to find it.
  4. HOST_ID: Any unique name for each of the 3 master nodes.
  5. HOST_ADDRESS: The IP addresses of your machines. Can also be found with ifconfig on each machine.
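On most modern Linux distributions the `ip` tool from iproute2 has largely replaced ifconfig, so one way to retrieve the interface name and node addresses is:

```shell
# List each interface with its state and assigned IPs
# (the interface name and node IP you need appear side by side).
ADDRS=$(ip -brief addr show)
echo "$ADDRS"
```

`ifconfig` (from net-tools) shows the same information if it is installed on your machines.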


Now that the files are configured, they should be put in the right destinations so that kubeadm can find them when the cluster initializes.

The absolute file paths are:

/etc/keepalived/keepalived.conf
/etc/keepalived/check_apiserver.sh
/etc/haproxy/haproxy.cfg
/etc/kubernetes/manifests/keepalived.yaml
/etc/kubernetes/manifests/haproxy.yaml

(The manifest filenames are a free choice; kubelet applies every manifest in /etc/kubernetes/manifests/ regardless of name.)

Putting manifest files into /etc/kubernetes/manifests/ is what does the magic here. Everything in this folder will be applied when the cluster initializes. Even the control plane pods that are generated by kubeadm will be put in here before the cluster initializes.
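Placing the files is then a matter of copying them onto each master node. A sketch of the copy step, demonstrated against a scratch directory instead of the real /etc/kubernetes/manifests/ (and with empty stand-in files, since the real manifests live on your nodes):

```shell
# Scratch stand-ins; on a real node MANIFEST_DIR is /etc/kubernetes/manifests
MANIFEST_DIR=$(mktemp -d)
SRC=$(mktemp -d)
touch "$SRC/keepalived.yaml" "$SRC/haproxy.yaml"   # stand-ins for the manifests above

# The actual operation: drop the static pod manifests where kubelet looks.
mkdir -p "$MANIFEST_DIR"
cp "$SRC/keepalived.yaml" "$SRC/haproxy.yaml" "$MANIFEST_DIR"/
ls "$MANIFEST_DIR"
```

On the real nodes you would run the `cp` with sudo against /etc/kubernetes/manifests/.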

Initializing the cluster

When the files are in place, initializing the cluster is as simple as running the kubeadm init command with a few extra pieces of information.

kubeadm init --control-plane-endpoint APISERVER_VIP:APISERVER_DEST_PORT --upload-certs

Will do the trick. The --control-plane-endpoint argument tells the cluster that the control plane should not be contacted on the actual node's IP, but on the virtual IP address, and --upload-certs uploads the control plane certificates so the other master nodes can fetch them when joining. When the other nodes join, this is what makes the cluster highly available: if the node currently hosting the virtual IP goes down, the virtual IP simply jumps to another available master node.

Last, join the other two nodes to the cluster with the join command output by kubeadm init.
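For reference, when --upload-certs is used, kubeadm init prints a dedicated join command for additional control plane nodes. It looks roughly like this, where the token, hash, and certificate key are values printed by kubeadm and the endpoint is your virtual IP and port:

```shell
kubeadm join ${APISERVER_VIP}:${APISERVER_DEST_PORT} --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
```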

If this even piqued your interest a little bit, you are in for a treat. The whole manual process is being eliminated in an open-source project right here. It is still a work in progress, but feel free to drop in and join the discussion.
