Justin

🥇 The ultimate kubernetes homelab setup

Overview

Want an easy way to deploy and control kubernetes in a totally sealed homelab environment?

Let's get started with k3s, gitlab, proxmox, ansible, and terraform to automate our homelab infrastructure for repeatability and expandability.

Sections

Some of the sections we're going to cover for build out:

  • proxmox
  • terraform
  • ansible
  • k3s
    • dns
    • test deploys
  • gitlab
  • sshportal
  • persistent volumes

Proxmox

The baseline for this project is going to be a proxmox server. We could run k3s on bare metal, but proxmox gives us the ability to practice spinning up and down k8s nodes of different sizes/constraints via terraform. Plus, if anything goes wrong, we just destroy a proxmox vm and start fresh, which takes minutes rather than having to reinstall an entire OS on bare metal.

We're going to need/want to

  • Set up a cloud-init image to be our base image for terraform
  • Network a remote cluster via tailscale (vpn) [optional redundancy]

So with proxmox, there's not much to it other than to follow the installation instructions: https://www.proxmox.com/en/proxmox-ve/get-started

  • Download ISO image
  • Boot from USB or CD/DVD
  • Configure the host machine (make sure you can log in to the web UI)

Once you can log into the web UI at something like

https://[your_ip_here]:8006/


You can decide if you want to set up the remote cluster; one day I'll document how to network the two with tailscale.

Cloud init

With proxmox set up, we need to set up a base image to be our "golden image" for terraform to use when deploying new nodes.

For that, I'd suggest following a tutorial like this:
https://www.youtube.com/watch?v=shiIi38cJe4

https://docs.technotim.live/posts/cloud-init-cloud-image/

I followed the tutorial pretty closely and even used the suggested image id of 8000. So you'll see that being used in the next terraform section.


Cloud-init gives us the ability to spin up a base image with a predefined username/password, accepted ssh keys, and ip configuration, as well as the ability to override those settings using the terraform provider.
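For reference, the rough shape of what that tutorial walks you through looks something like the below. Treat it as a sketch: the ubuntu image, the local-lvm storage name, and the vmbr0 bridge are assumptions, so swap in whatever your proxmox host actually uses. The template name ubuntu-cloud is what the terraform clone references later.

# Grab an ubuntu cloud image (22.04 here, any cloud image works)
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Create an empty VM with id 8000 that will become the template
qm create 8000 --name ubuntu-cloud --memory 2048 --net0 virtio,bridge=vmbr0

# Import the cloud image as the VM's disk and attach it
qm importdisk 8000 jammy-server-cloudimg-amd64.img local-lvm
qm set 8000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-8000-disk-0

# Add the cloud-init drive, boot from the imported disk, enable a serial console
qm set 8000 --ide2 local-lvm:cloudinit
qm set 8000 --boot c --bootdisk scsi0
qm set 8000 --serial0 socket --vga serial0

# Turn it into a template for terraform to clone
qm template 8000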

So with all that, let's jump into the terraform section.

Terraform

So now that our local setup has its own "cloud", i.e. we have proxmox instead of AWS or GCP for our "unlimited" VMs, we need a way to orchestrate them.

Part of the big switch to kubernetes is that it enables us to not care about the underlying infrastructure. For example, on AWS, if a kubernetes node runs out of disk space, we can expect AWS to launch a new node and kubernetes to move the current workload onto it.

Terraform is what will enable us to recreate some of that underlying node scalability via proxmox.

Ultimately we'll have something like var.node_count = 3. Let's say we want to deploy more pods but we've reached the max kubernetes will schedule onto our current 3-node setup. We can change to var.node_count = 5 and let terraform handle rolling that out across proxmox, ensuring everything is kept uniform.
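On the CLI that scale-up is just a plan and an apply. A quick sketch, assuming the worker count (one of the prxmx_* variables shown just below) is wired into the count of the VM resource:

# Preview the change: terraform should plan the additional clones of the template
terraform plan -var="prxmx_worker_nodes=5"

# Roll it out across proxmox
terraform apply -var="prxmx_worker_nodes=5"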


Terraform gives us the ability to easily create and manage groupings of different-size nodes for experimenting with changing workloads or setting certain node affinity on kubernetes (there's a small nodeSelector sketch after the table below).

Some of the groups I've made so far are like the following:

variable "prxmx_api_nodes" {
  default = 1
}

variable "prxmx_worker_nodes" {
  default = 3
}

variable "prxmx_worker_xl_nodes" {
  default = 1
}

variable "prxmx_worker_xxl_nodes" {
  default = 1
}
...

That enables the following types of workgroups that scale to certain resources:

name                  cores  ram  hd space  hd type
worker_nodes          4c     4G   30G       hdd
worker_xl_nodes       12c    12G  80G       hdd
worker_ssd_xl_nodes   12c    12G  80G       ssd
worker_xxl_nodes      24c    24G  160G      hdd
worker_ssd_xxl_nodes  24c    24G  160G      ssd
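To actually steer workloads at one of these groups from the kubernetes side, one simple option is to label the nodes per workgroup and add a nodeSelector. The label key and the myapp deployment name below are made up for illustration:

# Label the nodes that belong to a workgroup (label key/value are made up)
kubectl label node k3-worker-xl-1 workgroup=xl

# Pin an existing deployment (name is illustrative) onto that group
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeSelector":{"workgroup":"xl"}}}}}'

# Confirm which nodes carry the label
kubectl get nodes -l workgroup=xl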

In my setup, I have "xeon", a 2-socket, 32-core proxmox server with 96GB of ram, and "river", a 12-core intel i7 server with 16GB of ram.

Terraform has to know to deploy these different VMs across both the river and xeon proxmox servers, but once those nodes get added to kubernetes (k8s), that placement is abstracted away from k8s. K8s will deploy across any/all available nodes.

But having a good selection of different types of nodes really helps it feel like a full replacement for AWS. And even if you don't use all the different workgroups, having them available but scaled back definitely helps for organization and scalability once it's needed.

Most of my workload might run on worker_xl_nodes, and that workgroup can exist across both my river and xeon servers, so if I need to scale down my xeon server for maintenance, kubernetes can migrate everything to the river server.

But some applications need more cores and ram than river has, so without another large xeon-like server, scaling down my worker_xxl_nodes is going to result in downtime.
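For what it's worth, the "scale down xeon for maintenance" flow is mostly plain kubectl before terraform touches anything. A minimal sketch, assuming you drain the node before destroying its VM:

# Stop scheduling onto the node and evict its pods (k3s runs a few daemonsets, hence the flag)
kubectl cordon k3-worker-xl-1
kubectl drain k3-worker-xl-1 --ignore-daemonsets --delete-emptydir-data

# Once terraform has destroyed the VM, clean up the stale node object
kubectl delete node k3-worker-xl-1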

Ansible

One of the other things we'll also do here is use ansible to set up each of our nodes to our liking via a provisioner "local-exec" block:

resource "proxmox_vm_qemu" "api_nodes" {
  count       = var.prxmx_api_nodes
  name        = "k3-api-${count.index + 1}"
  desc        = "k3-api-${count.index + 1}"
  target_node = "xeon"

  clone = "ubuntu-cloud"

  os_type      = "cloud-init"
  ipconfig0    = "ip=10.0.4.${count.index + 1}/16,gw=10.0.2.1"
  nameserver   = "1.1.1.1"
  searchdomain = "1.1.1.1"

  cores        = 4
  memory       = "4096"

  disk {
    storage = var.disk
    type    = "scsi"
    size    = "30G"
  }

  lifecycle {
    ignore_changes = [
      ciuser,
      sshkeys,
      network,
    ]
  }

  provisioner "local-exec" {
    command = "sleep 30 && ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u knox -i '10.0.4.${count.index + 1},' playbook.yml"
  }
}

What this means is, after each successful setup of a proxmox vm, we'll run an ansible playbook to install the tools to our liking.

Whether that's k3s itself, or installing neovim and exporting it as the default editor on each VM, ansible enables that.
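As a quick sanity check outside of terraform, ansible's ad-hoc mode works against a single node too. A sketch, assuming the knox cloud-init user and one of the node IPs from the terraform config (neovim is just an example package):

# The trailing comma turns the bare IP into an inline inventory
ansible all -u knox -i '10.0.4.1,' -m ping

# Ad-hoc install of an extra tool, the same thing a playbook task would do
ansible all -u knox -i '10.0.4.1,' -m apt -a "name=neovim state=present update_cache=yes" --become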

K3s

So with our hypervisor (proxmox) chosen and our vm layout designed (terraform + ansible), we're ready to start on the kubernetes cluster, which is the thing that will handle our actual workloads/applications.

For this iteration, I'm using k3s for its total ease of use.

Using terraform, I deploy one api server (high availability to come later) with something like

# cat main.tf

terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "2.9.11"
    }
  }
}

provider "proxmox" {
  pm_api_url      = "https://10.0.2.3:8006/api2/json"
  pm_tls_insecure = true
}

variable "prxmx_api_nodes" {
  default = 1
}

variable "disk" {
  default = "hdd-12tb"
}

resource "proxmox_vm_qemu" "api_nodes" {
  count       = var.prxmx_api_nodes
  name        = "k3-api-${count.index + 1}"
  desc        = "k3-api-${count.index + 1}"
  target_node = "xeon"

  clone = "ubuntu-cloud"

  os_type      = "cloud-init"
  ipconfig0    = "ip=10.0.4.${count.index + 1}/16,gw=10.0.2.1"
  nameserver   = "1.1.1.1"
  searchdomain = "1.1.1.1"

  cores        = 4
  memory       = "4096"

  disk {
    storage = var.disk
    type    = "scsi"
    size    = "30G"
  }

  lifecycle {
    ignore_changes = [
      ciuser,
      sshkeys,
      network,
    ]
  }

  provisioner "local-exec" {
    command = "sleep 30 && ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u knox -i '10.0.4.${count.index + 1},' playbook.yml"
  }
}

And a playbook like

# cat playbook.yml

---

- name: Install k3s
  gather_facts: true
  hosts: all
  tasks:
    - name: Install k3s
      shell: curl -sfL https://get.k3s.io | sh -

Once it's deployed (in my case to the static ip 10.0.4.1), I'll ssh to it and grab the k3s token used to add future nodes to the cluster.

# cat /var/lib/rancher/k3s/server/token
K100.......

So now I spin up some worker_xl_nodes; once they're finished, I ssh in and run

curl -sfL https://get.k3s.io | K3S_URL=https://10.0.4.1:6443 K3S_TOKEN='K100.....' sh -

Once that's finished and I have the workers I need, I can verify with

# kubectl get nodes

NAME              STATUS   ROLES                  AGE   VERSION
k3-worker-xxl-1   Ready    <none>                 12h   v1.25.3+k3s1
k3-worker-xl-1    Ready    <none>                 12h   v1.25.3+k3s1
k3-worker-3       Ready    <none>                 14h   v1.25.3+k3s1
k3-worker-2       Ready    <none>                 14h   v1.25.3+k3s1
k3-api-1          Ready    control-plane,master   14h   v1.25.3+k3s1
k3-worker-1       Ready    <none>                 14h   v1.25.3+k3s1

DNS

Now one other thing: I set up a wildcard DNS A record to point

*.kube.reaz.io -> 10.0.4.1

Because I use tailscale for everything, I have no need to expose the IP to the outside world.
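That wildcard is what lets every hostname used below (nginx.kube.reaz.io, gitlab.kube.reaz.io, etc.) resolve without any extra DNS work, since k3s's bundled traefik ingress routes by Host header from there. A quick check that the record is doing its job:

# Any name under the wildcard should come back as the api node
dig +short anything.kube.reaz.io
# 10.0.4.1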

Test deploys

One other thing I'll often use is this little deploy.sh helper script for quickly slinging a docker image out as a full k8s deployment.

# cat deploy.sh

#!/bin/bash

# Usage: sh deploy.sh [image] [name] [port]
image=$1
name=$2
port=$3

# Create the deployment, expose it as a service on port 80,
# and publish it under the wildcard domain via an ingress rule
kubectl create deployment "$name" --image="$image"
kubectl expose deployment "$name" --port=80 --target-port="$port" --name "$name" --type=LoadBalancer
kubectl create ingress "$name" --rule="$name.kube.reaz.io/*=$name:80"

The usage is like

## Example: sh deploy.sh [image] [name] [port]

# sh deploy.sh linuxserver/nginx nginx 80

# curl nginx.kube.reaz.io
    <html>
        <head>
            <title>Welcome to our server</title>
...
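Since these are throwaway test deploys, a matching teardown helper is handy too. This one is hypothetical (not part of the original script); it just deletes the three objects deploy.sh creates:

#!/bin/bash

# Usage: sh teardown.sh [name]
name=$1

# Remove the ingress, service, and deployment created by deploy.sh
kubectl delete ingress "$name"
kubectl delete service "$name"
kubectl delete deployment "$name"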

Gitlab

With that in place, we're ready to deploy a test gitlab env. I like to use gitlab as a self-hosted/self-contained git for my local stuff that I don't want on the public web (mainly for things that aren't serious, just a nice playground).

But I do like keeping things in source control and using pipelines where possible.

So using the deploy script from earlier, we'll just use the quick and easy docker image for right now. Until we get persistent volumes set up, we'll be careful to only put things in there that we aren't afraid to lose (if kubernetes reschedules the pod, everything in the gitlab container will be lost without persistent storage -- we'll set up storage later; for now let's just get to use more of our cluster).

# sh deploy.sh gitlab/gitlab-ee:latest gitlab 80

Once that finishes up after about 5-10 minutes, I'm able to access it at https://gitlab.kube.reaz.io but I'm greeted with a login prompt.


Here are the steps to track down the initial root password the image creates:

# kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
gitlab-54955459cb-52wr4      1/1     Running   0          12h

# kubectl exec -it gitlab-54955459cb-52wr4 -- bash

# cat /etc/gitlab/initial_root_password

With that in place, we can start to put some of our work into gitlab and eventually start to build some pipelines; just remember anything in there can be lost until we set up persistent storage.


There are lots of different ways to host gitlab or even use gitlab.com, but for this case, I really wanted to say I'm hosting everything in k8s.

The downside I've run into with that, though, is that ssh isn't splittable by hostname the way http is with an ingress. I.e. you can't just route ssh traffic for gitlab.kube.reaz.io using an ingress controller. Or at least I couldn't find a way with k3s's traefik ingress controller in the 15 minutes I looked. You can get fancy with haproxy; I found a different way.

sshportal

sshportal (https://github.com/moul/sshportal) is, in my mind, an ssh gateway. It's a little cumbersome to set up, but it accomplishes routing ssh the way we'll want within a kubernetes cluster, with replication.

I use the kubernetes cli tools as a starting point for my yaml:

# Create a deployment
# kubectl create deployment --image moul/sshportal sshportal

# Create a service
# kubectl expose deployment sshportal --port=2222 --target-port=2222 --name sshportal --type=LoadBalancer

So next, I'm going to rework my api node to replace its normal sshd server with this new sshportal service we just deployed to k8s.

These next couple of modifications will eventually need to live in our ansible playbook for the api-nodes, but here is the rundown:

# On the api node, gonna move the sshd port off of 22
vi /etc/ssh/sshd_config
# Change
- Port 22
---
+ Port 2233

# Next we're going to change the k3s server to allow us
# to set a nodePort for ssh
vi /etc/systemd/system/k3s.service
# Change
- ExecStart=/usr/local/bin/k3s \
-    server \
---
+ ExecStart=/usr/local/bin/k3s \
+    server \
+    --kube-apiserver-arg service-node-port-range=10-32767

# Then reload k3s
systemctl daemon-reload
systemctl restart k3s

# Then once k3s restarts,
# we're going to edit the sshportal service
kubectl edit svc sshportal
# And change the nodeport to be 22
- - nodePort: 31840
---
+ - nodePort: 22

# From there you can grab the invite code from the logs
kubectl logs deployment/sshportal
# 2022/11/13 11:21:21 info: system migrated
# 2022/11/13 11:21:21 info 'admin' user created, use the user 'invite:DU6oTsAy8E1vQd9d' to associate a public key with this account
# 2022/11/13 11:21:21 info: SSH Server accepting connections on :2222, idle-timout=0s

# With that you should be able to ssh to the api-node 
# with the invite code and setup the 'admin' account
ssh 10.0.4.1 -l invite:DU6oTsAy8E1vQd9d

# With the admin account, you can start to create ssh jumps
ssh 10.0.4.1 -l admin
#
#    __________ _____           __       __
#   / __/ __/ // / _ \___  ____/ /____ _/ /
#  _\ \_\ \/ _  / ___/ _ \/ __/ __/ _ '/ /
# /___/___/_//_/_/   \___/_/  \__/\_,_/_/
#
#
# config> host ls
#   ID | NAME |           URL            |        KEY        |  GROUPS  |   UPDATED   |   CREATED   | COMMENT | HOP |  # LOGGING
# -----+------+--------------------------+-------------------+---------+-------------+-------------+---------+-----+-------------
#   1 | knox | ssh://knox@10.0.4.1:2233 | nervous_goldstine | default | 3 hours ago | 4 hours ago |         |     | everything
# Total: 1 hosts.

# Then for me, I can use
ssh knox@10.0.4.1
# Which will jump from sshportal to ssh://knox@10.0.4.1:2233
# Then I can add gitlab's ssh://git@gitlab
# And voilà! A single ssh endpoint that I can use for multiple different services.
# Just don't forget to go back and
# setup mysql replication and/or 
# occasionally backup the configs
ssh admin@10.0.4.1 config backup > ~/sshportal/backup.json
ssh admin@10.0.4.1 config restore ~/sshportal/backup.json --confirm
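And if you'd rather not remember to run that backup by hand, a small cron entry covers it. A sketch, assuming your ssh key is already tied to the admin account (the path and filename are just an example):

# m h dom mon dow  command  (note: % must be escaped in crontab)
0 3 * * * ssh admin@10.0.4.1 config backup > ~/sshportal/backup-$(date +\%F).json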

Persistent Volumes

What good is all this kubernetes setup if nothing persists?

Ideally everything would be as ephemeral as possible, but some things like databases need persistent storage. So let's set that up.

[ coming next ]
