Justin

🥇 The ultimate kubernetes homelab setup

Overview

Want an easy way to deploy and control kubernetes in a totally sealed homelab environment?

Let's get started with k3s, gitlab, proxmox, ansible, and terraform to automate our homelab infrastructure for repeatability and expandability.

Sections

Some of the sections we're going to cover for build out:

  • proxmox
  • terraform
  • ansible
  • k3s
    • dns
    • test deploys
  • gitlab
  • sshportal
  • persistent volumes

Proxmox

The baseline for this project is going to be a proxmox server. We could run k3s on bare metal, but proxmox gives us the ability to practice spinning up and down k8s nodes of different sizes/constraints via terraform. Plus, if anything goes wrong, we just destroy a proxmox vm and start fresh, which takes minutes rather than having to reinstall an entire OS on bare metal.

We're going to need/want to

  • Set up a cloud-init image to be our base image for terraform
  • Network a remote cluster via tailscale (vpn) [optional redundancy]

So with proxmox, there's not much to it other than to follow the installation instructions: https://www.proxmox.com/en/proxmox-ve/get-started

  • Download ISO image
  • Boot from USB or CD/DVD
  • Configure the host machine (make sure you can log in to the web UI)

Once you can log into the web UI at something like

https://[your_ip_here]:8006/


You can decide if you want to set up the remote cluster; one day I'll document how to network the two with tailscale.

Cloud init

With proxmox set up, we need to set up a base image to be our "golden image" for terraform to use when deploying new nodes.

For that, I'd suggest following a tutorial like this:
https://www.youtube.com/watch?v=shiIi38cJe4

https://docs.technotim.live/posts/cloud-init-cloud-image/

I followed the tutorial pretty closely and even used the suggested image id of 8000. So you'll see that being used in the next terraform section.


Cloud-init gives us the ability to spin up a base image with a predefined username/password, accepted ssh keys, and ip configuration, as well as the ability to override those settings using the terraform provider.
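For reference, the rough shape of what that tutorial walks you through looks something like the below. Treat it as a sketch: the ubuntu image, the local-lvm storage name, and the vmbr0 bridge are assumptions, so swap in whatever your proxmox host actually uses. The template name ubuntu-cloud is what the terraform clone references later.

# Grab an ubuntu cloud image (22.04 here, any cloud image works)
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Create an empty VM with id 8000 that will become the template
qm create 8000 --name ubuntu-cloud --memory 2048 --net0 virtio,bridge=vmbr0

# Import the cloud image as the VM's disk and attach it
qm importdisk 8000 jammy-server-cloudimg-amd64.img local-lvm
qm set 8000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-8000-disk-0

# Add the cloud-init drive, boot from the imported disk, enable a serial console
qm set 8000 --ide2 local-lvm:cloudinit
qm set 8000 --boot c --bootdisk scsi0
qm set 8000 --serial0 socket --vga serial0

# Turn it into a template for terraform to clone
qm template 8000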

So with all that, let's jump into the terraform section.

Terraform

So now that our local setup has its own "cloud", i.e. we have proxmox instead of AWS or GCP for our "unlimited" VMs, we need a way to orchestrate them.

Part of the big switch to kubernetes is that it enables us to not care about the underlying infrastructure. For example, on AWS, if a kubernetes node runs out of disk space, we can expect AWS to launch a new node and kubernetes to move the current workload onto it.

Terraform is what will enable us to recreate some of that underlying node scalability via proxmox.

Ultimately we'll have something like var.node_count = 3. Let's say we want to deploy more pods but we've reached the max kubernetes will schedule onto our current 3-node setup. We can change to var.node_count = 5 and let terraform handle rolling that out across proxmox, ensuring everything is kept uniform.
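On the CLI that scale-up is just a plan and an apply. A quick sketch, assuming the worker count (one of the prxmx_* variables shown just below) is wired into the count of the VM resource:

# Preview the change: terraform should plan the additional clones of the template
terraform plan -var="prxmx_worker_nodes=5"

# Roll it out across proxmox
terraform apply -var="prxmx_worker_nodes=5"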


Terraform gives us the ability to easily create and manage groupings of different-size nodes for experimenting with changing workloads or setting certain node affinity on kubernetes (there's a small nodeSelector sketch after the table below).

Some of the groups I've made so far are like the following:

variable "prxmx_api_nodes" {
  default = 1
}

variable "prxmx_worker_nodes" {
  default = 3
}

variable "prxmx_worker_xl_nodes" {
  default = 1
}

variable "prxmx_worker_xxl_nodes" {
  default = 1
}
...

That enables the following types of workgroups that scale to certain resources:

name                  cores  ram  hd space  hd type
worker_nodes          4c     4G   30G       hdd
worker_xl_nodes       12c    12G  80G       hdd
worker_ssd_xl_nodes   12c    12G  80G       ssd
worker_xxl_nodes      24c    24G  160G      hdd
worker_ssd_xxl_nodes  24c    24G  160G      ssd
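To actually steer workloads at one of these groups from the kubernetes side, one simple option is to label the nodes per workgroup and add a nodeSelector. The label key and the myapp deployment name below are made up for illustration:

# Label the nodes that belong to a workgroup (label key/value are made up)
kubectl label node k3-worker-xl-1 workgroup=xl

# Pin an existing deployment (name is illustrative) onto that group
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeSelector":{"workgroup":"xl"}}}}}'

# Confirm which nodes carry the label
kubectl get nodes -l workgroup=xl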

In my setup, I have "xeon", a 2-socket, 32-core proxmox server with 96GB of ram, and "river", a 12-core intel i7 server with 16GB of ram.

Terraform has to know to deploy these different VMs across both the river and xeon proxmox servers, but once those nodes get added to kubernetes (k8s), that placement is abstracted away from k8s. K8s will deploy across any/all available nodes.

But having a good selection of different types of nodes really helps it feel like a full replacement for AWS. And even if you don't use all the different workgroups, having them available but scaled back definitely helps for organization and scalability once it's needed.

Most of my workload might run on worker_xl_nodes, and that workgroup can exist across both my river and xeon servers, so if I need to scale down my xeon server for maintenance, kubernetes can migrate everything to the river server.

But some applications need more cores and ram than river has, so without another large xeon-like server, scaling down my worker_xxl_nodes is going to result in downtime.
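For what it's worth, the "scale down xeon for maintenance" flow is mostly plain kubectl before terraform touches anything. A minimal sketch, assuming you drain the node before destroying its VM:

# Stop scheduling onto the node and evict its pods (k3s runs a few daemonsets, hence the flag)
kubectl cordon k3-worker-xl-1
kubectl drain k3-worker-xl-1 --ignore-daemonsets --delete-emptydir-data

# Once terraform has destroyed the VM, clean up the stale node object
kubectl delete node k3-worker-xl-1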

Ansible

One of the other things we'll also do here is use ansible to set up each of our nodes to our liking via a provisioner "local-exec" block:

resource "proxmox_vm_qemu" "api_nodes" {
  count       = var.prxmx_api_nodes
  name        = "k3-api-${count.index + 1}"
  desc        = "k3-api-${count.index + 1}"
  target_node = "xeon"

  clone = "ubuntu-cloud"

  os_type      = "cloud-init"
  ipconfig0    = "ip=10.0.4.${count.index + 1}/16,gw=10.0.2.1"
  nameserver   = "1.1.1.1"
  searchdomain = "1.1.1.1"

  cores        = 4
  memory       = "4096"

  disk {
    storage = var.disk
    type    = "scsi"
    size    = "30G"
  }

  lifecycle {
    ignore_changes = [
      ciuser,
      sshkeys,
      network,
    ]
  }

  provisioner "local-exec" {
    command = "sleep 30 && ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u knox -i '10.0.4.${count.index + 1},' playbook.yml"
  }
}

What this means is, after each successful setup of a proxmox vm, we'll run an ansible playbook to install the tools to our liking.

Whether that's k3s itself, or installing neovim and exporting it as the default editor on each VM, ansible enables that.
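As a quick sanity check outside of terraform, ansible's ad-hoc mode works against a single node too. A sketch, assuming the knox cloud-init user and one of the node IPs from the terraform config (neovim is just an example package):

# The trailing comma turns the bare IP into an inline inventory
ansible all -u knox -i '10.0.4.1,' -m ping

# Ad-hoc install of an extra tool, the same thing a playbook task would do
ansible all -u knox -i '10.0.4.1,' -m apt -a "name=neovim state=present update_cache=yes" --become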

K3s

So with our hypervisor (proxmox) chosen and our vm layout designed (terraform + ansible), we're ready to start on the kubernetes cluster, which is the thing that will handle our actual workloads/applications.

For this iteration, I'm using k3s for its total ease of use.

Using terraform, I deploy one api server (high availability to come later) with something like

# cat main.tf

terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "2.9.11"
    }
  }
}

provider "proxmox" {
  pm_api_url      = "https://10.0.2.3:8006/api2/json"
  pm_tls_insecure = true
}

variable "prxmx_api_nodes" {
  default = 1
}

variable "disk" {
  default = "hdd-12tb"
}

resource "proxmox_vm_qemu" "api_nodes" {
  count       = var.prxmx_api_nodes
  name        = "k3-api-${count.index + 1}"
  desc        = "k3-api-${count.index + 1}"
  target_node = "xeon"

  clone = "ubuntu-cloud"

  os_type      = "cloud-init"
  ipconfig0    = "ip=10.0.4.${count.index + 1}/16,gw=10.0.2.1"
  nameserver   = "1.1.1.1"
  searchdomain = "1.1.1.1"

  cores        = 4
  memory       = "4096"

  disk {
    storage = var.disk
    type    = "scsi"
    size    = "30G"
  }

  lifecycle {
    ignore_changes = [
      ciuser,
      sshkeys,
      network,
    ]
  }

  provisioner "local-exec" {
    command = "sleep 30 && ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u knox -i '10.0.4.${count.index + 1},' playbook.yml"
  }
}

And a playbook like

# cat playbook.yml

---

- name: Install k3s
  gather_facts: true
  hosts: all
  tasks:
    - name: Install k3s
      shell: curl -sfL https://get.k3s.io | sh -

Once it's deployed (in my case to the static ip 10.0.4.1), I'll ssh to it and grab the k3s token used to add future nodes to the cluster.

# cat /var/lib/rancher/k3s/server/token
K100.......

So now I spin up some worker_xl_nodes; once they're finished, I ssh in and run

curl -sfL https://get.k3s.io | K3S_URL=https://10.0.4.1:6443 K3S_TOKEN='K100.....' sh -

Once that's finished and I have the workers I need, I can verify with

# kubectl get nodes

NAME              STATUS   ROLES                  AGE   VERSION
k3-worker-xxl-1   Ready    <none>                 12h   v1.25.3+k3s1
k3-worker-xl-1    Ready    <none>                 12h   v1.25.3+k3s1
k3-worker-3       Ready    <none>                 14h   v1.25.3+k3s1
k3-worker-2       Ready    <none>                 14h   v1.25.3+k3s1
k3-api-1          Ready    control-plane,master   14h   v1.25.3+k3s1
k3-worker-1       Ready    <none>                 14h   v1.25.3+k3s1

DNS

Now one other thing: I set up a wildcard DNS A record to point

*.kube.reaz.io -> 10.0.4.1

Because I use tailscale for everything, I have no need to expose the IP to the outside world.
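That wildcard is what lets every hostname used below (nginx.kube.reaz.io, gitlab.kube.reaz.io, etc.) resolve without any extra DNS work, since k3s's bundled traefik ingress routes by Host header from there. A quick check that the record is doing its job:

# Any name under the wildcard should come back as the api node
dig +short anything.kube.reaz.io
# 10.0.4.1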

Test deploys

One other thing I'll often use is this little deploy.sh helper script for quickly slinging a docker image out as a full k8s deployment.

# cat deploy.sh

#!/bin/bash

# Usage: sh deploy.sh [image] [name] [port]
image=$1
name=$2
port=$3

# Create the deployment, expose it as a service on port 80,
# and publish it under the wildcard domain via an ingress rule
kubectl create deployment "$name" --image="$image"
kubectl expose deployment "$name" --port=80 --target-port="$port" --name "$name" --type=LoadBalancer
kubectl create ingress "$name" --rule="$name.kube.reaz.io/*=$name:80"

The usage is like

## Example: sh deploy.sh [image] [name] [port]

# sh deploy.sh linuxserver/nginx nginx 80

# curl nginx.kube.reaz.io
    <html>
        <head>
            <title>Welcome to our server</title>
...
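Since these are throwaway test deploys, a matching teardown helper is handy too. This one is hypothetical (not part of the original script); it just deletes the three objects deploy.sh creates:

#!/bin/bash

# Usage: sh teardown.sh [name]
name=$1

# Remove the ingress, service, and deployment created by deploy.sh
kubectl delete ingress "$name"
kubectl delete service "$name"
kubectl delete deployment "$name"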

Gitlab

With that in place, we're ready to deploy a test gitlab env. I like to use gitlab as a self-hosted/self-contained git for my local stuff that I don't want on the public web (mainly for things that aren't serious, just a nice playground).

But I do like keeping things in source control and using pipelines where possible.

So using the deploy script from earlier, we'll just use the quick and easy docker image for right now. Until we get persistent volumes set up, we'll be careful to only put things in there that we aren't afraid to lose (if kubernetes reschedules the pod, everything in the gitlab container will be lost without persistent storage -- we'll set up storage later; for now let's just get to use more of our cluster).

# sh deploy.sh gitlab/gitlab-ee:latest gitlab 80

Once that finishes up after about 5-10 minutes, I'm able to access it at https://gitlab.kube.reaz.io but I'm greeted with a login prompt.


Here are the steps to track down the initial root password the image creates:

# kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
gitlab-54955459cb-52wr4      1/1     Running   0          12h

# kubectl exec -it gitlab-54955459cb-52wr4 -- bash

# cat /etc/gitlab/initial_root_password

With that in place, we can start to put some of our work into gitlab and eventually start to build some pipelines; just remember anything in there can be lost until we set up persistent storage.


There are lots of different ways to host gitlab or even use gitlab.com, but for this case, I really wanted to say I'm hosting everything in k8s.

The downside I've run into with that, though, is that ssh isn't splittable by hostname the way http is with an ingress. I.e. you can't just route ssh traffic for gitlab.kube.reaz.io using an ingress controller. Or at least I couldn't find a way with k3s's traefik ingress controller in the 15 minutes I looked. You can get fancy with haproxy; I found a different way.

sshportal

sshportal (https://github.com/moul/sshportal) is, in my mind, an ssh gateway. It's a little cumbersome to set up, but it accomplishes routing ssh the way we'll want within a kubernetes cluster, with replication.

I use the kubernetes cli tools as a starting point for my yaml:

# Create a deployment
# kubectl create deployment --image moul/sshportal sshportal

# Create a service
# kubectl expose deployment sshportal --port=2222 --target-port=2222 --name sshportal --type=LoadBalancer

So next, I'm going to rework my api node to replace its normal sshd server with this new sshportal service we just deployed to k8s.

These next couple of modifications will eventually need to live in our ansible playbook for the api-nodes, but here is the rundown:

# On the api node, gonna move the sshd port off of 22
vi /etc/ssh/sshd_config
# Change
- Port 22
---
+ Port 2233

# Next we're going to change the k3s server to allow us
# to set a nodePort for ssh
vi /etc/systemd/system/k3s.service
# Change
- ExecStart=/usr/local/bin/k3s \
-    server \
---
+ ExecStart=/usr/local/bin/k3s \
+    server \
+    --kube-apiserver-arg service-node-port-range=10-32767

# Then reload k3s
systemctl daemon-reload
systemctl restart k3s

# Then once k3s restarts,
# we're going to edit the sshportal service
kubectl edit svc sshportal
# And change the nodeport to be 22
- - nodePort: 31840
---
+ - nodePort: 22

# From there you can grab the invite code from the logs
kubectl logs deployment/sshportal
# 2022/11/13 11:21:21 info: system migrated
# 2022/11/13 11:21:21 info 'admin' user created, use the user 'invite:DU6oTsAy8E1vQd9d' to associate a public key with this account
# 2022/11/13 11:21:21 info: SSH Server accepting connections on :2222, idle-timout=0s

# With that you should be able to ssh to the api-node 
# with the invite code and setup the 'admin' account
ssh 10.0.4.1 -l invite:DU6oTsAy8E1vQd9d

# With the admin account, you can start to create ssh jumps
ssh 10.0.4.1 -l admin
#
#    __________ _____           __       __
#   / __/ __/ // / _ \___  ____/ /____ _/ /
#  _\ \_\ \/ _  / ___/ _ \/ __/ __/ _ '/ /
# /___/___/_//_/_/   \___/_/  \__/\_,_/_/
#
#
# config> host ls
#   ID | NAME |           URL            |        KEY        |  GROUPS  |   UPDATED   |   CREATED   | COMMENT | HOP |  # LOGGING
# -----+------+--------------------------+-------------------+---------+-------------+-------------+---------+-----+-------------
#   1 | knox | ssh://knox@10.0.4.1:2233 | nervous_goldstine | default | 3 hours ago | 4 hours ago |         |     | everything
# Total: 1 hosts.

# Then for me, I can use
ssh knox@10.0.4.1
# Which will jump from sshportal to ssh://knox@10.0.4.1:2233
# Then I can add gitlab's ssh://git@gitlab
# And voilà! A single ssh endpoint that I can use for multiple different services.
# Just don't forget to go back and
# setup mysql replication and/or 
# occasionally backup the configs
ssh admin@10.0.4.1 config backup > ~/sshportal/backup.json
ssh admin@10.0.4.1 config restore ~/sshportal/backup.json --confirm
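And if you'd rather not remember to run that backup by hand, a small cron entry covers it. A sketch, assuming your ssh key is already tied to the admin account (the path and filename are just an example):

# m h dom mon dow  command  (note: % must be escaped in crontab)
0 3 * * * ssh admin@10.0.4.1 config backup > ~/sshportal/backup-$(date +\%F).json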

Persistent Volumes

What good is all this kubernetes setup if nothing persists?

Ideally everything would be as ephemeral as possible, but some things like databases need persistent storage. So let's set that up.

[ coming next ]
