Lance Nehring

Adventures with K0S in AWS

At the time of this writing (August 2024), K0S is at version v1.30.3. There's a tremendous amount of outdated and incorrect information on the Internet (which impacts AI, if you're into asking AmazonQ or ChatGPT questions), so be aware of the date of this article. My goal is to keep it current - we'll see how that goes.

This isn't actually a tutorial - the end state is not desirable and the information is too dense. This is more of an "engineering notebook" - akin to what my fellow graybeards may recall from engineering school.

My plan was to establish a Kubernetes presence on AWS without incurring the costs of Amazon's EKS. I wanted a lightweight, but fully functional, K8S installation that I could stand up and tear down to prove out orchestration and deployment of containerized projects as they come along - such as those for a startup company where attention to cloud cost is paramount. I'm certainly not against EKS for those situations where the cost is justified, and I have used it heavily in the past.

Picking through the various smaller K8S projects out there, I've settled on K0S since it's supposed to be "The Zero Friction Kubernetes". The features I'm after with this experiment are similar to what I've used with EKS:

  • ability to pull images from ECR
  • use the AWS cloud provider functionality to get those AWS-specific things: using tags to annotate subnets, worker nodes, route tables, etc. for use by the K0S installation
  • use the pod identity agent so that pods requiring certain privileges within AWS can get them via IAM roles
  • use an ingress controller to manage the provisioning and lifecycle of AWS ELBs - namely NLBs and ALBs

K0S has a tool called "k0sctl" to manage installation, but it requires SSH access to the nodes. I have no other use for SSH and don't need to expand the attack surface, so I won't install it.

Establish a VPC for testing

I won't cover the mechanics of creating the VPC, subnets, Internet gateway, route table, security groups, NACLs, etc. I personally use the IaC tool Terraform whenever possible. There is a learning curve to a tool like Terraform (and to learning HCL), but the benefits are enormous - especially when you need consistency so you don't waste time chasing ghosts resulting from misconfigured infrastructure.

I'm using a VPC with a class B private CIDR (172.16.0.0/16) in the us-east-1 region, enabled for DNS hostnames and DNS resolution. I created 3 public subnets (each with a 20-bit subnet mask) even though we're only using 1 subnet to start with. The main route table needs a route for 0.0.0.0/0 that goes to the Internet Gateway for the VPC. I didn't create any private subnets, in order to reduce the cost and need for any NAT gateways for this experiment.
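
For reference (not as a substitute for Terraform), here's a rough sketch of the same layout using the AWS CLI - the availability zone and the extra subnet CIDRs are assumptions to adjust for your own environment:

# create the VPC with a class B private CIDR and enable DNS resolution/hostnames
VPC_ID=$(aws ec2 create-vpc --cidr-block 172.16.0.0/16 --query 'Vpc.VpcId' --output text)
aws ec2 modify-vpc-attribute --vpc-id ${VPC_ID} --enable-dns-support '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id ${VPC_ID} --enable-dns-hostnames '{"Value":true}'

# one of the three /20 public subnets (repeat for the other AZs and CIDR blocks)
aws ec2 create-subnet --vpc-id ${VPC_ID} --cidr-block 172.16.0.0/20 --availability-zone us-east-1a

# Internet gateway plus a default route on the main route table
IGW_ID=$(aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id ${IGW_ID} --vpc-id ${VPC_ID}
RTB_ID=$(aws ec2 describe-route-tables --filters Name=vpc-id,Values=${VPC_ID} Name=association.main,Values=true --query 'RouteTables[0].RouteTableId' --output text)
aws ec2 create-route --route-table-id ${RTB_ID} --destination-cidr-block 0.0.0.0/0 --gateway-id ${IGW_ID}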

Create your EC2 instance

Similarly, I won't cover the mechanics of creating an EC2 instance. Terraform comes in really handy here, since you may find yourself repeatedly doing terraform apply and terraform destroy as you start and stop your experiments. I'm starting with a "t3a.large" node running the latest AL2023 AMI - enough vCPU, memory, and networking to keep us out of harm's way, without costing too much (in case we forget to destroy the instance after testing). Also, I'm not bothering to set up SSH to get a shell on the instance; I'm using AWS Systems Manager instead.
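
If you want a non-Terraform reference, a hedged sketch of launching such an instance with the AWS CLI might look like the following - the subnet and security group IDs are placeholders, and note that shell access via Systems Manager generally requires the AmazonSSMManagedInstanceCore managed policy on the instance role:

# look up the latest AL2023 AMI from the public SSM parameter (x86_64, default kernel)
AMI_ID=$(aws ssm get-parameters \
  --names /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
  --query 'Parameters[0].Value' --output text)

# launch the node with the instance profile we create in the next section
aws ec2 run-instances \
  --image-id ${AMI_ID} \
  --instance-type t3a.large \
  --subnet-id subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx \
  --associate-public-ip-address \
  --iam-instance-profile Name=k0s_instance \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=kubernetes.io/cluster/testcluster,Value=owned}]'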

Establish an IAM role for the EC2 instance to use as its instance profile

We're going to do ourselves a giant favor in terms of security and start using IAM roles immediately. Many articles related to AWS just talk about putting credentials in some "~/.aws/credentials" file. Yes, you can do that, but you immediately create an issue that will fail a security audit, and you're actually making your life harder by having to track and secure those credentials. So don't cheat and use your personal IAM access keys, or Krampus will find you.
You can use Terraform for this as well. Effectively, you need an IAM role - I named mine "k0s_instance" - with these AWS managed policies attached:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy (for when we experiment with the AWS CNI)

You'll also need to create 2 customer managed policies and attach those to the role. The policy permissions information is from the docs here: https://cloud-provider-aws.sigs.k8s.io/prerequisites/#iam-policies I named mine "k0s_control_plane_policy" and "k0s_node_policy". A sketch of wiring this up with the AWS CLI follows below.
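
For those not using Terraform, a rough, hedged AWS CLI sketch of creating the role and its instance profile might look like this (the customer managed policies are assumed to already exist, and the account ID is a placeholder):

# trust policy letting EC2 assume the role
cat << EOF > k0s-instance-trust.json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" }
  ]
}
EOF
aws iam create-role --role-name k0s_instance --assume-role-policy-document file://k0s-instance-trust.json

# AWS managed policies
aws iam attach-role-policy --role-name k0s_instance --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name k0s_instance --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

# customer managed policies built from the cloud-provider-aws prerequisites doc
aws iam attach-role-policy --role-name k0s_instance --policy-arn arn:aws:iam::<account-id>:policy/k0s_control_plane_policy
aws iam attach-role-policy --role-name k0s_instance --policy-arn arn:aws:iam::<account-id>:policy/k0s_node_policy

# instance profile that the EC2 instance actually references
aws iam create-instance-profile --instance-profile-name k0s_instance
aws iam add-role-to-instance-profile --instance-profile-name k0s_instance --role-name k0s_instance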

Install k0s

We're going to start with the smallest possible cluster - a single node - which means the control plane and worker components will be running on the same machine. All of the following installation steps are done in a "root" shell on the EC2 instance. There are side effects with node selection and tolerations that we'll run into, but we'll address those later.
https://docs.k0sproject.io/stable/install/

# download and install the k0s binary
curl -sSLf https://get.k0s.sh | sudo sh
# verify that the host meets the k0s requirements
k0s sysinfo
# generate a default configuration file that we'll edit shortly
mkdir -p /etc/k0s
k0s config create > /etc/k0s/k0s.yaml

Get the ECR credential provider binary

This link can help you determine what releases are available: https://github.com/kubernetes/cloud-provider-aws/releases

The AWS credential provider documentation is very light. You get to create a configuration file and then search the K0S docs for how to manipulate the kubelet arguments to use that file. This article is for K3S, but shows that the configuration file can be YAML instead of JSON - something that isn't mentioned in the credential provider docs.

RELEASE=v1.30.3
curl -OL https://storage.googleapis.com/k8s-staging-provider-aws/releases/${RELEASE}/linux/amd64/ecr-credential-provider-linux-amd64
mv ecr-credential-provider-linux-amd64 /etc/k0s/ecr-credential-provider
chmod 0755 /etc/k0s/ecr-credential-provider
cat << EOF > /etc/k0s/custom-credential-providers.yaml
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
  matchImages:
  - "*.dkr.ecr.*.amazonaws.com"
  - "*.dkr.ecr.*.amazonaws.com.cn"
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  defaultCacheDuration: '0'
EOF

Edit the k0s.yaml

Here is where we actually set the kubelet arguments so that the ECR credential provider will work. The clues for this were from this Gist. Combining that information with this K0S doc, we discover that it is possible to use "--kubelet-extra-args" on the k0s command line to pass extra arguments to the kubelet.
Also, there seems to be no way to get the default K0S CNI, kuberouter, to work in AWS. I don't know the root cause - possibly there are CIDR conflicts with what I chose for my VPC CIDR - but it was a simple change to set the "spec.network.provider" value to "calico" in the "/etc/k0s/k0s.yaml" file that we created. Calico worked fine for me without further configuration.
So, for now we're using Calico as the CNI. I feel like I should be able to use the AWS VPC CNI plugin, but that has not yet been successful for me. This may need to be revisited if the AWS Load Balancer Controller requires it.

# install k0s as a single-node controller+worker, with the external cloud provider enabled
# and the kubelet pointed at our ECR credential provider configuration
k0s install controller --single --enable-cloud-provider --kubelet-extra-args="--image-credential-provider-config=/etc/k0s/custom-credential-providers.yaml --image-credential-provider-bin-dir=/etc/k0s" -c /etc/k0s/k0s.yaml
systemctl daemon-reload
k0s start

This will "start" the cluster, which we need to do to get the our kubectl configured that we'll do next. The single node cluster won't truly start yet - and that's ok for now. You'll notice Pods in the pending state.

k0s status
k0s kubectl get pod -A

Install and configure kubectl

Here we grab a version of kubectl that matches our Kubernetes version so that we maximize compatibility. We use the k0s command to generate a valid config file and put it in the expected place. Note that this file contains the "keys to the kingdom" as far as the k0s installation is concerned, so treat it appropriately.

k0s version
curl -LO https://dl.k8s.io/release/v1.30.3/bin/linux/amd64/kubectl
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
mkdir -p ~/.kube
k0s kubeconfig admin > ~/.kube/config
chmod 0600 ~/.kube/config

Install helm and add the stable and cloud-provider-aws repos

We're embracing helm charts for repeatable, stable, versioned installations of everything we can. Install the latest version of helm and set up a few repos that we intend to use.

dnf install -y git
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add stable https://charts.helm.sh/stable
helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
helm repo add eks https://aws.github.io/eks-charts
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Install the helm chart for the aws-cloud-controller-manager

AWS Tags

The documentation for the AWS Cloud Provider is rather underwhelming. Especially frustrating is the lack of direct information about tagging AWS resources. There's some information here that can help.

Using a Kubernetes cluster name of "testcluster", the tags we start with are:

Tags for the EC2 instance, VPC, and subnets:

  Key                                 Value
  kubernetes.io/cluster/testcluster   owned

Additional tags for the subnets:

  Key                                 Value
  kubernetes.io/role/elb              1
  kubernetes.io/role/alb-ingress      1
  kubernetes.io/role/internal-elb     1
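
If you're adding these tags outside of Terraform, a quick way to apply them is with the AWS CLI (the resource IDs below are placeholders):

# tag the EC2 instance, VPC, and subnets as belonging to the cluster
aws ec2 create-tags \
  --resources i-xxxxxxxx vpc-xxxxxxxx subnet-aaaaaaaa subnet-bbbbbbbb subnet-cccccccc \
  --tags Key=kubernetes.io/cluster/testcluster,Value=owned

# additional role tags on the subnets for load balancer discovery
aws ec2 create-tags \
  --resources subnet-aaaaaaaa subnet-bbbbbbbb subnet-cccccccc \
  --tags Key=kubernetes.io/role/elb,Value=1 Key=kubernetes.io/role/alb-ingress,Value=1 Key=kubernetes.io/role/internal-elb,Value=1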

AWS Cloud Provider

I found it necessary to edit the node selector and tolerations of the daemonset to be able to get the pod scheduled in the case of this single-node deployment. I was also unable to get AWS route tables annotated to the point where the aws-cloud-controller-manager would be happy about configuring cloud routes. I'm not sure what "cloud routes" are supposed to be, but for now, I've disabled that feature. There's more on it here.
We do all of this in a custom helm values file.

cat << EOF > /etc/k0s/accm-values.yaml
---
args:
  - --v=2
  - --cloud-provider=aws
  - --configure-cloud-routes=false
nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
tolerations:
- key: node.cloudprovider.kubernetes.io/uninitialized
  value: "true"
  effect: NoSchedule
EOF
helm -n kube-system upgrade --install aws-cloud-controller-manager aws-cloud-controller-manager/aws-cloud-controller-manager --values /etc/k0s/accm-values.yaml

Install pod identity agent

Unfortunately, I didn't see a helm chart for the eks-pod-identity-agent hosted on a Helm repo. So we're forced to clone the git repo and install the helm chart from that work area.

cat << EOF > /etc/k0s/epia-values.yaml
---
clusterName: testcluster
env:
  AWS_REGION: us-east-1
EOF
git clone https://github.com/aws/eks-pod-identity-agent.git
cd eks-pod-identity-agent/
helm install eks-pod-identity-agent --namespace kube-system ./charts/eks-pod-identity-agent --values ./charts/eks-pod-identity-agent/values.yaml --values /etc/k0s/epia-values.yaml

Is it working so far?

Kubectl should be happy with the node and the pods. It can take a few minutes for the pods to reach a "Running" state.

kubectl get node -o wide
kubectl get pod -A -o wide

From what I've seen, the worker node (our only node, at this point) must have IP addresses assigned, or the kubectl logs command will fail when you try to inspect logs from the pods/containers. I found that you can also read the container logs directly in the "/var/log/containers" directory on the EC2 instance.
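
For example, a quick way to peek at those logs directly on the node (the file name pattern below is just an illustration of the kubelet's <pod>_<namespace>_<container> naming):

# list the per-container log symlinks maintained by the kubelet
ls -l /var/log/containers/
# follow the log of a specific container
tail -f /var/log/containers/aws-cloud-controller-manager-*_kube-system_*.log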

Ingress Controller

Here's where things take another complicated twist. The AWS Cloud Controller Manager contains a legacy AWS load balancer controller capable of managing legacy ELBs and NLBs. The code for the NLB management shows an older and a newer API... I'm not sure what the switch between the two is, but it may very well be the "--v=2" argument that was passed to the aws-cloud-controller-manager. Oddly, the newer API for NLBs is not capable of configuring proxy protocol, whereas the docs suggest that it is - and so does the code for the older API.
It appears that this legacy code in the aws-cloud-controller-manager is basically EOL - you can still use it, but broken things are not getting fixed. The push seems to be toward a follow-on project, the AWS Load Balancer Controller. It is absolutely confusing, but I did find an article that explains it better.

Install the Nginx Ingress Controller

https://kubernetes.github.io/ingress-nginx/
The available customization values can be found with this nifty helm command:

helm show values ingress-nginx --repo https://kubernetes.github.io/ingress-nginx

We're creating a special configuration here based on: https://kubernetes.github.io/ingress-nginx/deploy/#network-load-balancer-nlb

The idea is that the Nginx ingress controller sits behind an NLB that accepts HTTPS and HTTP traffic (TCP:443 and TCP:80). The HTTPS traffic has its SSL terminated at the NLB using the certificate given in the annotation. That traffic of HTTPS origin is then fed to the nginx controller as HTTP traffic. The traffic of HTTP origin is sent by the NLB to a "tohttps" port (TCP:2443) at the nginx controller, which merely responds to the client with a 308 permanent redirect - to force the client to use HTTPS.

Notes:

  • The "$" in the http-snippet are escaped in this heredoc to protect them from the shell.
  • The shell variable CERT_ARN must be set to whatever certificate ARN that you have in your AWS Certificate Manager that you intend to use.
  • Since this annotation using the legacy AWS load balancer controller, only a single certificate ARN can be specified.
  • The "proxy-real-ip-cidr" is set to the CIDR of the VPC I'm using. You can force proxy protocol to work, by uncommenting the comments in the heredoc. The controller will not actually enable proxy protocol on the NLB's target groups, so you'll have to use the AWS console and do that manually. It can work, but it's not solution for production.
CERT_ARN="xxxxxxxx"
cat << EOF > /etc/k0s/nic-values.yaml
---
controller:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${CERT_ARN}
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
#      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    targetPorts:
      http: tohttps
      https: http
  config:
#    use-proxy-protocol: "true"
    use-forwarded-headers: "true"
    proxy-real-ip-cidr: "172.16.0.0/16"
    http-snippet: |
      server {
        listen 2443;
        return 308 https://\$host\$request_uri;
      }
  containerPort:
    tohttps: 2443
EOF
helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx --values /etc/k0s/nic-values.yaml -n ingress-nginx --create-namespace

Simple application for testing

Once the nginx ingress controller is running, we can attempt to test. For simplicity, this application is just a yaml manifest instead of a helm chart (there's likely a better way to do this). You can adjust the ingress host to something other than "web.example.com" - to potentially match the SSL cert that you're using - or not, depending on whether your testing can handle SSL name mismatch errors.

"simple-web-server-with-ingress.yaml":

apiVersion: v1
kind: Namespace
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
  namespace: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: httpd
        image: httpd:2.4.53-alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-server-service
  namespace: web
spec:
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-server-ingress
  namespace: web
spec:
  ingressClassName: nginx
  rules:
  - host: web.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-server-service
            port:
              number: 5000

Apply with kubectl apply -f simple-web-server-with-ingress.yaml. It will take a few minutes for the NLB to finish provisioning and pass initial health checks. You can monitor the progress in the AWS EC2 console. The ingress and service can be seen with:

kubectl get ingress -A -o wide
kubectl get service -A -o wide
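
If you prefer the CLI to the console for watching the NLB come up, something like this works too (the target group ARN is a placeholder you'd copy from the describe output):

# state changes from "provisioning" to "active" when the NLB is ready
aws elbv2 describe-load-balancers --query 'LoadBalancers[].{Name:LoadBalancerName,State:State.Code,DNS:DNSName}' --output table
# then confirm the target group health checks are passing
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:<account-id>:targetgroup/xxxx/xxxx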

Now you can attempt to curl to the NLB's DNS address (same as the address shown by kubectl). I'm giving curl the "-k" option to ignore the SSL cert mismatch, and I'm also setting a "Host" HTTP header, since the ingress is explicitly for "web.example.com".

So, when I execute:

curl -k -H 'Host: web.example.com' https://xxxxxxxxxxxxxxxxxxxx.elb.us-east-1.amazonaws.com

I get the expected:

<html><body><h1>It works!</h1></body></html>

If I try to similarly curl using HTTP instead of HTTPS, as:

curl -k -H 'Host: web.example.com' http://xxxxxxxxxxxxxxxxxxxx.elb.us-east-1.amazonaws.com

Then I get the expected 308 permanent redirect:

<html>
<head><title>308 Permanent Redirect</title></head>
<body>
<center><h1>308 Permanent Redirect</h1></center>
<hr><center>nginx</center>
</body>
</html>

Note that since I didn't change "web.example.com" to a DNS name that I own, if I tell curl to follow the redirect:

curl -L -k -H 'Host: web.example.com' http://xxxxxxxxxxxxxxxxxxxx.elb.us-east-1.amazonaws.com

I get the expected error:

curl: (6) Could not resolve host: web.example.com

Conclusions at this point

We've shown that it is possible to get K0S working on a single node in AWS. Using the nginx ingress controller can work for an NLB, but there are issues that make it undesirable:

  • It's using the legacy AWS load balancer controller contained in the aws-cloud-controller-manager, which means:
    • Risk of that code being removed at some unknown point in the future
    • Current documentation doesn't match the actual features
    • NLBs are not configurable for TLS SNI or proxy protocol
    • No support for ALBs
  • The nginx ingress controller has a somewhat complicated configuration to accomplish a common HTTP to HTTPS redirect.

Moving forward

In the current AWS Load Balancer Controller docs we find this:

Additional requirements for non-EKS clusters:

  • Ensure subnets are tagged appropriately for auto-discovery to work
  • For IP targets, pods must have IPs from the VPC subnets. You can configure the amazon-vpc-cni-k8s plugin for this purpose.

I'm going to revisit using the amazon-vpc-cni-k8s plugin. I'm thinking that I missed the kubelet configuration requirements when experimenting before and didn't actually have it installed properly. It appears that there may be components that require installation directly on the worker node - like with the ECR credential provider. We'll see - every day is a learning experience.

Has anyone else tried to use K0S in this way? Or do you have advice/clarifications/questions that I may (or may not) be able to answer?
