I like being able to keep my own GKE Kubernetes cluster for experimenting. But I realized that this Network Load Balancer was way too expensive.
What happens is that GKE runs a LoadBalancer controller (which I can't see), and whenever I have a service with type: LoadBalancer, it creates an L4 load balancer that balances traffic on ports 80 and 443 across the nodes. Here is a diagram of my current setup:
I use Traefik as a reverse proxy that terminates the end-user TLS connection. And since Traefik sets status.loadBalancer.ingress on the ingress objects, I can also use ExternalDNS to set the A records automatically (I use Cloud DNS).
What if I stopped using the LoadBalancer service and used a NodePort instead? Two reasons against that:
- GKE restricts NodePorts to the default Kubernetes range (30000 to 32767). So I would end up with https://kube.maelvls.dev:30145 and I definitely don't want that. I want to use 80 and 443.
- Traefik uses the LoadBalancer service's status in order to set the ingresses' status.loadBalancer.ingress, which in turn is used by ExternalDNS for setting the A records automatically. I still want to be able to use ExternalDNS.
Thus, my goal is to expose Traefik on 80 and 443 and still use type: LoadBalancer, which in turn allows me to use ExternalDNS.
I could use the service's externalIPs, but it isn't supported by Traefik and I would have to write some controller that would get the nodes' internal IPs and then set externalIPs with these.
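To give an idea of what such a controller would have to do on every node change, here is a minimal sketch. The helper name external_ips_patch is hypothetical; it only builds the JSON patch body for spec.externalIPs from the IPs passed as arguments.

```shell
# Hypothetical helper: build a patch body that sets spec.externalIPs
# to the list of IPs given as arguments.
external_ips_patch() {
  ips=""
  for ip in "$@"; do
    # Append each IP as a quoted JSON string, comma-separated.
    ips="${ips:+$ips,}\"$ip\""
  done
  printf '{"spec":{"externalIPs":[%s]}}\n' "$ips"
}
```

Assuming the traefik service in the traefik namespace, the output could then be fed to something like kubectl -n traefik patch service traefik -p "$(external_ips_patch 10.142.0.61 10.142.0.62)", and this would have to be re-run whenever the node IPs change.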
Instead, I opted for the hostPort solution. No restriction on 80 or 443. But I still have to automate a lot. Fortunately, akrobateo does exactly what I need.
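For readers who have never used hostPort, here is a minimal sketch of what it looks like on a pod. The pod name and the nginx image are placeholders I picked for illustration; this is not what akrobateo deploys.

```shell
# Hypothetical helper: print a pod manifest that exposes container
# port 80 on port 80 of the node the pod is scheduled on.
hostport_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hostport-demo   # placeholder name
spec:
  containers:
    - name: web
      image: nginx      # placeholder image, anything listening on 80 works
      ports:
        - containerPort: 80
          hostPort: 80  # reachable on the node's own IP, port 80
EOF
}
```

The manifest can be applied with hostport_pod_manifest | kubectl apply -f -.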
Akrobateo acts as an internal LoadBalancer service controller and replaces the gce-ingress-controller that GKE applies to every GKE cluster. It is inspired by K3s's servicelb.
The akrobateo controller reads the LoadBalancer services and updates status.loadBalancer.ingress on these services with the node external IPs. Whenever the external IPs change (since they are ephemeral), the controller updates the service, Traefik does its job, and finally ExternalDNS updates the DNS records.
I might get some downtime with that, but for my use it's just fine.
The following command installs the akrobateo controller in the kube-system namespace:
k apply -k https://github.com/kontena/akrobateo/deploy
Since we want TCP 80 and 443 traffic to be able to hit the nodes, let's set a new firewall rule:
gcloud compute firewall-rules create akrobateo-fw-traefik --allow tcp:80,tcp:443 --source-ranges=0.0.0.0/0
Also, make sure we don't have an L4 load balancer still running. The load balancer won't get removed automatically.
% gcloud compute forwarding-rules list
Listed 0 items.
At this point, the LoadBalancer service should contain the IPs of the nodes:
% k -n traefik get services traefik -oyaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik
spec:
  clusterIP: 10.27.244.111
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30371
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 32748
    port: 443
    protocol: TCP
    targetPort: https
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 220.127.116.11
    - ip: 18.104.22.168

% k -n traefik get services -owide
NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                    PORT(S)
traefik   LoadBalancer   10.27.244.111   22.214.171.124,126.96.36.199   80:30371/TCP,443:32748/TCP
Let's check that these are the external IPs of the cluster nodes:
% k get nodes -owide
NAME                                                  INTERNAL-IP   EXTERNAL-IP
gke-august-period-234610-worker-0c5c84f5-rq25         10.142.0.61   188.8.131.52
gke-august-period-234610-worker-micro-cf12d79d-klh6   10.142.0.62   184.108.40.206
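Comparing the two lists by eye gets tedious, so here is a small sketch that does the comparison. The function names are mine, and it assumes kubectl is configured for the cluster; it relies only on kubectl's JSONPath output.

```shell
# Print the external IPs of all nodes, one per line, sorted.
node_external_ips() {
  kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' |
    tr ' ' '\n' | sort
}

# Print the IPs found in a Service's status.loadBalancer.ingress,
# one per line, sorted. $1 is the namespace, $2 the Service name.
service_ingress_ips() {
  kubectl -n "$1" get service "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}' |
    tr ' ' '\n' | sort
}

# Succeed only when both lists are identical.
ingress_matches_nodes() {
  [ "$(node_external_ips)" = "$(service_ingress_ips "$1" "$2")" ]
}
```

For my setup, ingress_matches_nodes traefik traefik && echo OK would tell me whether the controller has caught up with the current node IPs.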
The kube-system/akrobateo controller creates a DaemonSet for every LoadBalancer service it finds. The DaemonSet is created in the same namespace as the Service:
% k get ds -A
NAMESPACE   NAME                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE
traefik     akrobateo-traefik   2         2         2       2            2
This DaemonSet runs one pod per node, each pod being responsible for proxying everything coming to the host's 80 and 443 ports to Traefik (see description of the network flow below).
Let's make sure ExternalDNS uses the correct IPs:
k -n external-dns logs -l app.kubernetes.io/name=external-dns -f
time="2020-01-18T17:26:31Z" level=debug msg="Endpoints generated from ingress: traefik/traefik-dashboard: [traefik.kube.maelvls.dev 0 IN A 220.127.116.11  traefik.kube.maelvls.dev 0 IN A 18.104.22.168 ]"
There still is a slight issue here: we would expect ExternalDNS to set an A record with both IPs (22.214.171.124 and 126.96.36.199). But for some reason, it doesn't. As we have seen previously, the Akrobateo controller has successfully updated the traefik/traefik service (type LoadBalancer). And since Traefik is the ingress controller responsible for setting status.loadBalancer.ingress on ingresses, Traefik is probably the culprit and might simply pick the first element.
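To see which IPs actually made it into DNS, a quick check with dig helps. This is only a sketch: missing_a_records is a name I made up, and it assumes dig is installed and that the records have had time to propagate.

```shell
# Print every expected IP that is NOT among the published A records.
# $1 is the hostname, the remaining arguments are the expected IPs.
missing_a_records() {
  host="$1"
  shift
  published="$(dig +short "$host" A)"
  for ip in "$@"; do
    # -x matches whole lines only, -F treats the IP as a literal string.
    printf '%s\n' "$published" | grep -qxF "$ip" || echo "$ip"
  done
}
```

Calling it with traefik.kube.maelvls.dev followed by the IPs listed in the service status prints nothing when every IP is published, and prints the missing ones otherwise.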
- What about security, isn't hostPort a bad practice? I don't know enough yet and cannot really tell.
- What if I have many more nodes, can it scale? I don't know.
- Reliability? If the nodes that are advertised in the A DNS records go down, no more ingress traffic is possible.
- What if the node's external IP changes? Since these IPs are ephemeral, the controller will pick up this change and update the status.loadBalancer.ingress field of my LoadBalancer service.
Since I created a firewall rule for 80 and 443, GCE forwards the traffic from 188.8.131.52 (the node's external IP) to 10.142.0.62 (the node's internal IP). And since we use hostPort, kubelet creates some iptables rules that redirect traffic to the pod running on that node:
% gcloud compute ssh gke-august-period-234610-worker-micro-cf12d79d-klh6 --command='sudo iptables-save' | egrep "(HOSTPORT|HP)"
-A PREROUTING -m comment --comment "kube hostport portals" -m addrtype --dst-type LOCAL -j KUBE-HOSTPORTS
-A OUTPUT -m comment --comment "kube hostport portals" -m addrtype --dst-type LOCAL -j KUBE-HOSTPORTS
-A PREROUTING -m comment --comment "kube hostport portals" -m addrtype --dst-type LOCAL -j KUBE-HOSTPORTS
# If the packet destination is a local address (any address assigned to this host), continue with KUBE-HOSTPORTS.
-A KUBE-HOSTPORTS -p tcp -m comment --comment "akrobateo-traefik-4jh5k_traefik hostport 443" -m tcp --dport 443 -j KUBE-HP-A7HWACIJNU4N3R5W
# If it is a TCP packet and the destination port is 443, continue with KUBE-HP-A7HWACIJNU4N3R5W.
-A KUBE-HP-A7HWACIJNU4N3R5W -s 10.24.4.2/32 -m comment --comment "akrobateo-traefik-4jh5k_traefik hostport 443" -j KUBE-MARK-MASQ
# If the source IP of the packet is the akrobateo proxy pod, this packet is egressing: continue with KUBE-MARK-MASQ (masquerade).
-A KUBE-HP-A7HWACIJNU4N3R5W -p tcp -m comment --comment "akrobateo-traefik-4jh5k_traefik hostport 443" -m tcp -j DNAT --to-destination 10.24.4.2:443
# If the source wasn't this pod, then we assume it is some ingress. We replace the destination with the pod's IP.
Then, once the packet is inside the pod, the pod's own iptables rules kick in and redirect it to the Service's ClusterIP. Nothing is actually running in the pod, except for the namespaced network stack. These pods have:
- an init container that runs sysctl -w net.ipv4.ip_forward=1 (see k8s-networking-gce) in privileged mode, which enables forwarding inside the pod (does not affect the host).
- two containers that set iptables rules and then pause. To let them alter the pod's namespaced network stack, we give the containers the NET_ADMIN capability (these rules only affect the pod, not the host).
sudo iptables -t nat -I PREROUTING ! -s 10.27.244.111/32 -p TCP --dport 80 -j DNAT --to 10.27.244.111:80
sudo iptables -t nat -I POSTROUTING -d 10.27.244.111/32 -p TCP -j MASQUERADE
sudo iptables -t nat -I PREROUTING ! -s 10.27.244.111/32 -p TCP --dport 443 -j DNAT --to 10.27.244.111:443
sudo iptables -t nat -I POSTROUTING -d 10.27.244.111/32 -p TCP -j MASQUERADE
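The pattern is the same for every port: one DNAT rule towards the ClusterIP and one MASQUERADE rule for the way back. Here is a sketch that prints these rules for any ClusterIP and list of ports. The function name forwarding_rules is mine, and this is not akrobateo's actual code; it only prints the commands (without sudo) and does not run them.

```shell
# Print the iptables commands that forward the given ports to a ClusterIP.
# $1 is the ClusterIP, the remaining arguments are the ports.
forwarding_rules() {
  cluster_ip="$1"
  shift
  for port in "$@"; do
    # Rewrite the destination of incoming packets to the ClusterIP,
    # unless they come from the ClusterIP itself.
    echo "iptables -t nat -I PREROUTING ! -s ${cluster_ip}/32 -p TCP --dport ${port} -j DNAT --to ${cluster_ip}:${port}"
    # Masquerade packets going to the ClusterIP so replies come back
    # through this pod.
    echo "iptables -t nat -I POSTROUTING -d ${cluster_ip}/32 -p TCP -j MASQUERADE"
  done
}
```

Running forwarding_rules 10.27.244.111 80 443 prints the same four rules as above.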
Here is a diagram that shows the whole picture:
- Akrobateo is a controller that watches Services of type LoadBalancer.
- When it finds one, it creates a DaemonSet of empty pods.
- The task of these empty pods is to set iptables rules that redirect traffic to the ClusterIP of the Service.
- Akrobateo also sets the Service's status with the loadBalancer IP.
- Benefit: since the loadBalancer IP is properly set, ExternalDNS works.