Introduction
Managing Pod distribution across a cluster is hard. The Pod affinity and anti-affinity features of Kubernetes allow some control over Pod placement, but they only cover part of the Pod distribution use cases.
There is a common need to distribute Pods evenly across the cluster for high availability and efficient cluster resource utilization.
The PodTopologySpread scheduling plugin was designed to fill that gap, and it has been stable since Kubernetes v1.19.
In this article, I’ll show you an example of using the topology spread constraints feature of Kubernetes to distribute a Pod workload across the cluster nodes in an absolutely even manner.
Part 1. Spin Up a Multi-node Kubernetes Cluster
If you already have a Kubernetes cluster with three or more worker nodes, you can skip this cluster setup part.
I’ll be using an awesome tool called kind to spin up a local Kubernetes cluster using Docker containers as “nodes”.
By default, when creating a multi-node cluster via kind, it doesn’t assign a unique hostname to each worker node (very unkind 😄).
First, create a directory called hostnames containing a file for each worker with a unique hostname.
$ mkdir hostnames
$ echo 'worker-1' > hostnames/worker-1
$ echo 'worker-2' > hostnames/worker-2
$ echo 'worker-3' > hostnames/worker-3
Now, save the kind cluster config shown below, which creates a K8s cluster consisting of 1 control plane (master) node and 3 workers. The config also defines a mount per worker to set the unique hostname.
$ cat > unkind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: hostnames/worker-1
    containerPath: /etc/hostname
- role: worker
  extraMounts:
  - hostPath: hostnames/worker-2
    containerPath: /etc/hostname
- role: worker
  extraMounts:
  - hostPath: hostnames/worker-3
    containerPath: /etc/hostname
EOF
Finally, spin up the Kubernetes cluster as such:
$ kind create cluster --config unkind-config.yaml
The output should be similar to the one shown below:
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind! 😊
Now, verify the cluster is up and running:
$ kubectl get nodes
The output should be similar to the one shown below:
NAME                 STATUS   ROLES                  AGE     VERSION
kind-control-plane   Ready    control-plane,master   3m29s   v1.21.1
worker-1             Ready    <none>                 2m58s   v1.21.1
worker-2             Ready    <none>                 2m58s   v1.21.1
worker-3             Ready    <none>                 2m58s   v1.21.1
We’re now ready to play around with the cluster!
Part 2. Distribute Pods Evenly Across The Cluster
The topology spread constraints rely on node labels to identify the topology domain(s) that each worker Node is in.
In order to distribute Pods across all cluster worker nodes in an absolutely even manner, we can use the well-known node label kubernetes.io/hostname as the topology domain, which ensures each worker node is in its own topology domain.
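If you want to double-check the value of this label on each node, kubectl’s -L flag (which adds a column for the given label key) should do the trick:
$ kubectl get nodes -L kubernetes.io/hostname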
In the manifest below, we define a Deployment with 3 replicas that assigns the label type=dummy to each Pod, and a topologySpreadConstraints entry that acts on Pods carrying that label.
The spec.topologySpreadConstraints field is defined as:
- maxSkew: 1 distributes Pods in an absolutely even manner
- topologyKey: kubernetes.io/hostname uses the hostname as the topology domain
- whenUnsatisfiable: ScheduleAnyway always schedules Pods, even when an even distribution can’t be satisfied
- labelSelector only acts on Pods that match this selector
Finally, the Pods run a container image called pause that does absolutely nothing! 😃
apiVersion: v1
kind: Namespace
metadata:
  name: dummy
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
  namespace: dummy
spec:
  replicas: 3
  selector:
    matchLabels:
      type: dummy
  template:
    metadata:
      labels:
        type: dummy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            type: dummy
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1
Now, save the manifest as dummy-deployment.yaml and apply it:
$ kubectl apply -f dummy-deployment.yaml
namespace/dummy created
deployment.apps/dummy created
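If you want to wait until all replicas are up before checking their placement, the standard rollout status command can be used:
$ kubectl -n dummy rollout status deploy/dummy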
And verify that the Pods’ placement is balanced across all worker nodes:
$ kubectl -n dummy get pods -o wide --sort-by=.spec.nodeName
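The output should look roughly like the illustrative listing below (Pod name suffixes, IPs, and ages will differ on your cluster):
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
dummy-6bbf65b8bb-4qv9k   1/1     Running   0          35s   10.244.1.2   worker-1   <none>           <none>
dummy-6bbf65b8bb-j8vxz   1/1     Running   0          35s   10.244.2.2   worker-2   <none>           <none>
dummy-6bbf65b8bb-wq2ls   1/1     Running   0          35s   10.244.3.2   worker-3   <none>           <none>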
As we can see from the output above, the Pods are scheduled evenly on worker-1, worker-2, and worker-3 respectively.
We can further upscale the deployment to 30 replicas and validate the distribution of Pods as we scale.
$ kubectl -n dummy scale deploy/dummy --replicas 30
After the upscale, the Pods are still evenly distributed across all cluster nodes, which you can verify with a quick per-node count as shown below. #awesomeness
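One way to check the spread without reading 30 lines of output is to count Pods per node. Assuming the NODE column is the seventh field of kubectl get pods -o wide, something like this works:
$ kubectl -n dummy get pods -o wide --no-headers | awk '{print $7}' | sort | uniq -c
With 30 replicas spread over 3 worker nodes, you should see 10 Pods on each node.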
Conclusion
The PodTopologySpread scheduling plugin gives Kubernetes administrators the power to achieve high availability of applications as well as efficient utilization of cluster resources.
Known Limitations:
- Scaling down a Deployment does not guarantee an even spread and may result in an imbalanced Pod distribution. You can use the Descheduler to rebalance the Pod distribution, as sketched below.
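As a rough sketch, a Descheduler policy that enables its RemovePodsViolatingTopologySpreadConstraint strategy could look like this (v1alpha1 policy format; check the schema of the Descheduler release you actually run):
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
    enabled: true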