Oshi Gupta

Posted on Jul 3, 2023

Pod Disruption Budget in Kubernetes

#kubernetes #cloudnative #pods

Keeping applications highly available in a Kubernetes cluster can be hard, especially when nodes undergo maintenance or fail. Application pods are terminated or rescheduled onto other nodes, which can cause downtime or data loss.

To keep applications available 24/7 even during system disruptions, Kubernetes introduced a feature known as Pod Disruption Budget. But before that, let’s dive into the different disruptions a system can face.

What are Disruptions?

In general, a disruption interrupts a process. In Kubernetes, a pod disruption is the termination of a pod running on a cluster node, whether because the node fails, the node is upgraded, or someone deletes the pod.

There are two types of disruptions:

Involuntary Disruptions
Voluntary Disruptions

Involuntary Disruptions

These disruptions are unavoidable and occur mainly due to hardware or software errors. Examples include:

Hardware failure of the node.
The cluster admin deletes the node accidentally.
A kernel-related problem.
A cloud provider or hypervisor failure makes the VM disappear.
The node disappears from the cluster due to a network partition.
Pods are evicted because the node runs out of resources.

Voluntary Disruptions

These disruptions are initiated by the application owner or the cluster administrator.

An application owner can do the following:

Delete the deployment/controller managing the pods.
Update the deployment’s pod template, which causes a restart.
Accidentally delete a pod.

A cluster administrator can do the following:

Drain a node for an upgrade or maintenance.
Drain a node to scale down the cluster.
Remove a pod from a node to schedule some other pod on it.

These are some of the disruptions that take applications down and cause downtime for users. Let’s see how to deal with them.

How to Deal with Disruptions?

To deal with involuntary disruptions, you can do the following:

Make sure your pod requests the resources it needs, and no more.
Run enough replicas of your application to make it highly available (HA).
Even with HA, use anti-affinity to spread pods across cluster nodes, or across zones if using a multi-zone cluster.
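As a rough sketch of the anti-affinity suggestion, the snippet below writes a patch that asks the scheduler to prefer placing pods with the same label on different nodes. The deployment name my-app, the label app: my-app, and the file name anti-affinity-patch.yaml are hypothetical placeholders, not names from this article.

```shell
# Write a patch that adds a soft (preferred) pod anti-affinity rule.
# The label app: my-app is a placeholder for your own pod label.
cat > anti-affinity-patch.yaml <<'EOF'
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
EOF

# Apply the patch to a hypothetical deployment named my-app.
kubectl patch deployment my-app --patch-file anti-affinity-patch.yaml
```

Using topology.kubernetes.io/zone as the topologyKey instead would spread the pods across zones rather than nodes.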

Voluntary disruptions are mainly caused by cluster admin actions such as draining a node for maintenance or scaling down a cluster. To deal with them, you can use a Pod Disruption Budget (PDB) to keep applications available.

Pod Disruption Budget

A Pod Disruption Budget (PDB) is an object created by the application owner that defines the minimum number of application replicas that must keep running during voluntary disruptions (node upgrade/maintenance), so the application stays highly available.

Let’s understand this with an example. Say you are running a deployment with 6 replicas and have created a PDB that requires 4 replicas to be running at all times during voluntary disruptions. The eviction API will then allow at most two pods to be disrupted at a time.
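The arithmetic behind this example can be sketched in a few lines of shell. The function name allowed_disruptions is hypothetical and only mirrors the reasoning above (healthy pods minus minAvailable, never below zero). It is an illustration, not the controller's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many pods may be evicted right now,
# given the number of healthy pods and the PDB's minAvailable.
allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  # A budget cannot be negative, so clamp it at zero.
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

# 6 healthy replicas with minAvailable 4, as in the example above.
allowed_disruptions 6 4
```

For the example above this prints 2, matching the two pods that may be disrupted at a time.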

Features of Pod Disruption Budget

PDB has the following features:

The application owner creates the PDBs.
It helps the operations team manage the cluster while the application stays available.
It provides an interface between the cluster admin and the application owner so they can work smoothly.
The eviction API respects it.
It defines the availability requirements.
It works with Deployment, ReplicaSet, ReplicationController, and StatefulSet objects.

Pod Disruption Budget Fields

There are three main fields in a PDB:

.spec.selector denotes the set of pods to which the PDB applies. It is the same as the application controller’s label selector.
.spec.minAvailable denotes the number of pods that must still be available after an eviction. It can be an absolute number or a percentage.
.spec.maxUnavailable denotes the number of pods that can be unavailable after an eviction. It can be an absolute number or a percentage.

You can specify only one of minAvailable and maxUnavailable in a single pod disruption budget, not both.
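When either field is given as a percentage, the result may not be a whole number of pods, and Kubernetes rounds it up to the nearest integer. The sketch below illustrates that rounding; the helper name pods_from_percent is hypothetical and exists only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a percentage of the replica count
# into a pod count, rounding up to the nearest integer.
pods_from_percent() {
  local replicas=$1 percent=$2
  # Integer ceiling of replicas * percent / 100.
  echo $(( (replicas * percent + 99) / 100 ))
}

pods_from_percent 8 60   # 60% of 8 replicas
pods_from_percent 7 50   # 50% of 7 replicas
```

These calls print 5 and 4: 60% of 8 is 4.8, which rounds up to 5, and 50% of 7 is 3.5, which rounds up to 4.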

Let’s look at PDB in more detail by creating a deployment and a PDB on a multi-node cluster and draining one node.

Example of Pod Disruption Budget

You can deploy a local multi-node cluster with Kind or use a managed Kubernetes service. I have used an EKS cluster for this demo.

kubectl get nodes

PDB is a namespaced resource that belongs to the policy API group, version v1.

kubectl api-resources | grep pdb

Below is the nginx-deployment YAML configuration with 8 replicas and app:nginx as the label selector.

# nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  labels:
    app: nginx
spec:
  replicas: 8
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80

kubectl apply -f nginx-deploy.yaml
kubectl get deploy

Verify that the pods are scheduled on both nodes.

Use Case: 1

Create a PDB for the above deployment with the minAvailable field set to 5 and app:nginx as the label selector.

# pdb1.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-minavail
spec:
  minAvailable: 5
  selector:
    matchLabels:
      app: nginx

kubectl apply -f pdb1.yaml
kubectl get pdb
kubectl describe pdb pdb-minavail

We have set minAvailable=5, which means that out of 8 replicas, at least 5 must keep running during any voluntary disruption.
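To see how the budget translates into numbers, one option is to read the PDB's status fields directly. This is a minimal sketch using the pdb-minavail object created above; with all 8 replicas healthy, it should report 3 allowed disruptions.

```shell
# Print currentHealthy, desiredHealthy and disruptionsAllowed
# from the status of the PDB created above.
kubectl get pdb pdb-minavail \
  -o jsonpath='{.status.currentHealthy} {.status.desiredHealthy} {.status.disruptionsAllowed}{"\n"}'
```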

Now, let’s drain the node and see the PDB at work.

kubectl drain <node_name> --ignore-daemonsets --delete-emptydir-data

kubectl tries to evict each pod and retries until all of them are evicted. The node is now marked SchedulingDisabled, and all the pods running on it have been drained.

Let’s check the pod status and verify whether they have been rescheduled.

kubectl get pods -o wide

Here, 4 pods were running on each node. When I drained the node, its pods were evicted and started rescheduling. One of them was rescheduled on the other node; the other three went into the Pending state because that node had insufficient resources left.

But the node drain completed successfully because the PDB requirement of 5 running application pods was met. The eviction API respected the PDB during this voluntary disruption.

Now uncordon the node to make the rest of the pods schedulable.
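A minimal sketch of that step follows. NODE_NAME is a placeholder variable introduced here for illustration; set it to the node you drained.

```shell
# Placeholder: replace with the name of the node you drained earlier.
NODE_NAME="<node_name>"

# Mark the node schedulable again.
kubectl uncordon "$NODE_NAME"

# The pending pods should now get scheduled; check where they landed.
kubectl get pods -o wide
```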

Use Case: 2

Now, increase the minAvailable field of the existing PDB to 6 and drain the node to see what happens.

# pdb1.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-minavail
spec:
  minAvailable: 6
  selector:
    matchLabels:
      app: nginx

kubectl apply -f pdb1.yaml
kubectl get pdb

Once again, drain the cluster node.

kubectl drain <node_name> --ignore-daemonsets --delete-emptydir-data

You will observe that the drain does not complete. The evictions are retried, and each attempt that would break the budget fails with the error: cannot evict pod as it would violate the pod’s disruption budget.

But why has this happened? Because the PDB requires at least 6 available pods, and the other node can only fit 5 pods given its resource capacity. As mentioned, the eviction API gives the PDB priority, so to keep a minimum of 6 pods available it stops evicting and leaves the remaining pods running on the node being drained.

Although the node is marked SchedulingDisabled, it is not fully drained.
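The difference between use cases 1 and 2 can be reproduced with a deliberately simplified model. It assumes pods are evicted one at a time and that an evicted pod becomes healthy again immediately if the other node has spare capacity. The function name simulate_drain and this model are illustrative assumptions, not how the eviction API is implemented.

```shell
#!/usr/bin/env bash
# Simplified model of a drain under a PDB.
# Arguments: pods on the draining node, pods on the other node,
# capacity of the other node, and the PDB's minAvailable.
# Prints how many pods are left on the draining node.
simulate_drain() {
  local draining=$1 other=$2 other_capacity=$3 min_available=$4
  local healthy
  while (( draining > 0 )); do
    healthy=$(( draining + other ))
    # The eviction is refused if it would drop below minAvailable.
    if (( healthy - 1 < min_available )); then
      break
    fi
    draining=$(( draining - 1 ))
    # The evicted pod is rescheduled only if the other node has room.
    if (( other < other_capacity )); then
      other=$(( other + 1 ))
    fi
  done
  echo "$draining"
}

simulate_drain 4 4 5 5   # use case 1: minAvailable=5
simulate_drain 4 4 5 6   # use case 2: minAvailable=6
```

Under these assumptions the first call prints 0 (the node drains fully) and the second prints 1 (one pod cannot be evicted, so the drain never finishes).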

Delete the pdb and uncordon the node.

kubectl delete pdb pdb-minavail

Use Case: 3

Now, create another pdb resource with maxUnavailable set to 3.

# pdb2.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-maxunavail
spec:
  maxUnavailable: 3
  selector:
    matchLabels:
      app: nginx

kubectl apply -f pdb2.yaml
kubectl describe pdb pdb-maxunavail

Follow the same steps of draining a node to check that the PDB works.

kubectl drain <node_name> --ignore-daemonsets --delete-emptydir-data

This time the drain completes because the PDB requirement is satisfied.

The pods that could not be rescheduled remain pending. Uncordon the node now.

Use Case: 4

Reduce the maxUnavailable field to 2, which requires 6 pods to be running at all times. If you drain the node now, the scenario from use case 2 repeats: the drain stays incomplete, and not all pods are evicted because the PDB takes priority.
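One way to make this change is to re-apply the pdb-maxunavail manifest with the new value. The sketch below pipes the manifest in through a heredoc instead of editing pdb2.yaml; that is only a convenience chosen here, and editing the file and re-applying it works just as well.

```shell
# Re-apply the existing PDB with maxUnavailable lowered from 3 to 2.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-maxunavail
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: nginx
EOF

# Confirm the new budget before draining the node again.
kubectl describe pdb pdb-maxunavail
```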

Use Case: 5

Now, what if I set maxUnavailable to 0? This is equivalent to setting minAvailable to 100%. None of your pods can then be evicted during voluntary disruptions such as a node drain.

When not to use PDB?

There are certain cases where a PDB cannot help, such as:

It doesn’t work for involuntary disruptions.
Among voluntary disruptions, it does not apply when pods or deployments are deleted directly.
Two PDBs cannot work together on the same set of pods.
PDBs don’t work for a single pod or a single-replica deployment.

Summary

PDBs are useful when we want applications to stay available, even during cluster maintenance and upgrades.
