The second part of this series explained how Pods work while building a Pod with two containers communicating with each other using a FIFO and a shared volume.
In this post we'll learn about self-healing systems and what we can achieve by delegating Pod management to Kubernetes workload resources, so they can manage Pods on our behalf.
Let's say we have a single healthy node with multiple Pods running on it. What if the node faces a critical hardware failure that makes it unhealthy? Remember: a Kubernetes node is typically represented by a virtual machine.
Since Pods have a lifecycle, the Pods on an unhealthy node will begin to fail.
A new node is required, but provisioning hardware is a costly operation and takes time.
Meanwhile, the Pods remain failed on the unhealthy node and the application suffers downtime.
Once the new node has joined the cluster and is ready to accept new Pods, we can start all the Pods manually using kubectl, for instance:
$ kubectl apply -f ./pod.yml
Managing Pods directly is not efficient: it's a cumbersome task, and our application would face repeated downtime.
We should build a system capable of detecting failures and restarting components or applications automatically, with no human intervention.
We need a self-healing system.
Building a self-healing system is crucial for businesses. Anytime our infrastructure suffers a disruption, such as a networking or hardware failure, the system should be capable of "healing itself".
Automation is key, and a potential solution for self-healing comes from robotics.
In robotics, we usually create a controller that receives a desired state and, using some sort of control loop, continuously checks whether the current state matches the desired state, driving the current state as close to the desired one as possible.
A thermostat works exactly like this controller pattern: it continuously checks whether the current temperature matches the desired one and adjusts toward it. Once it gets a match, the controller turns off the equipment, and the process repeats over and over again.
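The thermostat analogy can be sketched in a few lines of Python. This is a minimal, illustrative control loop, not a real thermostat or Kubernetes API; all names here are made up for the example:

```python
# A minimal sketch of the controller pattern: a thermostat-style
# control loop that nudges the current temperature toward the
# desired one, one step per iteration.

def reconcile(current: float, desired: float, step: float = 1.0) -> float:
    """Move the current state one step closer to the desired state."""
    if abs(desired - current) <= step:
        return desired  # close enough: snap to the desired state
    return current + step if desired > current else current - step

def control_loop(current: float, desired: float) -> float:
    # A real controller loops forever, re-checking the state on every
    # iteration; here we stop once the desired state is reached.
    while current != desired:
        current = reconcile(current, desired)
    return current
```

The important idea is that the loop never acts on a plan; it only compares observed state against desired state and takes one corrective step at a time.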
Luckily, Kubernetes implements the controller pattern, so we don't need to manage Pods directly.
We are talking about Kubernetes Controllers.
Kubernetes controllers are control loops that watch the cluster state and take actions to bring it as close to the desired state as possible.
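To make this concrete before we look at a real workload resource, here is an illustrative sketch (not the actual Kubernetes code) of how a replica-managing controller could reconcile the observed Pods against a desired count; the `nginx-` naming simply mimics the Pod names we'll see later:

```python
# Illustrative reconciliation step for a replica-managing controller:
# start replacements when Pods are missing, remove extras when there
# are too many. Pod names are generated with an incrementing suffix.
import itertools

_ids = itertools.count(1)

def reconcile_replicas(running_pods: list[str], desired: int) -> list[str]:
    pods = list(running_pods)
    while len(pods) < desired:
        pods.append(f"nginx-{next(_ids)}")  # too few: start a new Pod
    while len(pods) > desired:
        pods.pop()                          # too many: delete an extra Pod
    return pods
```

Note that the controller doesn't care *which* Pods failed or why; it only compares the observed count against the desired count and corrects the difference.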
But how do we make use of controllers? Kubernetes provides several Workload Resources so we can rely on them to manage Pods on our behalf.
Time to explore one of the main workload resources that provides self-healing capabilities: the ReplicaSet.
Using a ReplicaSet, we can declare a desired number of identical Pods, and its controller keeps that many running:
### The kind of the Kubernetes object
kind: ReplicaSet
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  ### The number of replicas of the nginx Pod
  ### The controller will manage the Pods on our behalf
  ### Anytime a Pod goes down, the controller will start a new one
  ### to guarantee that 2 nginx Pods are running
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
After applying the YAML file, we should see the ReplicaSet object as follows:
$ kubectl get replicasets
NAME    DESIRED   CURRENT   READY   AGE
nginx   2         2         2       13m
Also, checking the Pods:
$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-r5kmn   1/1     Running   0          15m
nginx-k87fz   1/1     Running   0          15m
Note that each Pod name gets a random suffix that identifies it in the cluster.
Moreover, we can depict the ReplicaSet in a picture:
In the picture above, it's important to note that the Pods may be scheduled on different Nodes. That's exactly the resilience and self-healing capability we want!
Whenever one Node becomes unhealthy, we still have a healthy Node running a replica, so our application is less likely to suffer downtime.
In case we delete a Pod that was created by a ReplicaSet, the controller will start a new one automatically:
$ kubectl delete pod nginx-r5kmn
pod "nginx-r5kmn" deleted
Checking Pods again:
$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-k87fz   1/1     Running   0          29m
### The new Pod
nginx-mr2rd   1/1     Running   0          28s
But in case we want to delete all Pods of a ReplicaSet, we should delete the ReplicaSet object itself:
$ kubectl delete replicaset nginx
replicaset.apps "nginx" deleted
And the Pods are finally gone:
$ kubectl get pods
No resources found in default namespace.
In this post we've seen how network or hardware failures can impact our application, hence the importance of a self-healing system.
On top of that, we learned how Kubernetes controllers solve the self-healing problem, and we explored one of the most important workload resources in Kubernetes: the ReplicaSet.
The upcoming posts will keep focusing on workload resources, more precisely on how we can perform rollout deployments and define stateful Pods, single-node Pods, and Pods that run a single task and then stop (Jobs).