Kubernetes has disrupted traditional deployment methods and has become hugely popular. Although it is a great platform to deploy to, it brings complexity and challenges as well. Kubernetes manages nodes and workloads seamlessly, and one of the great features of this containerized deployment platform is self-healing. For self-healing at the container level, we need health checks, called probes in Kubernetes, unless we want to depend on exit codes alone.
Liveness probes check whether a container is healthy; if it is deemed unhealthy, the kubelet restarts it. This action is different from the action taken on a failed readiness probe, which I discussed in my previous post.
Let's look at the components of the probes and dive into how to configure and troubleshoot Liveness Probes.
Probes are health checks that are executed by the kubelet.
All probes have five parameters that are crucial to configure:
- initialDelaySeconds: Time to wait after the container starts before the first probe runs (default: 0)
- periodSeconds: Probe execution frequency (default: 10)
- timeoutSeconds: Time to wait for the reply (default: 1)
- successThreshold: Number of successful probe executions to mark the container healthy (default: 1)
- failureThreshold: Number of failed probe executions to mark the container unhealthy (default: 3)
You need to analyze your application's behavior to set these probe parameters.
There are three types of probes:
An exec probe executes a command inside the container, without a shell. The command's exit status determines the healthy state: zero is healthy; anything else is unhealthy.
```yaml
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  exec:
    command:
      - cat
      - /etc/nginx/nginx.conf
```
A TCP probe checks whether a TCP connection can be opened on the specified port. An open port is deemed a success; a closed port or a connection reset is deemed a failure.
```yaml
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  tcpSocket:
    port: 80
```
An HTTP probe makes an HTTP call, and the status code determines the healthy state: any code from 200 up to, but not including, 400 is deemed a success. Any other status code is deemed unhealthy.
HTTP probes have additional parameters to configure:
- host: IP address to connect to (default: pod IP)
- scheme: HTTP scheme (default: HTTP)
- path: HTTP path to call
- httpHeaders: Any custom headers you want to send.
- port: Connection port.
Tip: If a Host header is required, use httpHeaders.
An example of an HTTP probe:
```yaml
livenessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  httpGet:
    scheme: HTTP
    path: /
    httpHeaders:
      - name: Host
        value: myapplication1.com
    port: 80
```
The kubelet executes liveness probes to decide whether a container needs a restart. For example, let's say we have a microservice written in Go, and this microservice has a bug in some part of the code that causes a freeze at runtime. To avoid being stuck in the bug, we can configure a liveness probe that determines whether the microservice is in a frozen state. This way, the microservice container is restarted and returns to a pristine condition.
If your application exits gracefully when it encounters such an issue, you won't necessarily need to configure liveness probes; the container will be restarted as per the configured (or default) restart policy. Still, there can be bugs you don't know about.
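As a sketch, the restart policy is set at the pod level of the deployment spec, alongside the containers:

```yaml
spec:
  restartPolicy: Always   # the default; Deployments only allow Always,
                          # while bare pods may also use OnFailure or Never
  containers:
    - name: nginx
      image: nginx
```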
Probes determine health only from the probe answers; they are not aware of the internal dynamics of our microservice/application. If, for any reason, probe replies are delayed for longer than periodSeconds × failureThreshold, the microservice/application will be determined unhealthy, and a restart of the container will be triggered. Hence it is important to tune the parameters to the application's behavior.
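As a quick sketch of that arithmetic, annotated on the probe fields (the values are illustrative, not a recommendation):

```yaml
livenessProbe:
  periodSeconds: 5       # a probe runs every 5 seconds
  timeoutSeconds: 1      # each probe waits at most 1 second for a reply
  failureThreshold: 3    # three consecutive failures mark the container unhealthy
# Worst-case detection time is roughly periodSeconds x failureThreshold = 15s;
# with failureThreshold: 1 it shrinks to 5s, at the cost of restarting on a single slow reply.
```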
Similar to readiness probes, liveness probes can also create a cascading failure if you misconfigure them. If the health endpoint has external dependencies or any other condition that can prevent an answer from being delivered, it can create a cascading failure; therefore, it is of paramount importance to configure the probe with this behavior in mind.
Let's assume that our application needs to read a large amount of data into its cache once in a while; unresponsiveness during this time might cause a false positive, because the probe can fail even though the application is fundamentally healthy. In this case, the failing liveness probe will restart the container, and it will most probably enter a continuous cycle of restarts. In such a scenario, a readiness probe is more suitable: the pod will only be removed from service while it executes the maintenance task, and once it is ready to take traffic again, it can start responding to the probes.
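A readiness probe for that scenario might look like the following sketch; the /ready path is a hypothetical endpoint, not part of this post's sample app:

```yaml
readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3    # tolerate a few slow replies while the cache loads
  httpGet:
    path: /ready         # hypothetical endpoint reporting readiness to serve traffic
    port: 80
```

While this probe fails, the pod is only removed from the service endpoints; the container is not restarted.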
The liveness endpoints on our microservice - the ones the probes will hit - should check the absolute minimum requirements that show the application is running. This way, liveness checks succeed, the pod is not restarted, and we ensure that service traffic flows as it should.
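For example, a probe against a hypothetical /healthz endpoint that only confirms the process can answer HTTP, without touching databases or other external dependencies:

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # hypothetical minimal endpoint; it should avoid external dependencies
    port: 80
  periodSeconds: 10
  failureThreshold: 3
```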
We will deploy Nginx as a sample app. Below are the deployment and service configurations.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-probes
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          livenessProbe:
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 1
            httpGet:
              scheme: HTTP
              path: /
              httpHeaders:
                - name: Host
                  value: myapplication1.com
              port: 80
```
Write this configuration to a file called k8s-probes-deployment.yaml and apply it with this command:
```shell
kubectl apply -f k8s-probes-deployment.yaml
```
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  ports:
    - name: nginx-http-port
      port: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: NodePort
```
Also, write this configuration to a file called k8s-probes-svc.yaml and apply it with this command:
```shell
kubectl apply -f k8s-probes-svc.yaml
```
There is no liveness-specific endpoint to query; we should use the following command to see the events and the current status:
```shell
kubectl describe pods <POD_NAME>
```
```shell
kubectl get pods
```
Here we can see our pod is in a running state, and it is ready to receive traffic.
```
NAME                         READY   STATUS    RESTARTS   AGE
k8s-probes-7d979f58c-vd2rv   1/1     Running   0          6s
```
Let's check the applied configuration.
```shell
kubectl describe pods k8s-probes-7d979f58c-vd2rv | grep Liveness
```
Here we can see the parameters we have configured.
```
Liveness:   http-get http://:80/ delay=5s timeout=1s period=5s #success=1 #failure=1
```
Let's look at the events:
```
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  45s   default-scheduler  Successfully assigned default/k8s-probes-7d979f58c-vd2rv to k8s-probes
  Normal  Pulling    44s   kubelet            Pulling image "nginx"
  Normal  Pulled     43s   kubelet            Successfully pulled image "nginx" in 1.117208685s
  Normal  Created    43s   kubelet            Created container nginx
  Normal  Started    43s   kubelet            Started container nginx
```
As you can see, there is no indication of failure or success; successful probe executions are not recorded as events.
Now let's change livenessProbe.httpGet.path to "/do-not-exists" and take a look at the pod status.
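The modified section of the deployment reads as follows; only the relevant probe fields are shown:

```yaml
livenessProbe:
  httpGet:
    path: /do-not-exists   # nonexistent path; nginx will answer with a 404
    port: 80
```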
```shell
kubectl get pods
```
After changing the path, liveness probes will fail, and the container will be restarted.
```
NAME                          READY   STATUS    RESTARTS   AGE
k8s-probes-595bcfdf57-428jt   1/1     Running   4          74s
```
We can see that container has been restarted four times.
Let's look at the events.
```
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ---                ----               -------
  Normal   Scheduled  53s                default-scheduler  Successfully assigned default/k8s-probes-595bcfdf57-428jt to k8s-probes
  Normal   Pulled     50s                kubelet            Successfully pulled image "nginx" in 1.078926208s
  Normal   Pulled     42s                kubelet            Successfully pulled image "nginx" in 978.826238ms
  Normal   Pulled     32s                kubelet            Successfully pulled image "nginx" in 971.627126ms
  Normal   Pulling    23s (x4 over 51s)  kubelet            Pulling image "nginx"
  Normal   Pulled     22s                kubelet            Successfully pulled image "nginx" in 985.155098ms
  Normal   Created    22s (x4 over 50s)  kubelet            Created container nginx
  Normal   Started    22s (x4 over 50s)  kubelet            Started container nginx
  Warning  Unhealthy  13s (x4 over 43s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    13s (x4 over 43s)  kubelet            Container nginx failed liveness probe, will be restarted
  Warning  BackOff    13s                kubelet            Back-off restarting failed container
```
As you can see above, "Liveness probe failed: HTTP probe failed with statuscode: 404" indicates that the probe failed with HTTP code 404; the status code also aids in troubleshooting. Just after that, the kubelet informs us that it will restart the container.
Kubernetes liveness probes are lifesavers when our application is in an undetermined state; they return the application to a pristine condition by restarting the container. However, they must be configured correctly. Of course, there is no single correct configuration; it all depends on your application and how you want Kubernetes to act in each particular failure scenario. Set the values accordingly, and test them against real-life scenarios.
- Kubernetes Start Up Probes - Examples & Common Pitfalls
- Kubernetes Readiness Probes - Examples & Common Pitfalls
- Kubernetes Core Probe Documentation
- Configure Liveness, Readiness and Startup Probes
- Kubernetes Container probes Documentation
- Container Lifecycle Hooks Documentation