Kubernetes Health Checks

#kubernetes #devops

Kubernetes is an open-source container orchestration platform that helps manage and deploy containerized applications. One of the critical features of Kubernetes is its ability to perform health checks on containers. Health checks allow Kubernetes to determine whether a container is running correctly and to take corrective action if necessary.

In this article, we will discuss the importance of health checks in Kubernetes and how they work.

What are Health Checks in Kubernetes?

Health checks are a crucial feature of Kubernetes that help ensure the reliability of containerized applications. They enable Kubernetes to periodically check the health of containers and determine whether they are running correctly. A container may be running, but if it is not functioning as expected, it can cause issues for the application.

Kubernetes provides three types of health checks: startup probe, liveness probe and readiness probe.

Startup Probe

Why do we need a startup probe?

When a container starts up, it may take some time for the application inside it to become fully operational. During this time, the container may be unresponsive or return errors to incoming requests. This is a problem for slow-starting applications: a liveness probe that begins checking too early can fail repeatedly and cause the kubelet to restart the container before the application has finished starting, so it may never come up at all.

Startup probes let your containers inform Kubernetes when they’ve started up and are ready to be assessed for liveness and readiness.

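The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don’t interfere with the application startup. If the startup probe fails failureThreshold times in a row, the kubelet kills the container, and it is then handled according to the pod’s restartPolicy.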

CONFIGURING A STARTUP PROBE:

The startup probe supports the four basic Kubernetes probing mechanisms:

Exec: Executes a command within the container. The probe succeeds if the command exits with a 0 status code.
HTTP: Makes an HTTP GET request to a path and port on the container. The probe succeeds if the response status code is in the 200–399 range.
TCP: The probe succeeds if a TCP connection can be opened to a specific container port.
gRPC: Makes a gRPC health checking request to a port inside a container and uses its result to determine whether the probe succeeded.
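As a rough illustration, each mechanism maps to one handler field in the probe spec (`exec`, `httpGet`, `tcpSocket`, `grpc`), and a single probe uses exactly one of them. The fragments below are separate examples, one per mechanism; the command, file path, endpoint path, and ports are placeholders chosen for this sketch, not values your application necessarily exposes.

```yaml
# Exec: runs the command inside the container; exit status 0 means success.
# /tmp/started is a hypothetical file the application would create once it is up.
startupProbe:
  exec:
    command: ["cat", "/tmp/started"]
---
# HTTP: sends a GET to the path and port; a 200-399 status means success.
startupProbe:
  httpGet:
    path: /healthz      # placeholder endpoint
    port: 8080          # placeholder port
---
# TCP: succeeds if the kubelet can open a TCP connection to the port.
startupProbe:
  tcpSocket:
    port: 8080
---
# gRPC: calls the gRPC Health Checking Protocol on the port;
# the application must implement that health service for this to work.
startupProbe:
  grpc:
    port: 9090          # placeholder port
```

The gRPC handler is only available on newer Kubernetes releases, so check that your cluster version supports it before relying on it.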

All these mechanisms share some basic parameters that control the probe’s success criteria:

initialDelaySeconds: Set a delay between the time the container starts and the first time the probe is executed. Defaults to zero seconds.
periodSeconds: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
timeoutSeconds: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
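failureThreshold: The number of consecutive probe failures after which Kubernetes treats the check as failed. For a startup or liveness probe, the container is then restarted; for a readiness probe, the pod is marked not ready. Defaults to three.
For a startup probe, failureThreshold × periodSeconds sets the maximum time the application has to finish starting, so choose these values to cover your worst-case startup time.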
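A common pattern, also used in the Kubernetes documentation, is to pair a generous startup probe with a strict liveness probe on the same endpoint. In this sketch the `/healthz` path and port 8080 are assumptions. With failureThreshold 30 and periodSeconds 10, the application gets up to roughly 300 seconds to start; only after the startup probe succeeds does the liveness probe take over.

```yaml
startupProbe:
  httpGet:
    path: /healthz        # assumed health endpoint
    port: 8080            # assumed container port
  failureThreshold: 30    # tolerate up to 30 consecutive failures...
  periodSeconds: 10       # ...checked every 10 s: about 30 x 10 = 300 s to start
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 1     # once started, a single failure triggers a restart
  periodSeconds: 10       # runs only after the startup probe has succeeded
```

Without the startup probe, the same liveness settings would restart a slow-starting container after its first failed check.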

Readiness Probe

Why do we need a readiness probe?

A readiness probe is used to determine whether a container is ready to accept traffic. The kubelet runs the probe against the container, and if it succeeds, the container is considered ready. If the probe fails (no response, a timeout, or an error status), Kubernetes marks the pod as not ready and removes it from the endpoints of any matching Service, so it stops receiving traffic. Unlike a liveness failure, a readiness failure does not restart the container.

Readiness probes help prevent failed requests by ensuring that the application receives traffic only when it is ready to serve it. While the probe is failing, the pod is taken out of rotation, so users are not routed to an instance that cannot handle their requests yet.

Readiness probes are configured with the same mechanisms and parameters as startup probes. For example, the following YAML defines a readiness probe for a container:

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
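To show where that fragment lives, the sketch below places it in a minimal Pod and pairs it with a Service. The names (`web`, the `app: web` label), the image, and the ports are hypothetical. While the readiness probe fails, the Pod keeps running but is not counted among the Service's ready endpoints, so the Service does not route requests to it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                      # hypothetical Pod name
  labels:
    app: web                     # label the Service selects on
spec:
  containers:
    - name: web
      image: example.com/web:1.0 # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                     # only Ready Pods with this label receive traffic
  ports:
    - port: 80                   # port clients connect to
      targetPort: 8080           # container port traffic is forwarded to
```

While the probe is failing, `kubectl get pods` shows `0/1` in the READY column for this Pod.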

Liveness Probe

Why do we need a liveness probe?

A liveness probe is used to determine whether a container is still alive. The kubelet runs the probe against the container, and if it succeeds, the container is considered alive. If the probe fails failureThreshold times in a row, Kubernetes considers the container unhealthy, kills it, and restarts it according to the pod’s restartPolicy.

Liveness probes are useful for detecting hangs, deadlocks, and other issues that leave the container's process running but unable to respond. (A process that exits outright is already restarted by Kubernetes without any probe.) By restarting the container automatically, Kubernetes helps keep the application available and responsive.
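A liveness probe uses the same fields as the other probes. In this sketch the `/livez` path and port 8080 are assumptions, and the default timeoutSeconds and failureThreshold are written out explicitly: a container that stops answering is restarted after three consecutive failed checks, taken 10 seconds apart.

```yaml
livenessProbe:
  httpGet:
    path: /livez            # hypothetical endpoint that checks only the process itself
    port: 8080
  initialDelaySeconds: 10   # head start before the first check (a startupProbe is an alternative)
  periodSeconds: 10         # check every 10 seconds
  timeoutSeconds: 1         # default: no response within 1 s counts as a failure
  failureThreshold: 3       # default: restart after 3 consecutive failures
```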

Now, the biggest question: what should the readiness and liveness probes actually check?

The readiness probe checks if the application is ready to serve traffic, while the liveness probe checks if the application is still running and should be restarted if it’s not. Here are some guidelines on what these probes should check for:

Readiness Probe:

Verify that the application has successfully started up and initialized all necessary components (e.g., databases, caches).
Check that any required dependencies or connections (e.g., databases, APIs) are available and responsive.
Confirm that any configuration files or environment variables have been loaded correctly.
Validate that the application is able to handle incoming requests and respond appropriately.

Liveness Probe:

Check that the application process is running throughout the lifecycle of the pod.
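Test that the application can still perform its core internal work (e.g., its request-handling loop responds) within a certain time limit. Avoid making the liveness probe depend on external services such as a database: restarting the container will not fix an outage elsewhere and can trigger restarts across every replica.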
Check for any stuck or deadlocked threads in the application.
Test that the application is not consuming too much memory or CPU resources, which could cause it to crash.
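Putting these guidelines together, one common arrangement is to expose separate endpoints: a readiness endpoint that also checks dependencies and configuration, and a liveness endpoint that only checks the process itself. The endpoint names `/ready` and `/livez`, the image, the port, and the timings below are assumptions for illustration; the application itself would have to implement those handlers.

```yaml
containers:
  - name: web
    image: example.com/web:1.0    # hypothetical image
    ports:
      - containerPort: 8080
    startupProbe:                 # holds off the other two probes until it succeeds
      httpGet:
        path: /livez
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:               # may include dependency checks (database, cache, config)
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:                # process-level check only; failure restarts the container
      httpGet:
        path: /livez
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Keeping the two endpoints separate means a dependency outage takes pods out of rotation without restarting them.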

By setting up appropriate readiness and liveness probes, you can detect and handle problems with your Kubernetes applications even before they impact your users.