Liveness, Readiness, and Startup Probes in Kubernetes: What You Need to Know

#kubernetes #k8s #devops #aws

In Kubernetes, probes are periodic checks that the kubelet runs against the containers in a pod to assess their health and readiness. Based on the results, Kubernetes manages the container lifecycle: it restarts a container that has stopped working, or keeps a pod out of service until it’s ready to handle traffic.

There are three main types of probes in Kubernetes:

1. Liveness Probe
Purpose: Determines if a container is still running and healthy.
If the liveness probe fails, Kubernetes will restart the container. This is useful for detecting situations where the application has hung or is in an unresponsive state.
Example scenario: A web server whose process is still running but can no longer handle requests because of a deadlock or resource exhaustion.
2. Readiness Probe
Purpose: Determines if the container is ready to accept traffic.
If the readiness probe fails, the container is not restarted. Instead, the pod is removed from the endpoint list of every Service that selects it, so it receives no traffic through those Services until the probe passes again.
This is helpful when a container needs some time to initialize or warm up before it starts serving requests (e.g., loading large datasets, performing migrations).
3. Startup Probe
Purpose: Determines whether a container has started successfully.
If the startup probe fails (that is, it exceeds its failure threshold), Kubernetes kills the container, which is then restarted according to the pod's restart policy. It’s useful for containers that need extra time to initialize or are slow to start.
While a startup probe is configured and has not yet succeeded, the liveness and readiness probes do not run. This keeps a slow-starting application from being restarted by the liveness probe before it has finished starting.
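
To see how the three probe types fit together, here is a minimal sketch of a pod that defines all of them on one container. The pod name, image, port, and endpoint paths are placeholders chosen for illustration, not values from a real application.

```yaml
# Hypothetical Pod; the name, image, port, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0   # placeholder image
      ports:
        - containerPort: 8080
      # Runs first; liveness and readiness checks wait until it succeeds.
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 12   # up to 12 * 5 = 60 seconds to start
      # Restarts the container if the application stops responding.
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      # Takes the pod out of Service endpoints while it cannot serve requests.
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 2
```

The liveness and readiness probes point at different paths on purpose: a readiness endpoint can report "not ready" for temporary conditions without triggering a restart.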

Each of these probes can be configured with one of three main check mechanisms:

1. HTTP GET Probe
2. TCP Socket Probe
3. Exec Probe

Let’s dive into each of these options and how they can be configured.

1. HTTP GET Probe
The HTTP GET Probe sends an HTTP GET request to a specific path on a container’s port to check if the container is healthy. This is the most common type of probe for web-based applications or services.

How It Works:

Kubernetes will make an HTTP request to the specified path and port inside the container.
If the server responds with a 2xx or 3xx HTTP status code, the probe is considered successful.
If the server responds with a non-successful status code (e.g., 4xx, 5xx), the probe is considered failed.

Example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3

path: The URL path that the probe will try to access (e.g., /healthz).
port: The port to reach inside the container (e.g., 8080).
initialDelaySeconds: The initial wait time before performing the first probe, in seconds.
periodSeconds: The interval between each probe, in seconds.
timeoutSeconds: How long to wait for a response before timing out, in seconds.
failureThreshold: The number of consecutive failures before marking the container as unhealthy.

When to Use:

Suitable for web servers, APIs, or any container running an HTTP-based service.
Ideal for containers that expose health or readiness endpoints.
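
The httpGet block also accepts a few optional fields beyond path and port. The sketch below assumes the container declares a port named http and exposes a /ready endpoint; the header name and value are made up for illustration.

```yaml
# Container-level fragment; assumes a containerPort named "http" exists.
readinessProbe:
  httpGet:
    path: /ready               # hypothetical readiness endpoint
    port: http                 # named port instead of a number
    scheme: HTTP               # the default; HTTPS is also accepted
    httpHeaders:
      - name: X-Probe-Source   # hypothetical header name
        value: kubelet
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3
```

Referring to the port by name means the probe keeps working if the port number changes later, as long as the name stays the same.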

2. TCP Socket Probe
The TCP Socket Probe checks whether a TCP socket is open on a given port in the container. This is particularly useful for non-HTTP-based services like databases, message brokers, or custom network services.

How It Works:

Kubernetes attempts to open a TCP connection to the specified port inside the container.
If the connection is successful (i.e., the port is open), the probe passes.
If the connection cannot be established (i.e., the port is closed), the probe fails.

Example:

readinessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3

port: The TCP port that Kubernetes should attempt to connect to (e.g., 3306 for MySQL).
initialDelaySeconds: The number of seconds to wait after the container starts before performing the first probe.
periodSeconds: How often to check the socket, in seconds.
timeoutSeconds: How long to wait for the connection to be established before timing out, in seconds.
failureThreshold: The number of consecutive failed probes before the container is marked as not ready.

When to Use:

Suitable for services that do not speak HTTP, such as databases (MySQL, PostgreSQL), message brokers (Kafka), or in-memory data stores (Redis).
Useful for containers that expose raw network services without HTTP endpoints.
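
As a rough sketch, the same TCP check can back both a readiness and a liveness probe with different timings. The example assumes a Redis container listening on its conventional port 6379. Keep in mind that an open port only shows that something is accepting connections, not that the service can actually answer queries.

```yaml
# Container-level fragment; 6379 is the conventional Redis port.
readinessProbe:
  tcpSocket:
    port: 6379
  periodSeconds: 5
  failureThreshold: 2
livenessProbe:
  tcpSocket:
    port: 6379
  initialDelaySeconds: 15   # give the server time to bind the port
  periodSeconds: 20
  failureThreshold: 3
```

The liveness probe is deliberately slower and more tolerant than the readiness probe, so a brief hiccup removes the pod from traffic before it ever triggers a restart.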

3. Exec Probe
The Exec Probe runs a command inside the container and checks its exit status to determine if the probe is successful. If the command returns a 0 exit status, the probe passes. If it returns a non-zero exit status, the probe fails.

How It Works:
Kubernetes runs the specified command inside the container.
If the command’s exit code is 0, the probe is successful.
If the exit code is non-zero, the probe fails.

Example:

startupProbe:
  exec:
    command:
      - "cat"
      - "/tmp/healthy"
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3

command: The command to run inside the container, specified as a list of strings (e.g., cat /tmp/healthy).
initialDelaySeconds: The number of seconds to wait after the container starts before performing the first probe.
periodSeconds: How often to run the command, in seconds.
failureThreshold: The number of consecutive failures after which Kubernetes treats the startup as failed and kills the container.

When to Use:

Suitable for cases where you need to check internal states or files in the container.
Useful for debugging or complex health checks that require running specific scripts or commands inside the container.
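
An exec probe can also call a tool that ships with the application image. The sketch below assumes a PostgreSQL container where the pg_isready utility is available and the server listens on 127.0.0.1:5432; adjust the host and port to your setup.

```yaml
# Container-level fragment; assumes pg_isready exists in the image.
readinessProbe:
  exec:
    command:
      - pg_isready        # exits 0 when the server accepts connections
      - "-h"
      - "127.0.0.1"
      - "-p"
      - "5432"
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```

Each list item is passed as a separate argument and no shell is involved, so pipes or variable expansion would require wrapping the command in something like sh -c.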

Additional Probe Configuration Options

In addition to defining the probe type (httpGet, tcpSocket, or exec), you can also configure several timing-related parameters to fine-tune how Kubernetes performs the probes:

initialDelaySeconds: The time to wait after the container starts before performing the first probe (default 0). This is helpful when your application needs time to initialize.
periodSeconds: How often Kubernetes performs the probe (default 10 seconds).
timeoutSeconds: How long Kubernetes waits for a probe to respond before counting it as a failure (default 1 second).
failureThreshold: The number of consecutive probe failures that Kubernetes tolerates before acting on the failure (default 3). For liveness and startup probes the container is restarted; for readiness probes the pod is marked as not ready.
successThreshold: The number of consecutive successful probes required, after a failure, to consider the probe successful again. The default is 1, and it must be 1 for liveness and startup probes, so in practice it is only tuned for readiness probes.
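
These parameters combine into simple time budgets. As an illustrative sketch (the path, port, and numbers are assumptions), a startup probe allows roughly failureThreshold * periodSeconds for the application to come up, and a readiness probe with successThreshold above 1 waits for several passes in a row before the pod receives traffic again.

```yaml
# Container-level fragment; values are illustrative only.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30    # 30 * 10 s = up to 300 s allowed for startup
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3     # not ready after about 3 * 5 s = 15 s of failures
  successThreshold: 2     # ready again only after 2 consecutive passes
```

Raising failureThreshold on the startup probe is usually a safer way to accommodate slow starts than a large initialDelaySeconds, because the container is marked as started as soon as the first check passes.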
