Readers asked me to write "something about Pods": closer to the surface of the sea, and simpler. Well, OK, I tried. Enjoy!
There are a few common incidents that can occur with a Kubernetes Deployment or Service. Let's discuss how to respond to them, assuming our knowledge base and "toolbox" are modest.
Uncover the cause of the crash and take corrective action. You can use the kubectl get pods command to get information about the crashed Pod.
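A crashing Pod is easy to spot in the STATUS column. Assuming a hypothetical Pod named myapp-7d9f9c6b4-x2x1q, the output might look something like this:

```shell
$ kubectl get pods
# NAME                    READY   STATUS             RESTARTS   AGE
# myapp-7d9f9c6b4-x2x1q   0/1     CrashLoopBackOff   5          10m
```

The RESTARTS counter climbing alongside a CrashLoopBackOff status is the telltale sign of the loop described below.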
CrashLoopBackOff is a common error indicating that a Pod is repeatedly crashing and being restarted in an endless loop. This error can be caused by a variety of issues, including:
- Insufficient resources: lack of resources prevents the container from loading
- Locked file: a file was already locked by another container
- Locked database: the database is being used and locked by other pods
- Failed reference: references to scripts or binaries that are not present in the container
- Setup error: an issue with the init container setup in Kubernetes
- Config loading error: a server cannot load the configuration file (check your YAMLs twice!)
- Misconfigurations: a general file system misconfiguration
- Connection issues: DNS or kube-dns is not able to connect to a third-party service
- Deploying failed services: an attempt to deploy services/applications that have already failed (e.g. due to a lack of access to other services)
There are a few non-obvious ways to troubleshoot the CrashLoopBackOff error manually.
To look at the relevant logs, use this command:
$ kubectl logs [podname] -p
The -p flag tells kubectl to retrieve the logs of the previous failed instance, which lets you see what is happening at the application level. For instance, an important file may already be locked by a different container because it's in use.
If the deployment logs can't pinpoint the problem, try looking at logs from preceding instances. You can run this command to look at previous Pod logs:
$ kubectl logs [podname] -n [namespace] --previous
You can run this command to retrieve the last 20 lines of logs from the preceding Pod:
$ kubectl logs [podname] --previous --tail=20
Look through the log to see why the Pod is constantly starting and crashing.
If the logs don't tell you anything, try looking for errors among the cluster events, where Kubernetes records everything that happened before your Pod crashed. You can run this command:
$ kubectl get events --sort-by=.metadata.creationTimestamp
Add --namespace [mynamespace] as needed. You will then be able to see what caused the crash.
You may be able to find errors that you can't find otherwise by running this command:
$ kubectl describe pod [name]
If you get "Back-off restarting failed container", this means your container suddenly terminated after Kubernetes started it.
Often, this is the result of resource overload caused by increased activity. Kubernetes provides liveness probes to detect and remedy such situations, so you need to manage resources for your containers and specify the right limits. You should also consider increasing initialDelaySeconds so the application has more time to respond.
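As a sketch (the Pod name and image are hypothetical), a liveness probe with a longer initialDelaySeconds gives a slow-starting application room to come up before Kubernetes begins health-checking it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: myorg/myapp:1.0     # hypothetical image serving /healthz on 8080
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30  # wait 30s before the first probe
      periodSeconds: 10        # then probe every 10s
```

If the delay is too short, Kubernetes kills a container that was merely slow to start, which itself produces a restart loop.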
Finally, you may be experiencing CrashLoopBackOff errors due to insufficient memory. You can increase the memory limit by changing resources.limits in the container's resource manifest:
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "200Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
We're limiting the containerized stress tool (the polinux/stress image by Przemyslaw Ozgo) here. What an irony! 🙃
Take action to relieve the pressure. You can use console tools, metrics, or the Lens GUI to get information about the cluster's CPU and memory usage. See my article about resource management.
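If the metrics-server add-on is installed in your cluster, kubectl itself can show current resource usage:

```shell
$ kubectl top nodes                    # CPU/memory per node
$ kubectl top pods -n [mynamespace]    # CPU/memory per Pod in a namespace
```

Comparing these numbers against the limits in your manifests quickly shows which containers are running close to the edge.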
Investigate the cause of the outage and take corrective action. You can use the kubectl get svc command to get information about the unavailable Service.
A common problem with a malfunctioning Service is missing or mismatched endpoints. For example, it's important to ensure that the Service's selector matches the Pod's labels, and that the Pod's containerPort matches the Service's targetPort. Some other troubleshooting practices for Services include:
- Verifying that the service works by DNS name
- Verifying that it works by IP Address
- Ensuring that kube-proxy is functioning as intended
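A minimal sketch of a correctly wired pair (all names here are illustrative): the Service's selector must match the Pod's labels, and targetPort must match the Pod's containerPort:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-demo
  labels:
    app: web-demo            # the Service selects on this label
spec:
  containers:
  - name: web
    image: myorg/web:1.0     # hypothetical image listening on 8080
    ports:
    - containerPort: 8080    # must match the Service's targetPort
---
apiVersion: v1
kind: Service
metadata:
  name: web-demo-svc
spec:
  selector:
    app: web-demo            # matches the Pod's label above
  ports:
  - port: 80                 # the port clients connect to
    targetPort: 8080         # matches containerPort above
```

If the selector or port is mismatched, kubectl get endpoints web-demo-svc will show an empty endpoint list, which is your first clue.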
You may want to restart your pods. Some possible reasons are:
- Resource limits aren't set, or the software behaves in an unforeseen way; check your resource limits or auto-scaling rules
- A pod is stuck in a terminating state
- Mistaken deployments
- Requesting persistent volumes that are not available
Determine the cause of the problem and take corrective action. There are at least four ways to restart Pods.
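The four most common ways look like this (the Deployment and Pod names are placeholders):

```shell
# 1. Rolling restart of a Deployment (kubectl 1.15+), no downtime
$ kubectl rollout restart deployment [mydeployment]

# 2. Scale to zero and back up (causes downtime)
$ kubectl scale deployment [mydeployment] --replicas=0
$ kubectl scale deployment [mydeployment] --replicas=1

# 3. Delete the Pod; its controller (e.g. a Deployment) recreates it
$ kubectl delete pod [podname]

# 4. Set or update an environment variable, which triggers a new rollout
$ kubectl set env deployment [mydeployment] RESTART_TRIGGER="$(date)"
```

The rollout restart is usually the safest option, since it replaces Pods gradually while the old ones keep serving traffic.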
As you can see, the kubectl command will help you a lot. Consider it your roll of insulating tape!