Kubernetes can handle large and diverse workloads. To keep track of all these processes, monitoring is essential.
Monitoring
To monitor an application you need to collect metrics such as CPU, memory, disk usage, and network bandwidth on your nodes.
Because Kubernetes is a distributed system, it needs to be monitored and traced cluster-wide.
You can use external tools like Prometheus and visualize the collected metrics with Grafana. But to get started, I recommend the Kubernetes dashboard: it is easy to set up and gives you a default user interface with the most important metrics.
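For a quick look at resource usage without any dashboard, kubectl can query the metrics API directly. This is a minimal sketch, assuming the metrics-server add-on is installed in your cluster:

```shell
# Show CPU and memory usage per node (requires the metrics-server add-on)
kubectl top nodes

# Show CPU and memory usage per pod in a namespace
kubectl top pods --namespace default
```
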
Logging
If you aggregate your logs, you can search them for issues and visualize trends.
In Kubernetes the kubelet writes container logs to local files on each node. With the command kubectl logs
you can view these logs.
If you want to perform cluster-wide logging, you can use Fluentd to aggregate logs.
Fluentd agents run on each node via a DaemonSet and forward the logs to an Elasticsearch instance for visualization.
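As a sanity check after deploying such a setup, you can verify that one Fluentd agent pod is scheduled per node. This is a sketch; the namespace, DaemonSet name, and label are assumptions and may differ in your installation:

```shell
# List the Fluentd DaemonSet and confirm DESIRED == READY == number of nodes
kubectl get daemonset fluentd --namespace kube-system

# Show the Fluentd agent pods and the nodes they run on
kubectl get pods --namespace kube-system -l app=fluentd -o wide
```
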
Troubleshooting
Errors in the container
If you are not sure where to start, run
kubectl describe pod your-pod
This will report
- the overall status of the pod: running, pending or an error state
- the container configuration
- the container events
If the pod is already running, you can first look at the standard output of the container. One common issue is that not enough resources are allocated.
kubectl logs your-pod your-container
You can look for error messages in the logs.
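For example, you can filter the log output for common error keywords, and check whether the container was killed for exceeding its memory limit. The pod and container names are placeholders:

```shell
# Filter recent log lines for common error keywords
kubectl logs your-pod your-container --tail=200 | grep -iE 'error|exception|fatal'

# If the container keeps restarting, check why it was last terminated
# (prints e.g. OOMKilled when it ran out of memory)
kubectl get pod your-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```
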
If there are errors inside a container, you can open a shell inside the container to see what is going on.
kubectl exec -it your-pod -- /bin/sh
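Once inside, a few quick checks often reveal the problem. This assumes the container image ships these standard tools, which minimal images may not:

```shell
# Run these inside the container shell:
ps aux                 # is the main process still running?
env | sort             # are the expected environment variables set?
df -h                  # is a volume full?
cat /etc/resolv.conf   # which DNS server does the container use?
```
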
Networking issues
Networking is the next place where issues commonly arise.
Check DNS resolution, firewall rules, and general connectivity between pods and services.
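A quick way to test this is to run a throwaway debugging pod. The busybox image and the service name below are assumptions for illustration:

```shell
# Start a temporary pod with basic networking tools; it is removed on exit
kubectl run net-debug --rm -it --image=busybox -- /bin/sh

# Inside the pod: test DNS resolution of the cluster API service
nslookup kubernetes.default

# Inside the pod: test connectivity to a service (hypothetical name and port)
wget -qO- http://your-service:80
```
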
Security issues
You might want to check your RBAC.
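kubectl can tell you directly whether a given subject is allowed to perform an action. The service account name below is a placeholder:

```shell
# Can the current user list pods?
kubectl auth can-i list pods

# Can a specific service account create deployments in a namespace?
kubectl auth can-i create deployments \
  --namespace default \
  --as system:serviceaccount:default:your-service-account
```
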
SELinux and AppArmor are also common issues, especially with network-centric applications.
If you don't know where to start, you can temporarily disable security features for testing to narrow down the source of the issue. But be sure to re-enable them afterwards.
Another cause, not only of security issues, could be a recent update. You can roll back releases to find out when the issue was introduced.
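With a Deployment, kubectl keeps a rollout history that you can inspect and revert. The deployment name is a placeholder:

```shell
# Show the revision history of a deployment
kubectl rollout history deployment/your-deployment

# Roll back to the previous revision
kubectl rollout undo deployment/your-deployment

# Or roll back to a specific revision
kubectl rollout undo deployment/your-deployment --to-revision=2
```
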
Further reading:
Kubernetes dashboard
Prometheus
Fluentd
Troubleshoot a cluster
Troubleshoot applications
Debug Pods