A pod can be evicted for multiple reasons:
- when a node is drained of all its pods,
- via `kubectl delete`,
- when the scheduler evicts lower-priority pods to allow the execution of higher-priority ones...
- A delete request is issued.
- The API server marks the pod's state as *Terminating*.
- The kubelet and the endpoints controller start the eviction process:
- The kubelet performs the pod eviction.
- The endpoints controller handles the endpoint removal. Both operations are asynchronous.
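These two parallel paths can be sketched as independent tasks. This is an illustration of the asynchrony only; the function names are placeholders, not Kubernetes APIs:

```python
import threading

events = []  # records the order in which the two paths complete
events_lock = threading.Lock()

def kubelet_evicts_pod():
    """Placeholder for the kubelet's container shutdown path."""
    with events_lock:
        events.append("pod evicted")

def controller_removes_endpoints():
    """Placeholder for the endpoints controller's removal path."""
    with events_lock:
        events.append("endpoints removed")

# Both paths run concurrently; neither waits for the other,
# so their completion order is not guaranteed.
threads = [
    threading.Thread(target=kubelet_evicts_pod),
    threading.Thread(target=controller_removes_endpoints),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because neither task waits for the other, nothing in the system orders "pod evicted" relative to "endpoints removed" — which is exactly the race discussed below.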
The kubelet initiates a shutdown sequence for each container in the pod:
- it runs the container's `preStop` hook (if one is defined),
- sends a SIGTERM signal to the containers, and
- waits for the containers to terminate.
The grace period for this sequence defaults to 30 seconds (or the value in seconds specified in the pod's `terminationGracePeriodSeconds` field). If a container is still running beyond this time, the kubelet waits for approximately 2 more seconds and then kills the container forcibly by sending a SIGKILL signal.
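The per-container sequence above can be sketched as follows. This is a simplified model of the kubelet's logic, not its actual code; the container is stood in for by a local subprocess, and the constants mirror the defaults described above:

```python
import signal
import subprocess

GRACE_PERIOD = 30  # default terminationGracePeriodSeconds
KILL_BUFFER = 2    # extra wait before resorting to SIGKILL

def shut_down_container(proc, pre_stop_hook=None, grace_period=GRACE_PERIOD):
    """Sketch of the kubelet's shutdown sequence for one container."""
    if pre_stop_hook is not None:
        pre_stop_hook()                   # run the preStop hook first
    proc.send_signal(signal.SIGTERM)      # ask the container to stop
    try:
        # wait up to the grace period (plus a small buffer) for exit
        proc.wait(timeout=grace_period + KILL_BUFFER)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL: forcible termination
        proc.wait()
    return proc.returncode
```

For example, shutting down a `sleep 60` subprocess this way terminates it via SIGTERM well inside the grace period, so the SIGKILL branch is never reached.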
In parallel, the endpoints controller removes the pod's endpoint by sending a request to the API server. The API server then notifies all the kube-proxies on the worker nodes, and each kube-proxy removes the endpoint from the iptables rules on its node.
We cannot make assumptions about which of the two processes will complete first. If the endpoint removal finishes before the containers receive the SIGTERM signal, no new requests arrive while the containers are terminating. However, if the containers start terminating before the endpoint removal is finished, the pod continues to receive requests; in that case, clients get "Connection timeout" or "Connection refused" errors. Because the endpoint removal must propagate to every node in the cluster before it is complete, there is a high probability that the pod eviction process completes first. LearnK8s created a neat visual representation of these two scenarios.
To shut down gracefully, an application should:
- handle SIGTERM properly,
- track in-flight connections, drain them, and only then shut down.
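Both recommendations can be sketched in a few lines of Python. This is a minimal, framework-free illustration: the handler, the in-flight counter, and the drain loop are illustrative names, not a specific library's API:

```python
import signal
import threading
import time

shutting_down = threading.Event()
in_flight = 0                     # requests currently being served
in_flight_lock = threading.Lock()

def handle_sigterm(signum, frame):
    """On SIGTERM, stop accepting new work; in-flight requests keep running."""
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def handle_request(work):
    """Serve one request, tracking it as in-flight."""
    global in_flight
    with in_flight_lock:
        in_flight += 1
    try:
        work()
    finally:
        with in_flight_lock:
            in_flight -= 1

def drain_and_exit(poll_interval=0.05, max_wait=25.0):
    """After SIGTERM, wait for in-flight requests to finish, then return.

    max_wait should stay below terminationGracePeriodSeconds so the
    process exits on its own before the kubelet resorts to SIGKILL.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        with in_flight_lock:
            if in_flight == 0:
                return True       # drained cleanly
        time.sleep(poll_interval)
    return False                  # gave up; SIGKILL will follow soon
```

The key design point is that the signal handler only flips a flag; the actual draining happens in normal application code, which keeps the handler async-signal-safe and the shutdown logic testable.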