Determining the Root Cause(s) of Pod Termination Issues
Problem
- Pods on a specific node are observed terminating and being rescheduled onto other nodes.
- How can the cause of these terminations or evictions be determined?
Environment
- Platform9 Edge Cloud - v5.3.0 and Higher.
- Platform9 Managed Kubernetes - v5.6 and Higher.
- Platform9 Self Managed Cloud - v5.9 and Higher.
Diagnostic Steps
- Listed below are the most frequently observed causes of pod termination, each with an explanation and a sample log trace. Examine the kubelet logs on the affected node for any of the errors below, then take action accordingly.
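As a starting point, the kubelet logs can usually be followed with journalctl. This is a minimal sketch that assumes the kubelet runs as a systemd unit named kubelet; on Platform9 nodes the unit name may differ, and logs are also written under /var/log/pf9:
# journalctl -u kubelet --since "1 hour ago" | grep -i <pod_name>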
- SyncLoop DELETE: Indicates the kubelet received a request to terminate the pod from the API server.
Jan 17 11:05:12 node-01 kubelet[12345]: I0117 11:05:12.123456 12345 kubelet.go:1906] "SyncLoop DELETE" source="api" pod="default/example-pod"
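To see what acted on the pod around that time, list the pod's events while they are still retained (by default, events expire after roughly an hour). The pod name and namespace below are taken from the sample trace above:
# kubectl get events -n default --field-selector involvedObject.name=example-pod --sort-by=.lastTimestamp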
- Killing Pod/Container: The kubelet starts terminating the pod and sends signals to stop running containers.
Jan 17 11:05:13 node-01 kubelet[12345]: I0117 11:05:13.234567 12345 kuberuntime_manager.go:868] "Killing pod" podName="example-pod" podNamespace="default" podUID="e7b6d3f9-d0c3-4c1b-94ff-823e82a93157"
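If containers appear stuck in this phase, their state can be inspected directly on the node with crictl, assuming a CRI runtime such as containerd and that crictl is installed (the pod name is from the sample trace above):
# crictl ps -a | grep example-pod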
- Cleaning Up Volumes: The kubelet unmounts and removes volumes associated with the pod.
Jan 17 11:05:18 node-01 kubelet[12345]: I0117 11:05:18.789012 12345 kubelet_volumes.go:165] "Cleaning up pod volumes" podUID="e7b6d3f9-d0c3-4c1b-94ff-823e82a93157"
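Volume cleanup problems typically leave the pod's directory behind on the node. It can be checked using the podUID from the trace above, assuming the default kubelet root directory /var/lib/kubelet:
# ls /var/lib/kubelet/pods/e7b6d3f9-d0c3-4c1b-94ff-823e82a93157/volumes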
- Evicted Pod: If the termination is due to resource pressure or eviction, logs indicate the reason.
Jan 17 11:05:19 node-01 kubelet[12345]: I0117 11:05:19.890123 12345 eviction_manager.go:211] "Pod has been evicted" podName="example-pod" podNamespace="default"
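Evictions are normally driven by node pressure. The node's conditions (MemoryPressure, DiskPressure, PIDPressure) and any recent Evicted events can be reviewed as follows (node-01 is the sample node name from the traces above):
# kubectl describe node node-01
# kubectl get events --all-namespaces --field-selector reason=Evicted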
- Teardown Network: The CNI plugin tears down the pod's network configuration.
Jan 17 11:05:17 node-01 kubelet[12345]: I0117 11:05:17.678901 12345 cni.go:333] "Teardown network for pod" podName="example-pod" podNamespace="default"
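This message is expected during normal teardown. If pods instead hang in Terminating, one quick check is whether the CNI configuration is still present on the node; the standard CNI configuration path is assumed here, and the exact contents vary by CNI plugin:
# ls /etc/cni/net.d/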
- Readiness/Liveness Probe Failures: Containers are restarted (liveness) or taken out of service (readiness) when their probes fail repeatedly:
I1202 10:51:45.445830 12357 prober.go:117] Liveness probe for "application--d2r96_kube-system(36f0e9b8-6d67-4875-8abd-e4e7175f45a0):app-node" failed (failure):
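Probe failures also surface as Unhealthy events on the pod, and the probe definition itself appears in the pod description. The pod name and namespace below are taken from the sample trace above:
# kubectl describe pod application--d2r96 -n kube-system
# kubectl get events -n kube-system --field-selector reason=Unhealthy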
Additional Information
- It is recommended that before rebooting the node (if a reboot is attempted at all) to resolve the issue, all the necessary logs are captured, specifically a tarball of the /var/log/pf9 directory on the affected node.
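A minimal example, assuming /tmp/pf9-logs.tar.gz as the archive path (the file name is arbitrary):
# tar -czvf /tmp/pf9-logs.tar.gz /var/log/pf9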
- Share the /tmp/cluster-dump.tar.gz file generated using the below commands:
# kubectl cluster-info dump --namespaces <affected_namespace> -o yaml --output-directory=/tmp/cluster/cluster-dump
# tar -czvf /tmp/cluster-dump.tar.gz /tmp/cluster/cluster-dump