Determining the Root Causes of Pod Termination Issues

Problem

  • Pods on a specific node start terminating and are rescheduled onto other nodes.

  • How can the causes of these terminations be determined?

Environment

  • Platform9 Edge Cloud - v5.3.0 and higher.

  • Platform9 Managed Kubernetes - v5.6 and higher.

  • Self Managed Cloud Platform9 - v5.9 and higher.

Diagnostic Steps

  • Listed below are the most frequently observed causes of pod termination, each with an explanation and, where available, a sample log trace. Check the kubelet logs on the affected node for any of the following messages, then take action accordingly.

  • SyncLoop DELETE: Indicates that the kubelet received a request from the API server to delete the pod.

Jan 17 11:05:12 node-01 kubelet[12345]: I0117 11:05:12.123456   12345 kubelet.go:1906] "SyncLoop DELETE" source="api" pod="default/example-pod"

  • Killing Pod/Container: The kubelet starts terminating the pod and sends stop signals to its running containers.

Jan 17 11:05:13 node-01 kubelet[12345]: I0117 11:05:13.234567   12345 kuberuntime_manager.go:868] "Killing pod" podName="example-pod" podNamespace="default" podUID="e7b6d3f9-d0c3-4c1b-94ff-823e82a93157"

  • Cleaning Up Volumes: The kubelet unmounts and removes volumes associated with the pod.

Jan 17 11:05:18 node-01 kubelet[12345]: I0117 11:05:18.789012   12345 kubelet_volumes.go:165] "Cleaning up pod volumes" podUID="e7b6d3f9-d0c3-4c1b-94ff-823e82a93157"

  • Evicted Pod: If the pod was evicted, for example because of node resource pressure such as low memory or disk space, the kubelet logs record the eviction reason.

  • Teardown Network: The CNI plugin tears down the pod's network configuration.

  • Readiness/Liveness Probe Failures: A failing liveness probe causes the kubelet to restart the container, and a failing readiness probe marks the pod as not ready.
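The log checks described in this section can be partly scripted. The following Python sketch scans a kubelet log that has already been saved to a text file and tags each line matching one of the messages discussed here. The "SyncLoop DELETE", "Killing pod" and "Cleaning up pod volumes" strings are taken from the sample log traces in this article; the eviction and "Probe failed" patterns are assumptions, because message wording varies between Kubernetes versions. The script name scan_kubelet.py and the category labels are hypothetical.

```python
import re
import sys

# Ordered (category, pattern) pairs; the first pattern that matches a line wins.
# The first three strings come from the sample kubelet log traces in this
# article. The "evict" and "Probe failed" patterns are assumptions: message
# wording differs between Kubernetes versions, so adjust them to your logs.
CATEGORIES = [
    ("api_delete", re.compile(r'"SyncLoop DELETE"')),
    ("killing", re.compile(r'"Killing pod"')),
    ("volume_cleanup", re.compile(r'"Cleaning up pod volumes"')),
    ("eviction", re.compile(r"evict", re.IGNORECASE)),
    ("probe_failure", re.compile(r'"Probe failed"')),
]

# First structured pod field on the line: pod="ns/name", podName="name",
# or podUID="uid", depending on which message emitted it.
POD_FIELD = re.compile(r'\b(?:pod|podName|podUID)="([^"]+)"')


def scan(lines):
    """Yield (category, pod, line) for every log line matching a known message."""
    for line in lines:
        for category, pattern in CATEGORIES:
            if pattern.search(line):
                match = POD_FIELD.search(line)
                pod = match.group(1) if match else "<unknown>"
                yield category, pod, line.rstrip("\n")
                break  # report each line under one category only


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 scan_kubelet.py kubelet.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for category, pod, line in scan(fh):
            print(f"{category}\t{pod}\t{line}")
```

Run it as python3 scan_kubelet.py kubelet.log after exporting the kubelet log to a file; where that log lives depends on how the kubelet runs on the node. Matching on the quoted message text rather than on source locations such as kubelet.go:1906 keeps the patterns usable when line numbers change between releases. Because each message carries a different pod field (pod, podName or podUID), the same pod may appear under more than one identifier in the output.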

Additional Information

  • Before rebooting the node to resolve the issue (if a reboot is needed at all), capture all the necessary logs, specifically a tarball of the /var/log/pf9 directory.

  • Share the /tmp/cluster-dump.tar.gz file generated using the commands below:
