Node NotReady With Error "container runtime is down, PLEG is not healthy"
Problem
- A Kubernetes node is in a "NotReady" state.
- The Kubelet process on the corresponding node is in a defunct state.
- The node is also exhibiting a high load average (relative to the number of CPUs), as observed via top or in the pf9-muster log:
[host] instances:0 loadavg:173.62 proc_active:16 proc_total:4218
- The Kubelet log reports "container runtime is down, PLEG is not healthy":
I1110 XX:XX:XX.581159 25606 kubelet_node_status.go:430] Recording NodeNotReady event message for node <IP>
I1110 XX:XX:XX.581176 25606 setters.go:518] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:YYYY-YY-YY XX:XX:XX.581137659 +0530 IST m=+4777925.197815516 LastTransitionTime:YYYY-YY-YY XX:XX:XX.581137659 +0530 IST m=+4777925.197815516 Reason:KubeletNotReady Message:container runtime is down,PLEG is not healthy: pleg was last seen active 21m34.729124335s ago; threshold is 3m0s}
- In the Docker log, a "broken pipe" error is observed:
<host> dockerd[20998]: time="YYYY-YY-YYTXX:XX:XX.510007829+05:30" level=error msg="Handler for GET /v1.31/containers/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
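These symptoms can be confirmed directly on the affected host. The sketch below uses only standard Linux tooling (no Platform9-specific paths) to compare the load average against the CPU count and to look for defunct (zombie) processes such as the stuck kubelet:

```shell
# Compare the 1-minute load average with the CPU count; a load average far
# above the CPU count matches the pf9-muster line shown above.
cpus=$(nproc)
load=$(cut -d' ' -f1 /proc/loadavg)
echo "loadavg=${load} cpus=${cpus}"

# A defunct process shows state "Z" (zombie) in ps output; a defunct kubelet
# would appear here along with its parent PID.
ps -eo stat,pid,ppid,comm | awk '$1 ~ /^Z/'
```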
Environment
- Platform9 Managed Kubernetes - v3.6.0 and Higher
- Docker
Cause
In every iteration, the PLEG health check calls docker ps to detect container state changes and docker inspect to get the details of those containers. After finishing each iteration, it updates a timestamp. If the timestamp has not been updated for 3 minutes, the health check fails.
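The timestamp logic can be illustrated with a minimal sketch (the threshold matches the "3m0s" in the kubelet message above; the variable names are illustrative, not kubelet internals):

```shell
# Minimal sketch of the PLEG health check: a timestamp is recorded after each
# completed relist iteration, and the runtime is reported unhealthy once that
# timestamp is older than the threshold (3m0s in the log above).
THRESHOLD=180                       # seconds; "threshold is 3m0s"
last_relist=$(date +%s)             # updated after every finished iteration

now=$(date +%s)
age=$(( now - last_relist ))
if [ "$age" -gt "$THRESHOLD" ]; then
    echo "PLEG is not healthy: pleg was last seen active ${age}s ago; threshold is ${THRESHOLD}s"
else
    echo "PLEG healthy: last relist ${age}s ago"
fi
```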
In most occurrences of this issue, PLEG cannot finish all of its tasks within the 3-minute window, causing Docker socket connection errors that eventually leave the pf9-kubelet service in a defunct state.
In scenarios where the cluster node flaps between the Ready and NotReady states due to PLEG issues caused by a high load average, the pf9-nodelet service continuously monitors the health of the pf9-kubelet service via the phase "Configure and start kubelet". If there is an issue with the service, pf9-nodelet attempts to restart that phase. In the long term, however, investigating the cause of the high load average is the best way to prevent the node from flapping.
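As a starting point for that investigation, note that processes stuck in uninterruptible sleep (state "D"), for example blocked on I/O or on the Docker socket, raise the load average without consuming CPU. A quick, generic check:

```shell
# List processes in uninterruptible sleep (state "D"); a large, persistent
# count explains a high load average even when the CPUs are mostly idle.
dstate=$(ps -eo stat,pid,comm | awk '$1 ~ /^D/')
dcount=$(printf '%s' "$dstate" | grep -c . || true)
echo "processes in D state: ${dcount}"
if [ -n "$dstate" ]; then
    printf '%s\n' "$dstate"
fi
```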
Resolution
The defunct kubelet process cannot be cleared except by rebooting the node.
- Reboot the node.
Additional Information
Use one of the scripts below to measure how long each container takes to inspect (the first targets containerd via crictl, the second targets Docker):
# Containerd runtime (crictl):
TIMEFORMAT=%R
time (/opt/pf9/pf9-kube/bin/crictl -r unix:///run/containerd/containerd.sock ps | grep -v POD | awk '{print $1, $7}') |
while read id name; do
    echo -e "\nChecking Container: $name : $id"
    RESP=$(time /opt/pf9/pf9-kube/bin/crictl -r unix:///run/containerd/containerd.sock inspect $id 2>&1 > /dev/null)
    echo -e "Took$RESP above secs for $name ID: $id \n"
done
echo -e "Total Time"

# Docker runtime:
TIMEFORMAT=%R
time docker ps --format "{{.ID}}\t{{.Names}}" |
while read id name; do
    echo -e "\nChecking Container: $name : $id"
    RESP=$(time docker inspect $id 2>&1 > /dev/null)
    echo -e "Took$RESP above secs for $name ID: $id \n"
done
echo -e "Total Time"