Kubernetes Node in NotReady State After Reboot on Containerd Runtime Clusters

Problem

  • A Kubernetes node (master or worker) that has been rebooted (e.g., as part of a maintenance activity) shows as NotReady.
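For illustration, the node status can be checked as follows (the node name, age, and version below are placeholders, not from the original article):

```bash
# List nodes; a node hit by this issue reports NotReady.
$ kubectl get nodes
NAME            STATUS     ROLES    AGE   VERSION
worker-node-1   NotReady   worker   90d   v1.21.3
```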
  • The node description (kubectl describe node) similarly reports KubeletNotReady due to the CNI plugin being uninitialized.
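For reference, the relevant Ready condition in the node description typically looks like the following (output trimmed; the node name is a placeholder):

```bash
$ kubectl describe node worker-node-1
...
Conditions:
  Type    Status   Reason            Message
  ----    ------   ------            -------
  Ready   False    KubeletNotReady   container runtime network not ready: NetworkReady=false
                                     reason:NetworkPluginNotReady message:Network plugin
                                     returns error: cni plugin not initialized
```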

Environment

  • Platform9 Managed Kubernetes - v5.4 and Higher
  • Kubernetes - All 1.21 versions except v1.21.3-pmk.183
  • Runtime - Containerd

Cause

Due to an upstream issue in containerd, the CNI config is not reloaded when the directory is deleted and recreated during the Platform9 Kubernetes stack initialization.

Resolution

  • This issue is fixed in pf9-kube v1.21.3-pmk.183 and later releases.

Workaround

  1. Verify that the CNI configuration directory referenced by containerd (typically /etc/cni/net.d) is not empty.

For Flannel-based clusters, the directory should contain the Flannel CNI configuration files.
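As an illustration, a Flannel node's directory typically contains a conflist such as the one below; the exact filename is an assumption based on the standard upstream Flannel deployment and may differ in your release:

```bash
# Listing assumes the default containerd CNI config directory.
$ ls /etc/cni/net.d
10-flannel.conflist
```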

For Calico-based clusters, the directory should contain the Calico CNI configuration files.
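Similarly, a Calico node typically has files such as the following; the exact filenames are an assumption based on the standard upstream Calico install and may differ in your release:

```bash
# Listing assumes the default containerd CNI config directory.
$ ls /etc/cni/net.d
10-calico.conflist  calico-kubeconfig
```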
  2. Restart the containerd service on the affected node.

Note: restarting containerd does not restart or otherwise affect any running containers.

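On systemd-based hosts this is typically done with systemctl (run on the affected node):

```bash
# Restart containerd and confirm it comes back up healthy.
sudo systemctl restart containerd
sudo systemctl status containerd
```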

The Kubernetes nodes should now all show as Ready.

  3. Verify the status of the node after restarting containerd.
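A healthy node should now report Ready; for example (node name, age, and version are placeholders):

```bash
$ kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
worker-node-1   Ready    worker   90d   v1.21.3
```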