Worker Node "NotReady" Issue

Problem

A worker node is in the NotReady state, or a cluster NodeGroup is stuck in the ScalingUp state.

Environment

  • Private Cloud Director - v2025.4 and higher.
  • Kubernetes Cluster 1.31.2 or higher.

Procedure

  1. Get the OpenStack VM console logs for the affected worker node and check them for errors or other messages. For example, the logs may show that the worker node joined the cluster successfully.
  2. Run $ kubectl describe node <node-name> and check the Events section for more information. For example, a node can be "NotReady" because the kubelet is unable to get disk statistics for the filesystem where container images are stored.
  3. Troubleshoot based on the errors shown in the Events section, starting with the most common causes listed below.
  4. If these steps do not resolve the issue, contact the Platform9 Support Team for additional assistance.
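
The first two steps above can be sketched as a pair of shell helpers. This is a minimal sketch, not an official Platform9 procedure: the function names vm_console_log and node_events are hypothetical, and it assumes the OpenStack CLI and kubectl are installed and already authenticated against the right cloud and cluster.

```shell
# Hypothetical helper: print the last N lines (default 200) of a VM's console log.
# Assumes the OpenStack CLI is installed and credentials are loaded in the shell.
vm_console_log() {
  server="$1"
  lines="${2:-200}"
  openstack console log show --lines "$lines" "$server"
}

# Hypothetical helper: read `kubectl describe node` output on stdin and
# print everything from the "Events:" section header to the end.
node_events() {
  awk '/^Events:/ { show = 1 } show'
}

# Example usage (placeholders, replace with real names):
#   vm_console_log <server-name-or-id> 500
#   kubectl describe node <node-name> | node_events
```

Filtering the describe output down to the Events section keeps the focus on the messages that usually explain why the node is not ready.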

Most Common Causes

  • The image version and the cluster version do not match. Run $ kubectl get nodes and verify that the VERSION column corresponds to your Kubernetes cluster's version. If it does not, try a different image that matches the current version, or deploy a new cluster with an alternative version.
  • Insufficient resources (CPU, memory, storage) on the underlying PCD-V host.
  • The kubelet or the container runtime service is down on the VM.
  • Port security is disabled on the worker node's VM network.
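
The checks for these causes can be sketched as shell helpers. This is a rough sketch under stated assumptions: all function names are hypothetical, the container runtime is assumed to be containerd (the article does not name it), and the OpenStack CLI and kubectl are assumed to be installed and authenticated.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS (column 2) does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Read the same output on stdin and print "name version" for every node whose
# VERSION (column 5) differs from the expected version passed as $1.
version_mismatch() {
  awk -v want="$1" '$5 != want { print $1, $5 }'
}

# Run on the worker VM itself: report whether kubelet and the container
# runtime are active. Assumption: the runtime unit is named "containerd".
check_node_services() {
  systemctl is-active kubelet containerd
}

# Print the port_security_enabled value of one Neutron port.
# Find the VM's port IDs first with: openstack port list --server <server>
port_security_state() {
  openstack port show -f value -c port_security_enabled "$1"
}

# Example usage (placeholders, replace with real values):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get nodes --no-headers | version_mismatch v1.31.2
#   port_security_state <port-id>
```

If check_node_services reports a unit as inactive, $ journalctl -u kubelet on the VM usually shows why the service stopped.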