Worker Node "NotReady" Issue

Problem

Troubleshoot a worker node that is in the NotReady state, or a cluster NodeGroup that is stuck in the ScalingUp state.

Environment

Private Cloud Director - v2025.4 and higher.
Kubernetes cluster 1.31.2 or higher.

Procedure

Get the OpenStack VM console logs with the command below and check them for errors or other messages. For example, the log below shows a worker node that joined the cluster successfully.

Command
    
 
$ openstack console log show <Worker-node-VM-ID>

OpenStack VM console logs
    
$ openstack console log show <Worker-node-VM-ID>
[..] cloud-init[1004]: [...] [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[..] cloud-init[1004]: [...] [kubelet-start] Starting the kubelet
[..] cloud-init[1004]: [...] [patches] Applied patch of type "application/strategic-merge-patch+json" to target "kubeletconfiguration"
[..] cloud-init[1004]: [...] [kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[..] cloud-init[1004]: [...] [kubelet-check] The kubelet is healthy after 505.386495ms
[..] cloud-init[1004]: [...] [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] This node has joined the cluster:
[..] cloud-init[1004]: [...] * Certificate signing request was sent to apiserver and a response was received.
[..] cloud-init[1004]: [...] * The Kubelet was informed of the new secure connection details.
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] Cloud-init v. 24.4.1-0ubuntu0~22.04.2 finished at Fri, 23 May 2025 03:14:57 +0000. Datasource DataSourceOpenStackLocal [net,ver=2].  Up 49.81 seconds
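The check for a successful join can also be scripted. The sketch below is a hypothetical helper (the name console_join_status is an illustration, not part of any Platform9 or OpenStack tooling). It reads console log text on standard input and reports whether the join confirmation line appears, assuming that line matches the sample log in this article.

```shell
# Hypothetical helper: reads console log text on stdin and reports whether
# the join confirmation line from the sample log is present.
console_join_status() {
    if grep -q 'This node has joined the cluster'; then
        echo "joined"
    else
        echo "not-joined"
        return 1
    fi
}

# Usage (requires OpenStack CLI credentials):
#   openstack console log show <Worker-node-VM-ID> | console_join_status
```

A "not-joined" result only means the confirmation line was not found; the console log itself still needs to be read to see where the join stopped.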

Next, run $ kubectl describe node <node-name> and check the Events section for more information. For example, in the case below the node is NotReady because the kubelet cannot get disk statistics for the filesystem where container images are stored.

Node Events
    
 
Events:
  Type     Reason                   Age   From        Message
  ----     ------                   ----  ----        -------
  Normal   Starting                 17m   kube-proxy
  Normal   NodeAllocatableEnforced  18m   kubelet     Updated Node Allocatable limit across pods
  Warning  InvalidDiskCapacity      18m   kubelet     invalid capacity 0 on image filesystem
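On a node with many events, it can help to show only the warnings. The following is a minimal sketch of a hypothetical filter (the name node_warning_events is an illustration) that reads kubectl describe node output on standard input and prints the rows of type Warning that follow the Events header. It assumes the usual layout in which "Events:" sits on its own line.

```shell
# Hypothetical helper: reads `kubectl describe node` output on stdin and
# prints only the Warning rows that appear after the "Events:" header.
node_warning_events() {
    awk '/^Events:/ { in_events = 1; next }
         in_events && $1 == "Warning" { print }'
}

# Usage (requires cluster access):
#   kubectl describe node <node-name> | node_warning_events
```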

Troubleshoot based on the error shown in the Events output, starting with the most common causes listed below.
If these steps do not resolve the issue, contact the Platform9 Support Team for additional assistance.

Most common causes:

The image version and the cluster version do not match. Run $ kubectl get nodes and verify that the VERSION column corresponds to your Kubernetes cluster's version. If it does not, try a different image that matches the current version, or deploy a new cluster with an alternative version.
Insufficient resource availability (CPU, memory, storage) on the underlying PCD-V host.
The kubelet or container runtime service is down on the VM.
Port security is disabled on the worker nodes' VM network.
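The version check in the first cause can be sketched as a small filter. The helper below is hypothetical (the name nodes_with_version_mismatch is an illustration) and assumes the default kubectl get nodes column order, in which VERSION is the fifth column. The commented commands for the other causes are suggestions only: containerd is an assumed runtime, so substitute whichever runtime your image uses.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name and version of every node whose VERSION column
# (field 5 in the default output) differs from the expected version in $1.
nodes_with_version_mismatch() {
    awk -v expected="$1" '$5 != expected { print $1, $5 }'
}

# Usage (requires cluster access; v1.31.2 is only an example value):
#   kubectl get nodes --no-headers | nodes_with_version_mismatch v1.31.2
#
# Possible checks for the remaining causes (containerd is an assumption):
#   systemctl is-active kubelet containerd                    # run on the worker VM
#   openstack port list --server <Worker-node-VM-ID>          # find the VM's ports
#   openstack port show <port-ID> -c port_security_enabled    # inspect port security
```

No output from the filter means every listed node reports the expected version.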
