Worker Node "NotReady" Issue

Troubleshoot node issues in a Kubernetes cluster on Private Cloud Director v2025.4+. Resolve NotReady states or scaling problems with step-by-step procedures, including log retrieval and common causes.

Problem

A worker node is in the NotReady state, or a cluster NodeGroup is stuck in the ScalingUp state.

Environment

  • Private Cloud Director v2025.4 and higher.

  • Kubernetes cluster 1.31.2 or higher.

Procedure

  1. Retrieve the OpenStack VM console log with the command below and check it for errors or other messages. For example, the following log shows that the worker node joined the cluster successfully.

$ openstack console log show <Worker-node-VM-ID>
[..] cloud-init[1004]: [...] [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[..] cloud-init[1004]: [...] [kubelet-start] Starting the kubelet
[..] cloud-init[1004]: [...] [patches] Applied patch of type "application/strategic-merge-patch+json" to target "kubeletconfiguration"
[..] cloud-init[1004]: [...] [kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[..] cloud-init[1004]: [...] [kubelet-check] The kubelet is healthy after 505.386495ms
[..] cloud-init[1004]: [...] [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] This node has joined the cluster:
[..] cloud-init[1004]: [...] * Certificate signing request was sent to apiserver and a response was received.
[..] cloud-init[1004]: [...] * The Kubelet was informed of the new secure connection details.
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
[..] cloud-init[1004]: [...]
[..] cloud-init[1004]: [...] Cloud-init v. 24.4.1-0ubuntu0~22.04.2 finished at Fri, 23 May 2025 03:14:57 +0000. Datasource DataSourceOpenStackLocal [net,ver=2].  Up 49.81 seconds
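Scanning a long console log by eye is error-prone, so the check above can be sketched as a small helper that looks for the kubeadm join message in a saved copy of the log. This is a minimal sketch: `check_join_log` and the file name `console.log` are hypothetical names chosen for illustration, and the error filter is only a rough heuristic.

```shell
#!/usr/bin/env bash
# check_join_log is a hypothetical helper, not part of the OpenStack CLI.
# It reports whether a saved console log contains the kubeadm join message.
check_join_log() {
  local log_file="$1"
  if grep -q "This node has joined the cluster" "$log_file"; then
    echo "joined"
  else
    echo "not-joined"
    # Heuristic: show the last cloud-init lines that mention an error or failure.
    grep -iE "cloud-init.*(error|fail)" "$log_file" | tail -n 20 || true
    return 1
  fi
}

# Usage (assumes the OpenStack CLI is configured; keep your own VM ID):
#   openstack console log show <Worker-node-VM-ID> > console.log
#   check_join_log console.log
```

If the helper prints `not-joined`, the lines it lists are a starting point only; the full console log remains the authoritative record.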
  2. Run $ kubectl describe node <node-name> and check the Events section for more information. For example, a node can stay NotReady when the kubelet is unable to get disk statistics for the filesystem where container images are stored.

  3. Troubleshoot the issue based on the errors shown in the Events section, and review the most common causes listed below.

  4. If these steps do not resolve the issue, reach out to the Platform9 Support Team for additional assistance.
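When several nodes are affected, the describe step in the procedure can be repeated for each one. The sketch below assumes kubectl access to the cluster and the default column layout of `kubectl get nodes` (NAME, STATUS, ROLES, AGE, VERSION); the function names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Reads `kubectl get nodes --no-headers` output on stdin and prints
# the names of nodes whose STATUS column contains NotReady.
list_not_ready() {
  awk '$2 ~ /NotReady/ { print $1 }'
}

# Reads `kubectl describe node` output on stdin and prints everything
# from the "Events:" line to the end of the output.
events_section() {
  sed -n '/^Events:/,$p'
}

# Ties the two helpers together against a live cluster (requires kubectl access).
show_not_ready_events() {
  kubectl get nodes --no-headers | list_not_ready | while read -r node; do
    echo "== ${node} =="
    kubectl describe node "$node" | events_section
  done
}

# Usage:
#   show_not_ready_events
```

Keeping the parsing in separate functions that read from stdin means they can also be run against saved command output when the cluster is not reachable.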

Most common causes:

  • The image version and the cluster version do not match. Run $ kubectl get nodes and verify that the VERSION column corresponds to your Kubernetes cluster's version.

  • If the versions differ, try a different image with the current version, or deploy a new cluster with an alternative version.

  • Insufficient resources (CPU, memory, storage) on the underlying PCD-V host.

  • The kubelet or the container runtime service is down on the VM.

  • Port security is disabled on the worker node's VM network.
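Some of the causes above can be checked from the command line. The following is a rough sketch, not an official procedure: the function names are hypothetical, `containerd` is an assumed container runtime (substitute the runtime your image uses), the version check assumes the default column layout of `kubectl get nodes`, and the port check assumes a configured OpenStack CLI.

```shell
#!/usr/bin/env bash
# Reads `kubectl get nodes --no-headers` output on stdin and prints the name
# and version of every node whose VERSION column differs from the expected one.
version_mismatch() {
  local expected="$1"
  awk -v want="$expected" '$5 != want { print $1, $5 }'
}

# Run on the worker VM: prints the systemd state of the kubelet and the runtime.
# containerd is an assumption; replace it with the runtime in use.
runtime_services_status() {
  local svc
  for svc in kubelet containerd; do
    printf '%s: %s\n' "$svc" "$(systemctl is-active "$svc" 2>/dev/null || true)"
  done
}

# Prints the port_security_enabled value for each port attached to a server.
port_security_report() {
  local server_id="$1"
  openstack port list --server "$server_id" -f value -c ID | while read -r port; do
    printf '%s %s\n' "$port" \
      "$(openstack port show "$port" -f value -c port_security_enabled)"
  done
}

# Usage:
#   kubectl get nodes --no-headers | version_mismatch v1.31.2
#   runtime_services_status                      # on the worker VM
#   port_security_report <Worker-node-VM-ID>
```

An empty result from `version_mismatch` means every listed node reports the expected version.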
