Pods on Master Node Stuck in NodeAffinity Status After Master Node is Rebooted
Problem
After a master node is restarted, some pods on that node are stuck in NodeAffinity status.
Environment
- Platform9 Managed Kubernetes - v5.0 and higher
- Kubelet
Cause
During initialization, the nodes are temporarily available for scheduling before they carry the labels required to match the deployment's node selector. Depending on how long a node remains schedulable without those labels, the scheduler can flood the cluster with pods in NodeAffinity status. This stops once the nodes are fully initialized and the pods are scheduled successfully.
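To make the mismatch concrete, the sketch below models how a pod's node selector is evaluated against a node's labels: every key/value pair in the selector must be present on the node. The label name used here is an illustrative assumption, not necessarily the label Platform9 applies to master nodes.

```python
from typing import Mapping


def matches_node_selector(node_labels: Mapping[str, str], node_selector: Mapping[str, str]) -> bool:
    """Return True only if every key/value pair in node_selector is present in node_labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical selector, for illustration only.
selector = {"node-role.kubernetes.io/master": ""}

# Right after the reboot: the node is schedulable, but the label is not applied yet.
labels_during_init = {"kubernetes.io/hostname": "master-1"}

# After initialization completes and the label has been applied.
labels_after_init = {"kubernetes.io/hostname": "master-1", "node-role.kubernetes.io/master": ""}

print(matches_node_selector(labels_during_init, selector))  # False
print(matches_node_selector(labels_after_init, selector))   # True
```

Pods evaluated during the first window fail the match and end up in NodeAffinity status. Pods evaluated after the labels are applied match normally.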
This is a known issue tracked upstream as Kubernetes issue 92067. The fix has not yet been included in Platform9 Managed Kubernetes. We have created an internal Jira to backport the patch and will update this document once it is available in Platform9 Managed Kubernetes.
Resolution
As a workaround, manually clean up the pods stuck in NodeAffinity status.
Once the node is fully initialized with the proper labels, the pods are scheduled on it as expected.
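The manual cleanup can be made easier with a small helper. The sketch below is an illustration, not a Platform9 tool. It assumes that pods rejected this way report `status.phase` as `Failed` and `status.reason` as `NodeAffinity`, and that the input is the JSON produced by `kubectl get pods -A -o json`. The file name `find_nodeaffinity_pods.py` and the helper names are hypothetical. The script prints `kubectl delete pod` commands instead of running them, so you can review the list before deleting anything.

```python
import json
import sys
from typing import Iterator, List, Optional, Tuple


def nodeaffinity_pods(pod_list: dict, node_name: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield (namespace, name) for pods that failed with reason NodeAffinity.

    Assumption: stuck pods have status.phase == "Failed" and status.reason == "NodeAffinity".
    If node_name is given, only pods assigned to that node (spec.nodeName) are returned.
    """
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Failed" or status.get("reason") != "NodeAffinity":
            continue
        if node_name is not None and pod.get("spec", {}).get("nodeName") != node_name:
            continue
        meta = pod.get("metadata", {})
        yield meta.get("namespace", "default"), meta["name"]


def delete_commands(pod_list: dict, node_name: Optional[str] = None) -> List[str]:
    """Build one `kubectl delete pod` command per stuck pod (printed for review, not executed)."""
    return [
        f"kubectl delete pod -n {namespace} {name}"
        for namespace, name in nodeaffinity_pods(pod_list, node_name)
    ]


if __name__ == "__main__":
    # Hypothetical usage:
    #   kubectl get pods -A -o json > pods.json
    #   python find_nodeaffinity_pods.py pods.json [NODE_NAME]
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            pods = json.load(f)
        target_node = sys.argv[2] if len(sys.argv) > 2 else None
        for command in delete_commands(pods, target_node):
            print(command)
```

Passing the rebooted master node's name as the second argument limits the output to pods that were assigned to that node.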