Duplicate Node Entry in Kubernetes After Node Rejoins with Same Hostname

Problem

In a Kubernetes cluster, when a node goes offline (for example, due to a hardware failure or an OS reinstallation) and later rejoins the cluster, it can show up as a new node even though it has the same hostname. This results in two node entries for the same host in kubectl get nodes: the old one (NotReady) and the new one (Ready).

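For example, the duplicate entries might look like the following (node names and versions are illustrative only):

  $ kubectl get nodes
  NAME            STATUS     ROLES    AGE   VERSION
  worker-01       NotReady   <none>   90d   v1.20.4
  worker-01.lan   Ready      <none>   5m    v1.20.4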

Environment

  • Kubernetes: v1.20 and later
  • Component: Node Lifecycle / Kubelet / Certificate Authentication

Cause

  • When a node is reinstalled or wiped, its local kubelet state is deleted, including the kubelet's client certificates.
  • After the node restarts, the kubelet:
    • generates a new client certificate, and
    • attempts to re-register with the Kubernetes API server.
  • However, Kubernetes ties client certificates to node identity. If the old node object still exists in the cluster, the API server does not accept registration under the same name with the new certificate.
  • This usually leads to the following outcomes:
    • A new node object is registered, often under a slightly different hostname (for example, with a domain suffix appended).
    • The original node object remains in the cluster, marked NotReady.

Even a slight difference in hostname format is enough for Kubernetes to treat the node as a completely new one.
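To see how node identity is embedded in the kubelet's client certificate, you can inspect the certificate subject on the node. The path below assumes a kubeadm-provisioned node; adjust it for your distribution:

  # Print the subject of the kubelet's current client certificate.
  # On kubeadm-based clusters the CN has the form "system:node:<hostname>".
  openssl x509 -noout -subject -in /var/lib/kubelet/pki/kubelet-client-current.pem
  # subject=O = system:nodes, CN = system:node:worker-01

The API server ties this system:node:<hostname> identity to the node object, which is why a stale object under the same name interferes with re-registration.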

Resolution

  • If a node is down or offline because of a hardware failure or reinstallation, and all of its pods have been migrated to other nodes, delete the stale node entry from the Kubernetes cluster, as shown below.
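A minimal sketch of the cleanup; replace <node-name> with the name of the NotReady entry:

  # Confirm which entry is stale before deleting it.
  kubectl get nodes
  # Remove the stale node object.
  kubectl delete node <node-name>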
  • If the node is a control-plane (master) node, also remove the stale etcd member, as in the sketch below.
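A sketch assuming the etcd v3 API and kubeadm-default certificate paths on a surviving control-plane node; adjust the endpoint and paths for your environment:

  # List etcd members and note the ID of the stale member.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list

  # Remove the stale member by the ID found above.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member remove <member-id>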
  • After recovery, the node rejoins the cluster under the same hostname as before, provided the hostname has not changed.
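On kubeadm-managed clusters, the rejoin itself is typically a fresh kubeadm join; a minimal sketch, with placeholder token and hash values:

  # On a control-plane node: print a ready-to-use join command.
  kubeadm token create --print-join-command

  # On the recovered node: clear any leftover state, then join.
  kubeadm reset -f
  kubeadm join <control-plane-endpoint>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>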

Additional Information

If the stale node entry was not deleted before the recovered node was brought back up, resulting in duplicate entries for the same host, apply the workaround below.

  • Follow the steps outlined in the Resolution section. After completing them, detach the recovered node from the Kubernetes cluster and then re-attach it, as sketched below.
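A hedged sketch of the detach/re-attach sequence on a kubeadm-based cluster, with <node-name> standing in for the recovered node's duplicate entry:

  # Detach: evict workloads, then remove the duplicate node object.
  kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  kubectl delete node <node-name>

  # On the recovered node: wipe the stale kubelet/kubeadm state.
  kubeadm reset -f

  # Re-attach: run "kubeadm join" again, as shown in the Resolution section.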