Duplicate Node Entry in Kubernetes After Node Rejoins with Same Hostname
Problem
In Kubernetes clusters, a node may go offline (for example, because of hardware problems or an OS reinstallation) and later rejoin the cluster. When it does, it can show up as a new node even though it has the same hostname. The output of kubectl get nodes then lists two entries for the same host, the old one (NotReady) and the new one (Ready):
$ kubectl get nodes -o wide | grep <NODE_NAME>
[NODE1] Ready worker 89m ...
[NODE1.EXAMPLE.COM] NotReady,SchedulingDisabled worker 142d ...
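On a large cluster, duplicates like these are easier to find with a small filter than by eye. The sketch below uses a hypothetical helper, find_duplicate_nodes, which is not a kubectl feature. It assumes the default kubectl get nodes --no-headers column layout (NAME first, STATUS second). It treats two entries as the same host when the part of the name before the first dot matches, ignoring case.

```shell
# find_duplicate_nodes: hypothetical helper, not part of kubectl.
# Reads `kubectl get nodes --no-headers` output on stdin and prints every
# short hostname (text before the first dot) that appears under more than
# one node name, together with each entry's STATUS column.
find_duplicate_nodes() {
  awk '
    {
      name = $1
      short = tolower(name)
      sub(/\..*/, "", short)            # keep only the part before the first dot
      count[short]++
      names[short] = names[short] " " name " (" $2 ")"
    }
    END {
      for (s in count)
        if (count[s] > 1)
          print s ":" names[s]
    }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | find_duplicate_nodes
```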
Environment
- Kubernetes v1.20 and higher
- Component: Node Lifecycle / Kubelet / Certificate Authentication
Cause
When a node is reinstalled or cleaned, the local kubelet state is deleted, including its client certificates.
- After a restart, the kubelet:
  - Obtains a new client certificate.
  - Tries to re-register with the Kubernetes API server.
However, Kubernetes binds each kubelet client certificate to a node identity: the certificate's common name (CN) has the form system:node:<node-name>. If the API server still recognises the old node, it will not accept the new certificate under the same name.
As a result:
- If the old node entry still exists in the cluster, the kubelet's attempt to register under the same name with its new certificate is not accepted.
This usually leads to the following outcomes:
- A new node object is registered, possibly under a slightly different hostname (for example, a short name instead of a fully qualified domain name, as in the output above).
- The original node object remains marked as NotReady.
Even a slight difference in hostname format is enough for Kubernetes to treat the host as a completely new node.
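To see which node name a kubelet is actually presenting, you can read the CN of its client certificate and compare it with the names in kubectl get nodes. The sketch below assumes a kubeadm-style kubelet with certificate rotation enabled, where the current client certificate lives at /var/lib/kubelet/pki/kubelet-client-current.pem. Adjust the path if your setup differs. cert_node_name is a hypothetical helper that only parses the subject line that openssl prints.

```shell
# cert_node_name: hypothetical helper. Reads an X.509 subject line on stdin
# and prints the node name from a CN of the form system:node:<node-name>.
# Handles both the newer "CN = ..." and the older "/CN=..." subject styles.
cert_node_name() {
  sed -n 's#.*CN *= *system:node:\([^,/]*\).*#\1#p'
}

# Usage on the node (certificate path is an assumption, see above):
#   openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem \
#     -noout -subject | cert_node_name
```

If the printed name differs from the name of the NotReady entry, the kubelet has registered under a new identity, which matches the duplicate-entry behaviour described above.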
Resolution
- If a node is down or offline because of hardware issues or a reinstallation, and all of its pods have been migrated to other nodes, delete the stale node object from the cluster:
$ kubectl delete node <NODE_NAME>
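- If the node is a control-plane (master) node, also remove its stale member from the etcd cluster. Because the commands below connect to the local etcd endpoint, run them on a healthy control-plane node: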
# List the etcd members and note the ID of the stale member
$ ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# Delete the stale etcd member
$ ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member remove <MEMBER_ID>
- Once the stale entry has been removed, the recovered node can rejoin the cluster and register under its original name, provided its hostname has not changed.
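The node and etcd cleanup steps above can be scripted with small helpers like the following. This is a sketch under stated assumptions, not an official procedure. The function names are hypothetical. The NotReady check is an extra safety guard that the steps above do not require. etcd_member_id assumes the default etcdctl member list output format ("ID, status, name, peerURLs, clientURLs", plus an isLearner field in newer etcd releases). It also assumes the etcd member name matches the node name. kubeadm normally names stacked etcd members after the node, but confirm this in the member list before removing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of kubectl or etcdctl.

# delete_stale_node NODE
# Deletes NODE only if `kubectl get node` reports it as NotReady, and first
# prints any pods the API server still binds to it so they can be reviewed.
delete_stale_node() {
  local node="$1" status
  status=$(kubectl get node "$node" --no-headers | awk '{print $2}')
  case "$status" in
    NotReady*) ;;                                   # stale entry: continue
    *) echo "refusing to delete $node: status is '$status'" >&2; return 1 ;;
  esac
  echo "Pods still bound to $node:"
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" --no-headers
  kubectl delete node "$node"
}

# etcd_member_id NAME
# Reads `etcdctl member list` output on stdin (comma-and-space separated,
# member name in the third field) and prints the ID of the member named NAME.
etcd_member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example usage (placeholders as in the commands above):
#   delete_stale_node <NODE_NAME>
#   ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key \
#     member list | etcd_member_id <NODE_NAME>
```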
Additional Information
If the stale node entry was not deleted before the recovered node was brought up, and duplicate entries for the same host now appear, follow the workaround outlined below.