Duplicate Node Entry in Kubernetes After Node Rejoins with Same Hostname

Problem

In a Kubernetes cluster, when a node goes offline (for example, due to a hardware failure or an OS reinstallation) and later rejoins the cluster, it can show up as a new node even though it has the same hostname. This results in two node entries for the same host in kubectl get nodes: the old one (NotReady) and the new one (Ready).

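For example, the duplicate entries might look like the following (node names and versions are illustrative only):

  $ kubectl get nodes
  NAME            STATUS     ROLES    AGE   VERSION
  worker-01       NotReady   <none>   90d   v1.20.4
  worker-01.lan   Ready      <none>   5m    v1.20.4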

Environment

  • Kubernetes: v1.20 and later
  • Component: Node Lifecycle / Kubelet / Certificate Authentication

Cause

  • When a node is reinstalled or wiped, its local kubelet state is deleted, including the kubelet's client certificates.
  • After the node restarts, the kubelet:
    • generates a new client certificate, and
    • attempts to re-register with the Kubernetes API server.
  • However, Kubernetes ties client certificates to node identity. If the old node object still exists in the cluster, the API server does not accept registration under the same name with the new certificate.
  • This usually leads to the following outcomes:
    • A new node object is registered, often under a slightly different hostname (for example, with a domain suffix appended).
    • The original node object remains in the cluster, marked NotReady.

Even a slight difference in hostname format is enough for Kubernetes to treat the node as a completely new one.
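To see how node identity is embedded in the kubelet's client certificate, you can inspect the certificate subject on the node. The path below assumes a kubeadm-provisioned node; adjust it for your distribution:

  # Print the subject of the kubelet's current client certificate.
  # On kubeadm-based clusters the CN has the form "system:node:<hostname>".
  openssl x509 -noout -subject -in /var/lib/kubelet/pki/kubelet-client-current.pem
  # subject=O = system:nodes, CN = system:node:worker-01

The API server ties this system:node:<hostname> identity to the node object, which is why a stale object under the same name interferes with re-registration.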

Resolution

  • If a node is down or offline because of a hardware failure or reinstallation, and all of its pods have been migrated to other nodes, delete the stale node entry from the Kubernetes cluster, as shown below.
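A minimal sketch of the cleanup; replace <node-name> with the name of the NotReady entry:

  # Confirm which entry is stale before deleting it.
  kubectl get nodes
  # Remove the stale node object.
  kubectl delete node <node-name>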
  • If the node is a control-plane (master) node, also remove the stale etcd member, as in the sketch below.
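A sketch assuming the etcd v3 API and kubeadm-default certificate paths on a surviving control-plane node; adjust the endpoint and paths for your environment:

  # List etcd members and note the ID of the stale member.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list

  # Remove the stale member by the ID found above.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member remove <member-id>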
  • After recovery, the node rejoins the cluster under the same hostname as before, provided the hostname has not changed.
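On kubeadm-managed clusters, the rejoin itself is typically a fresh kubeadm join; a minimal sketch, with placeholder token and hash values:

  # On a control-plane node: print a ready-to-use join command.
  kubeadm token create --print-join-command

  # On the recovered node: clear any leftover state, then join.
  kubeadm reset -f
  kubeadm join <control-plane-endpoint>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>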

Additional Information

If the stale node entry was not deleted before the recovered node was brought back up, resulting in duplicate entries for the same host, apply the workaround below.

  • Follow the steps outlined in the Resolution section. After completing them, detach the recovered node from the Kubernetes cluster and then re-attach it, as sketched below.
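A hedged sketch of the detach/re-attach sequence on a kubeadm-based cluster, with <node-name> standing in for the recovered node's duplicate entry:

  # Detach: evict workloads, then remove the duplicate node object.
  kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  kubectl delete node <node-name>

  # On the recovered node: wipe the stale kubelet/kubeadm state.
  kubeadm reset -f

  # Re-attach: run "kubeadm join" again, as shown in the Resolution section.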