Nodes Report Offline After Attaching to New BareOS Cluster

Problem

  • The new cluster remains in the "pending" state after nodes are attached to it.

  • Nodes in this cluster remain offline as pf9-kube fails to start. The following errors are seen in /var/log/pf9/kube/kube.log.

Waiting for "local_apiserver_running" to evaluate to true ...
Waiting for "local_apiserver_running" to evaluate to true ...
Waiting for "local_apiserver_running" to evaluate to true ...
Waiting for "local_apiserver_running" to evaluate to true ...

  • The following error can be seen in /var/log/pf9/kubelet log.

kubelet[4686]: E0117 976804 4686 kubelet_node_status.go:94] Unable to register node "[node IP address]" with API server: Post https://127.0.0.1:443/api/v1/nodes: dial t
kubelet[4686]: E0117 011489 4686 kubelet.go:2268] node "[node IP address]" not found
kubelet[4686]: E0117 111855 4686 kubelet.go:2268] node "[node IP address]" not found
........
kubelet[4686]: E0117 976804 4686 kubelet_node_status.go:94] Unable to register node "[node IP address]" with API server: Post https://127.0.0.1:443/api/v1/nodes: dial t
kubelet[4686]: E0117 011489 4686 kubelet.go:2268] node "[node IP address]" not found
kubelet[4686]: E0117 111855 4686 kubelet.go:2268] node "[node IP address]" not found
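
To confirm that a node shows the symptoms above, the messages can be counted in the logs with a small shell helper. This is only a minimal sketch: the function name count_log_matches is hypothetical and not part of any pf9 tooling, and the log path in the usage comment is the one quoted above.

```shell
#!/bin/sh
# count_log_matches PATTERN LOGFILE
# Prints how many times the fixed string PATTERN occurs in LOGFILE.
# grep -o emits one line per match, so several messages fused onto a
# single line are each counted. (Hypothetical helper, not part of pf9.)
count_log_matches() {
    grep -o -F -e "$1" "$2" | wc -l | tr -d ' '
}

# Example usage on an affected node:
#   count_log_matches 'Waiting for "local_apiserver_running"' /var/log/pf9/kube/kube.log
```

A count that keeps growing while pf9-kube is trying to start suggests the node is stuck waiting for the local API server, which matches the problem described here.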

Environment

  • Platform9 Managed Kubernetes - All Versions

Cause

The nodes used to create the cluster were previously part of another cluster and were not properly cleaned up or deauthorized after being detached from it.

Resolution

  1. Remove/purge all the pf9 packages from the node and verify, as shown below.

# dpkg --remove --force-remove-reinstreq [pf9-packagename]
# dpkg --purge --force-all [pf9-packagename]

  2. Ensure that the following directories have been removed after removal of the packages.

  3. Reboot the node.

  4. Install the pf9 agent on the nodes again.

  5. Authorize the nodes to the Management Plane once the pf9 agent is successfully installed.

  6. Proceed with cluster creation.
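
The package removal step above can be sketched as a dry run that prints the two dpkg commands from the article for every installed pf9 package. This is a minimal sketch under stated assumptions: it assumes the package names begin with "pf9", and the helper names filter_pf9 and print_purge_commands are hypothetical. It only prints commands; nothing is removed until the output is reviewed and run as root.

```shell
#!/bin/sh
# Keep only package names beginning with "pf9" (assumed prefix) from a
# list of names on stdin, one per line. Prints nothing if none match.
filter_pf9() {
    grep '^pf9' || true
}

# For each package name on stdin, print the removal and purge commands
# shown in the resolution steps. This prints only; it does not run dpkg.
print_purge_commands() {
    while read -r pkg; do
        [ -n "$pkg" ] || continue
        printf 'dpkg --remove --force-remove-reinstreq %s\n' "$pkg"
        printf 'dpkg --purge --force-all %s\n' "$pkg"
    done
}

# Example usage on the node (dpkg-query lists installed package names):
#   dpkg-query -W -f='${Package}\n' | filter_pf9 | print_purge_commands
```

After the packages have been purged, running the dpkg-query and filter_pf9 part of the pipeline again should print nothing, which is one way to do the verification mentioned in the first step.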
