New Nodes are Unable to Join a PMK Cluster

Problem

  • Node scale-out fails. The node is added to the management plane and is visible in the UI under the cluster, but it does not fully join the cluster.

  • The following errors appear in the Nodeletd logs on the affected node:

{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"--- /opt/pf9/pf9-kube/none_scripts/10-all-none-scripts.sh status at 2022-09-23 06:23:25 ---"}
{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"[2022-09-23 06:23:25] http proxy: http/s_proxy env vars not defined, no pf9-comms proxy configuration; skipping configuration"}
{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"[2022-09-23 06:23:25] unknown"}
{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"[2022-09-23 06:23:25] [DOCKER-DAEMON-FAIL] docker daemon is not running"}
{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"[2022-09-23 06:23:25] [DOCKER-DAEMON-FAIL] docker daemon is not running"}
{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"nodelet/nodelet.go:418","M":"Submitting status update to Sunpike: localhost:8111"}
{"L":"INFO","T":"2022-09-23T06:23:25.156Z","C":"nodelet/nodelet.go:232","M":"pf9-kube is already stopped..."}
{"L":"INFO","T":"2022-09-23T06:23:25.156Z","C":"nodelet/nodelet.go:438","M":"Submitting status update to Sunpike and checking it for new config: localhost:8111"}
{"L":"INFO","T":"2022-09-23T06:23:25.177Z","C":"nodelet/nodelet.go:452","M":"Handling received HostSpec from Sunpike."}
{"L":"INFO","T":"2022-09-23T06:23:25.178Z","C":"nodelet/nodelet.go:475","M":"Received config from Sunpike has not changed compared to the current config."}
{"L":"INFO","T":"2022-09-23T06:23:25.179Z","C":"nodelet/nodelet.go:173","M":"Reconcile completed."}
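Each Nodeletd log entry above is a single JSON object. The message text is in the "M" field and the timestamp is in the "T" field. The Python sketch below filters a set of collected log lines for the [DOCKER-DAEMON-FAIL] marker. It assumes only the JSON shape shown in the excerpt. The function name find_docker_failures is hypothetical, and no log file path is assumed, so pass in lines from whichever file you collected from the node.

```python
import json

# Marker string taken from the Nodeletd log excerpt; everything else is illustrative.
FAILURE_MARKER = "[DOCKER-DAEMON-FAIL]"


def find_docker_failures(log_lines):
    """Hypothetical helper: return (timestamp, message) pairs for Nodeletd
    entries whose "M" field reports a Docker daemon failure.

    Blank lines, non-JSON lines and JSON values that are not objects are skipped.
    """
    failures = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        message = entry.get("M", "")
        if FAILURE_MARKER in message:
            failures.append((entry.get("T"), message))
    return failures


if __name__ == "__main__":
    # Two lines copied from the excerpt: one failure, one unrelated entry.
    sample = [
        '{"L":"INFO","T":"2022-09-23T06:23:25.113Z","C":"phases/phases.go:279","M":"[2022-09-23 06:23:25] [DOCKER-DAEMON-FAIL] docker daemon is not running"}',
        '{"L":"INFO","T":"2022-09-23T06:23:25.179Z","C":"nodelet/nodelet.go:173","M":"Reconcile completed."}',
    ]
    for timestamp, message in find_docker_failures(sample):
        print(timestamp, message)
```

To scan a saved log, pass an open file: `with open(path) as f: find_docker_failures(f)`, where `path` is the log file you collected.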

Environment

  • Platform9 Managed Kubernetes - v4.3 and higher

Cause

  • This condition can occur when a previous cluster upgrade did not fully complete and got stuck partway through.

  • The following can be seen in the Qbert database:

Resolution

  • Log in to the Management Plane UI and go to the Clusters section. Scroll to the right to check whether a Continue Upgrade option is available for the affected cluster.

  • Click Continue Upgrade so that the post-upgrade steps complete and Qbert is updated with the desired version.

  • Once this operation completes successfully, the scaled-out node should be added to the cluster automatically and reach the Ready state.

  • If it does not, restart the PMK stack on the affected node using the following steps:

Additional Information

Make sure that all nodes in the affected cluster are in the Connected state. Continue Upgrade cannot complete successfully unless they are.
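Before you click Continue Upgrade, you can record the state of every node and confirm that all of them are Connected. The sketch below shows this pre-check as a pure function. The input format, a list of dicts with "name" and "status" keys, is a hypothetical shape you would fill in yourself (for example, from the node list shown in the UI). It does not represent a Platform9 API response, and the node names in the usage example are made up.

```python
def nodes_not_connected(nodes):
    """Return the names of nodes whose status is not "Connected".

    `nodes` is a hypothetical list of dicts such as
    {"name": "worker-1", "status": "Connected"}.
    """
    return [node["name"] for node in nodes if node.get("status") != "Connected"]


def ready_for_continue_upgrade(nodes):
    """True only when no node in the cluster is outside the Connected state."""
    return not nodes_not_connected(nodes)


if __name__ == "__main__":
    # Hypothetical cluster with one disconnected worker.
    cluster_nodes = [
        {"name": "master-1", "status": "Connected"},
        {"name": "worker-1", "status": "Connected"},
        {"name": "worker-2", "status": "Disconnected"},
    ]
    if ready_for_continue_upgrade(cluster_nodes):
        print("All nodes Connected; Continue Upgrade can be attempted.")
    else:
        print("Fix these nodes first:", nodes_not_connected(cluster_nodes))
```

If any node is listed, bring it back to the Connected state before you retry Continue Upgrade.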
