Node Unable to Join the Cluster After Reboot While pf9-vault Pod Is Down in LTS2 Setup

Problem

In an LTS2 workload cluster, worker or master nodes are unable to rejoin the cluster after a reboot. On the affected node, nodeletd is stuck in its first phase, Certificate Generation. The management cluster also has failed pods, including the Sunpike and pf9-vault pods.

The nodeletd phase is stuck in certificate generation because the node cannot communicate with the pf9-vault service in the management plane, which it needs for certificate generation.

Nodeletd Phases

Environment

  • Platform9 Edge Cloud - v5.6 and higher

Answer

  • Communication between pf9-vault and the Percona database was observed to be failing; the cause has not yet been identified.
  • This failure prevents the Sunpike pods from coming up, which in turn blocks the Nodelet certificate generation phase when a rebooted node tries to rejoin the cluster.
  • This behaviour stemmed from a limitation in the product's architecture and is fixed in SMCP-5.10+ releases. The change was tracked under JIRA AIR-1097; a second JIRA, AIR-1091, was also resolved by the same fix.

Diagnosis

Check whether the Sunpike pods in the management cluster are in the CrashLoopBackOff state.
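The check above can be sketched as a small filter over `kubectl` output. This is a minimal sketch, not the article's original command: the helper name `unhealthy_pods` is hypothetical, and it assumes the input comes from `kubectl get pods -A --no-headers`, where the fourth column is the pod STATUS.

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` output on stdin
# and prints every pod whose STATUS is neither Running nor Completed.
# Column layout with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

On the management cluster it could be used as `kubectl get pods -A --no-headers | unhealthy_pods | grep -Ei 'sunpike|vault'`. The `grep` pattern assumes the pod names contain "sunpike" or "vault"; adjust it to the names used in your deployment.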

Failed pods

Pf9-vault pod log:

pf9-vault pod logs

Workaround

Restarting the pf9-vault pod should re-establish the connection between pf9-vault and the Percona DB. The Sunpike pods should then become active, and the nodeletd certificate generation phase should succeed.
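One common way to restart a pod is to delete it and let its controller recreate it. The sketch below assumes the pf9-vault pod is managed by a controller (for example a Deployment or StatefulSet) that will recreate it; the helper name `restart_pod` and the `DRY_RUN` switch are hypothetical, and the namespace and pod name must be taken from your own management cluster.

```shell
# Hypothetical helper: delete a pod so that its controller recreates it.
# Arguments: <namespace> <pod-name>
# Set DRY_RUN=1 to print the kubectl command instead of running it.
restart_pod() {
  namespace="$1"
  pod="$2"
  if [ -z "$namespace" ] || [ -z "$pod" ]; then
    echo "usage: restart_pod <namespace> <pod-name>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl delete pod $pod -n $namespace"
  else
    kubectl delete pod "$pod" -n "$namespace"
  fi
}
```

Running it first with `DRY_RUN=1` shows the exact command before anything is deleted. After the real run, `kubectl get pods -n <namespace> -w` can be used to watch the pf9-vault and Sunpike pods return to the Running state.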

Additional Information

After a node reboot, when the nodeletd phases restart, the node is expected to reuse the Kubernetes certificates already present on it. On this node the certificates were missing for an unknown reason, so the node had to reach the pf9-vault service in the management plane to generate new certificates.

Ideally, a rebooted node does not need any communication with the management plane to rejoin the cluster.
