Hostagent Certificate Rotation Failing due to Comms Connection Failures

Problem

  • Empty host agent certificates are generated, disconnecting the node from the management plane.
  • pf9-comms tunnels are broken because the host agent certificates are missing.
  • Nodelet TLS certificate generation fails because the host agent certificates are missing.

Environment

  • Platform9 Managed Kubernetes - v5.6

Cause

  • The probable root cause is that vouch returned an empty data set when certificates were requested for hostagent, leaving /etc/pf9/certs/hostagent/cert.pem empty. This also broke pf9-comms.service, because comms uses the hostagent certificates to create the tunnels to the management plane services. Nodelet in turn uses that tunnel to reach the vouch service, which signs the PMK-related certificates. With the hostagent certificates missing, the comms tunnel broke, and nodelet certificate generation failed as a result.
  • There could also be a comms or nginx issue in reaching vouch.
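
To confirm that a host is in the state described above, the hostagent certificate can be checked for content. The following is a minimal sketch, not a Platform9 tool: the function name cert_looks_empty is hypothetical, and it assumes the certificate is PEM-encoded and that the script runs with enough privileges to read the file.

```python
from pathlib import Path

# Certificate path taken from this article.
HOSTAGENT_CERT = Path("/etc/pf9/certs/hostagent/cert.pem")

# Assumption: a valid certificate file contains a standard PEM header.
PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def cert_looks_empty(path: Path) -> bool:
    """Return True if the file is missing, zero-length, or has no PEM certificate block."""
    try:
        data = path.read_bytes()
    except FileNotFoundError:
        return True
    return PEM_MARKER not in data


if __name__ == "__main__":
    state = "empty or missing" if cert_looks_empty(HOSTAGENT_CERT) else "present"
    print(f"{HOSTAGENT_CERT}: {state}")
```

A result of "empty or missing" matches the symptom in this article; it does not by itself show whether vouch, comms, or nginx was at fault.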

Resolution

  • This issue is tracked in CORE-1303 and CORE-1304 and will be fixed in the PMK 5.10 release. Contact Platform9 support for the latest updates on these issues.
  • As a workaround, restore the older certificates from the backup certificate and key pair, cert.pem.0 and key.pem.0, in the /etc/pf9/certs/hostagent/ directory. The same must be done for the certificates in /etc/pf9/certs/ca.
  • After copying the certificates, restart the pf9-comms and pf9-hostagent services on the host.
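
The workaround steps above can be sketched as a small script. This is an illustration under stated assumptions, not an official procedure: the helper names restore_backups and restore_and_restart are hypothetical, the backups in /etc/pf9/certs/ca are assumed to use the same ".0" suffix as those in the hostagent directory, and the host is assumed to manage the services with systemd (the article refers to pf9-comms.service).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Directories and services named in this article.
CERT_DIRS = [Path("/etc/pf9/certs/hostagent"), Path("/etc/pf9/certs/ca")]
SERVICES = ["pf9-comms", "pf9-hostagent"]


def restore_backups(cert_dir: Path) -> List[Path]:
    """Copy every non-empty '<name>.pem.0' backup in cert_dir over '<name>.pem'.

    Returns the list of files that were restored.
    """
    restored = []
    for backup in sorted(cert_dir.glob("*.pem.0")):
        if backup.stat().st_size == 0:
            continue  # an empty backup would not fix anything
        target = backup.with_suffix("")  # 'cert.pem.0' -> 'cert.pem'
        shutil.copy2(backup, target)
        restored.append(target)
    return restored


def restore_and_restart() -> None:
    """Hypothetical helper: restore both directories, then restart the services."""
    for cert_dir in CERT_DIRS:
        for target in restore_backups(cert_dir):
            print(f"restored {target}")
    # Assumption: systemd host, so systemctl is available.
    subprocess.run(["systemctl", "restart", *SERVICES], check=True)
```

Calling restore_and_restart() as root performs both steps. Calling restore_backups on a single directory first makes it possible to review what would be overwritten before restarting anything.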