Restore Certificates Manually for On-Prem Deployments [Internal Only]

Problem

On an on-prem Platform9 Managed Kubernetes (PMK) deployment, the management plane (DU) may be unavailable when a cluster node needs to be rebooted or have the PMK stack restarted on it. If the management plane is unavailable, the node fails to come back up at the certificate generation (gen_certs) step.

Environment

  • Platform9 Edge Cloud - v5.3 LTS Patch 10 and lower

Procedure

Pre-conditions:

  1. The management plane VM is shut down or offline.
  2. The cluster node was initializing while the management plane was unavailable and therefore failed to start the PMK stack (nodeletd phases) at the gen_certs step.

Steps:

  1. On the cluster node, stop the pf9-hostagent and pf9-nodeletd services.
  2. Stop the PMK stack on the cluster node.
  3. Restore the certificates from backup by using the script provided below (in Additional Information).
  4. Start nodeletd phases on the cluster node.
  5. Start the pf9-hostagent service on the cluster node.
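
The steps above can be sketched as a small shell script. This is a sketch under assumptions, not an authoritative procedure: the service names come from the steps themselves, while the nodeletd path (/opt/pf9/nodelet/nodeletd), the `phases stop` and `phases start` subcommands, and the `RESTORE_SCRIPT` location are assumptions to verify on your node. It prints each command by default instead of running it.

```shell
#!/bin/sh
# Dry run by default: RUN=echo prints each command.
# Set RUN="" (and run as root) to actually execute the steps.
RUN="${RUN-echo}"
# Assumed install path of the nodeletd binary; verify it on your node.
NODELETD="${NODELETD:-/opt/pf9/nodelet/nodeletd}"
# Hypothetical location where the restore script was saved.
RESTORE_SCRIPT="${RESTORE_SCRIPT:-./restore_certs.sh}"

recover_node() {
    # Step 1: stop the host agent and nodeletd services.
    $RUN systemctl stop pf9-hostagent pf9-nodeletd || return 1
    # Step 2: stop the PMK stack (assumed subcommand).
    $RUN "$NODELETD" phases stop || return 1
    # Step 3: restore the certificates from backup.
    $RUN "$RESTORE_SCRIPT" || return 1
    # Step 4: start the nodeletd phases (assumed subcommand).
    $RUN "$NODELETD" phases start || return 1
    # Step 5: start the host agent again.
    $RUN systemctl start pf9-hostagent || return 1
}

recover_node
```

Each step returns early on failure, so a failed restore never leads to the phases being started without certificates in place.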

Additional Information

Refer to Zendesk ticket 1352090.

Restore Certificate Script:

Note: Existing certificates are backed up on the worker node automatically when the PMK stack is stopped.
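
Because the backup is taken automatically on stop, a restore amounts to copying the backed-up files back into the live certificate directory. The following is a minimal sketch of such a helper and is not the script from the referenced ticket. The function name `restore_certs` and both directory arguments are hypothetical; the actual backup and certificate locations on a PMK node must be confirmed before use.

```shell
#!/bin/sh
# restore_certs BACKUP_DIR CERT_DIR
# Copies everything from BACKUP_DIR back into CERT_DIR, preserving
# permissions and timestamps. Both directories are placeholders:
# this sketch does not know where PMK keeps them.
restore_certs() {
    backup_dir="$1"
    cert_dir="$2"
    if [ -z "$backup_dir" ] || [ -z "$cert_dir" ]; then
        echo "usage: restore_certs BACKUP_DIR CERT_DIR" >&2
        return 2
    fi
    # Refuse to continue if the backup is missing or empty.
    if [ ! -d "$backup_dir" ] || [ -z "$(ls -A "$backup_dir")" ]; then
        echo "no certificate backup found in $backup_dir" >&2
        return 1
    fi
    mkdir -p "$cert_dir" || return 1
    # "/." copies the directory contents (including dotfiles),
    # not the directory itself.
    cp -a "$backup_dir/." "$cert_dir/"
}
```

Calling `restore_certs /path/to/backup /path/to/certs` with the real paths substituted would run between stopping the PMK stack and starting the nodeletd phases.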

Starting with 5.3 LTS Patch 11, the gen_certs phase can be skipped when the management plane is offline and the PMK stack is restarted on a node using nodeletd phases. This resolves the issue seen with an explicit nodeletd phases restart: previously, the stop action called the gen_certs phase, which removed the certificates, and the start action then failed to fetch them because the management plane was unavailable.

Note: After a node reboot, the pf9-nodeletd service also skips the stop and start functions of the gen_certs phase to avoid reaching out to the management plane.
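
The Patch 11 behaviour can be pictured as a reachability guard around the gen_certs phase. The sketch below is an illustrative model only, not nodeletd's actual implementation: `du_reachable`, `gen_certs_phase`, `DU_PROBE`, and `DU_FQDN` are hypothetical names, and the default probe is a placeholder HTTPS request.

```shell
#!/bin/sh
# Illustrative model only; the names below are hypothetical, not nodeletd internals.

# DU_PROBE: a command whose exit status tells whether the management plane answers.
# The default is a placeholder request; set DU_FQDN to your management plane address.
DU_PROBE="${DU_PROBE:-curl -sf --max-time 5 https://${DU_FQDN:-du.example.invalid}/}"

du_reachable() {
    # Unquoted on purpose so the probe string splits into command and arguments.
    $DU_PROBE >/dev/null 2>&1
}

# gen_certs_phase ACTION   (ACTION is "stop" or "start")
gen_certs_phase() {
    action="$1"
    if ! du_reachable; then
        # Management plane offline: leave the existing certificates untouched.
        echo "skip gen_certs $action: management plane offline"
        return 0
    fi
    # Management plane online: the phase would remove or fetch certificates here.
    echo "run gen_certs $action"
}
```

With the management plane offline, both `gen_certs_phase stop` and `gen_certs_phase start` become no-ops in this model, so a restart keeps the certificates already on the node.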
