How Are SSL Certificates Rotated in the Platform9 Managed Kubernetes Stack?
Problem
Kubernetes components authenticate to one another using a set of certificates. These certificates have a fixed validity period and must be renewed before they expire.
This article explains how certificates are managed in a Platform9 Managed Kubernetes (PMK) cluster.
Environment
- Platform9 Managed Kubernetes - All Versions
Answer
- The pf9-nodelet service on each node is responsible for cluster bootstrapping. It does so by executing a series of phases that bring up the Kubernetes stack.
- In the very first phase, the service contacts the Management Plane, where a Vault service provides a set of signed certificates. These are stored on the node and used by the installed Kubernetes stack.
- Every time the Kubernetes stack restarts, a new set of certificates is requested and mounted. If the stack is not otherwise restarted, this happens during a Kubernetes version upgrade, since an upgrade involves a stack restart.
- It is recommended to perform cluster upgrades at regular intervals. Along with renewing the certificates for the stack, this ensures that the cluster gets the latest features and security fixes.
- If you prefer to rotate certificates at fixed intervals without performing a cluster upgrade, you can perform a rolling restart of the stack across the nodes.
- Perform the following steps on each node:
  - $ systemctl stop pf9-hostagent
  - $ systemctl stop pf9-nodelet
  - $ /opt/pf9/nodelet/nodeletd phases stop
    This tears down the Kubernetes stack on the node. Wait for this command to finish.
  - $ systemctl start pf9-hostagent
  - $ /opt/pf9/nodelet/nodeletd phases status
    This tells you whether the stack is back up and running.
 
Please follow the steps in this exact order. Do not substitute any other nodeletd phases commands.
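The per-node sequence above can be wrapped in a small script so that the order is not changed by accident. The following is a minimal sketch, not a Platform9-provided tool: the run helper, the restart_stack_on_node function name, and the DRY_RUN variable are assumptions introduced for illustration. Only the five commands themselves come from the steps above. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the per-node restart sequence. The helper names and the
# DRY_RUN switch are illustrative assumptions, not part of PMK.

# Print the command in dry-run mode (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the documented steps in order, stopping at the first failure.
restart_stack_on_node() {
  run systemctl stop pf9-hostagent || return 1
  run systemctl stop pf9-nodelet || return 1
  # Tears down the Kubernetes stack; the script waits for it to finish.
  run /opt/pf9/nodelet/nodeletd phases stop || return 1
  run systemctl start pf9-hostagent || return 1
  # Reports whether the stack is back up and running.
  run /opt/pf9/nodelet/nodeletd phases status
}
```

Calling restart_stack_on_node with DRY_RUN unset prints the five commands for review. To execute them, set DRY_RUN=0 and call the function as root on the node. The status command may need to be rerun until the stack has finished coming up.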
Restarting the Kubernetes stack on a node drains the node and causes its pods to be rescheduled. You might notice brief service disruptions, depending on how your application is architected.
On a multi-master cluster, perform these steps on one master node at a time. Otherwise, etcd may lose quorum and the cluster will become unreachable.
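One way to enforce the one-master-at-a-time rule is to drive the restart from a loop that does not move to the next master until the current one is healthy again. The sketch below shows only that control flow. restart_node and node_is_healthy are hypothetical placeholders that you would have to implement yourself (for example, by running the per-node steps remotely and then checking node and etcd status). They are not PMK commands, and POLL_SECONDS is likewise an assumed setting.

```shell
#!/usr/bin/env bash
# Sketch: restart master nodes strictly one at a time.
# restart_node and node_is_healthy are hypothetical hooks supplied by the
# operator; they are not provided by PMK.

rolling_restart() {
  local node
  for node in "$@"; do
    echo "Restarting stack on ${node}"
    if ! restart_node "$node"; then
      echo "Restart failed on ${node}; leaving remaining nodes untouched" >&2
      return 1
    fi
    # Wait for this node to recover before touching the next one, so that
    # no more than one etcd member is down at a time.
    until node_is_healthy "$node"; do
      sleep "${POLL_SECONDS:-30}"
    done
  done
}
```

With both hooks defined, a call such as rolling_restart master-1 master-2 master-3 (hypothetical host names) processes the masters in the order given. The loop waits indefinitely for a node to become healthy, so a real implementation would likely add a timeout.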
Take and maintain regular backups of your cluster data, especially etcd data.
