Cluster Nodes in NotReady State as the Kube Certificates Expired

Problem

  • The kube certificates in the directory /etc/pf9/kube.d/certs/ are not automatically renewed.
  • pf9-nodelet.service logs the below errors:
Bash
Copy

Environment

  • Platform9 Edge Cloud - v-5.3.0-2075501 and Higher

Cause

  • This issue only occurs in the below conditions:

    • The parameters default_lease_ttl and max_lease_ttl in file etc/pf9-vault.d/server-config.hcl on the duVM are modified from existing TTL of 26280h to lower values.
    • nodelet phases are restarted on a node/s.
  • This causes the certificates on the node to be renewed with the new default_lease_ttl expiry date and the certificate is not auto-renewed causing it to become invalid post it's expiry.

  • This is known bug already reported internally with ID: AIR-1459

Workaround

  • The workaround to this problem is to restart the nodelet phases with --regen-certs parameter.
  • Perform the commands below in the same sequence.
Bash
Copy
Bash
Copy
Bash
Copy
  • Wait for 5-7 mins for the node to reconcile and observe if the node is Healthy.

Additional Information

  • Reach out to Platform9 Support Team for any additional questions/concerns regarding the bug.
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard