Vault Token Expired Prematurely Before Validity Period Ends

Resolve the issue of nodes getting stuck at the "Generate certs" phase after reboot in Platform9 Kubernetes. Learn how to renew the Vault Token, troubleshoot, and update configurations to get your clu

Problem

  • Nodes in a cluster were getting stuck after a reboot at the Generate certs nodeletd phase.

    $ /opt/pf9/nodelet/nodeletd phases status
    ...
      INDEX NUMBER  FILE                                         NAME    PHASE STATUS
      1             Generate certs / Send signing request to CA  failed
      2             Prepare configuration
      3             Configure Container Runtime
      4             Start Container Runtime
      5             Network configuration
      6             Configure CNI plugin
      7             Miscellaneous scripts and checks
      8             Configure and start kubelet
      9             Configure and start kube-proxy
      10            Wait for k8s services and network to be up
      11            Apply and validate node taints
      12            Apply kubelet configuration
      13            Uncordon node
      14            Drain all pods (stop only operation)
      15            Configure and start monitoring
    Platform9 Kubernetes stack is not running
  • The Certificate Signing Request for the needed certificates is failing due to a permission denied error.

    $ cat /tmp/authbs-certs.***/admin/request.json
    {"errors":["permission denied"]}

Environment

  • Platform9 Managed Kubernetes - v5.7 and Higher

  • Platform9 Self Managed Cloud Platform - v5.9 and Higher

  • Vault

Cause

  • The Vault Token of the Cluster got expired.

  • This is a known issue, and a BUG has been reported with ID PMK-6602 to track and resolve it.

circle-info

Info

Vault Token: This token is issued to each workload cluster by the pf9-vault service that operates on the management plane. It is utilized by the pf9-nodeletd service running on nodes to request certificates from the Management Plane.

Validation

circle-exclamation

Steps to validate the token expiry:

  1. Exec into pf9-vault pod in Management Plane namespace.

  2. Export the required details.

  3. Run the below command to know token expiry details:

Resolution

  • Fix will be available in future versions of PCD.

Workaround

To fix this issue, renew the Vault Token for the problematic cluster and update all hosts with the new Token.

circle-exclamation

Step 1: Exec into pf9-vault pod in Management Plane namespace.

Step 2: Export the required details.

Step 3: Generate New Token.

Step 4: Update the new token in qbert Database and exit from pf9-vault pod.

Step 5: Verify if the new token is updated at the cluster and node levels.

Step 6: If the token in Sunpike does not match the token in Qbert, execute the following command to patch the Sunpike host object.

Step 7: Perform full stack restart on nodes that got stuck at the Cert Generation phase (if any).

Step 8: Revoke the old Token only if all nodes are working fine. (Optional)

Last updated