Vault Token Expired Prematurely Before Validity Period Ends
Problem
- Nodes in a cluster were getting stuck after a reboot at the Generate certs nodeletd phase.
$ /opt/pf9/nodelet/nodeletd phases status
...
INDEX NUMBER  FILE NAME  PHASE                                          STATUS
1                        Generate certs / Send signing request to CA   failed
2                        Prepare configuration
3                        Configure Container Runtime
4                        Start Container Runtime
5                        Network configuration
6                        Configure CNI plugin
7                        Miscellaneous scripts and checks
8                        Configure and start kubelet
9                        Configure and start kube-proxy
10                       Wait for k8s services and network to be up
11                       Apply and validate node taints
12                       Apply kubelet configuration
13                       Uncordon node
14                       Drain all pods (stop only operation)
15                       Configure and start monitoring
Platform9 Kubernetes stack is not running
- The Certificate Signing Request for the needed certificates is failing due to a permission denied error.
$ cat /tmp/authbs-certs.***/admin/request.json
{"errors":["permission denied"]}
Environment
- Platform9 Managed Kubernetes - v5.7 and Higher
- Platform9 Self Managed Cloud Platform - v5.9 and Higher
- Vault
Cause
- The Vault token of the cluster has expired.
- This is a known issue. It is tracked under bug ID PMK-6602.
Vault Token: This token is issued to each workload cluster by the pf9-vault service that operates on the management plane. It is utilized by the pf9-nodeletd service running on nodes to request certificates from the Management Plane.
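Because pf9-nodeletd presents this token when it requests certificates, an expired token surfaces as the "permission denied" response shown in the Problem section. The following is a minimal sketch of a check for that symptom. The function name csr_permission_denied is hypothetical and not part of any Platform9 tooling, and the path to request.json must be supplied by the caller because the temporary directory name varies per node.

```shell
# Hypothetical helper (not part of Platform9 tooling).
# Succeeds (exit 0) when the given CA response file contains the
# "permission denied" error shown in the Problem section.
# $1: path to a request.json file under the node's authbs-certs temp directory.
csr_permission_denied() {
    grep -q '"permission denied"' "$1"
}

# Example usage on an affected node (path is supplied by the operator):
#   if csr_permission_denied "$RESPONSE_FILE"; then
#       echo "CA rejected the signing request: check the cluster Vault token"
#   fi
```

A match only indicates that the CA rejected the request. The token lookup described under Validation is still needed to confirm that expiry is the reason.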
Validation
Steps to validate the token expiry:
- Exec into the pf9-vault pod in the management plane namespace.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
- Export the required details.
# export VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# export VAULT_ADDR=http://127.0.0.1:8200
# CLUSTER_UUID=<CLUSTER_UUID>
# OLD_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# CLUSTER_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
- Run the following command to view the token expiry details:
# /usr/local/bin/vault token lookup $CLUSTER_VAULT_TOKEN
Example:
# /usr/local/bin/vault token lookup $CLUSTER_VAULT_TOKEN
Key                 Value
---                 -----
accessor            [ACCESSOR-ID]
creation_time       [CREATION TIMESTAMP]
creation_ttl        26280h
display_name        token
entity_id           n/a
expire_time         [EXPIRY TIMESTAMP]
explicit_max_ttl    0s
id                  [ID]
issue_time          [ISSUE TIMESTAMP]
meta                <nil>
num_uses            0
orphan              false
path                auth/token/create
policies            [POLICIES]
renewable           true
ttl                 26215h49m50s
type                service
Workaround
To fix this issue, renew the Vault Token for the problematic cluster and update all hosts with the new Token.
Document each step as you execute it so that there is a clear and complete record of the process.
- For PMK (SaaS), the Platform9 Support Team will apply the steps below. Please open a support ticket.
- For SMCP (air-gapped), perform the steps below from the management plane cluster.
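Step 3 below extracts the new token with jq -r, which prints the literal string null when the response has no .auth.client_token field (for example, when Vault returns an error). A minimal sketch of a guard that can be defined inside the pf9-vault pod and checked before the database update in Step 4 is shown here. The function name token_is_usable is hypothetical, and the check makes no assumption about the token's format beyond it being non-empty.

```shell
# Hypothetical guard (not part of Platform9 tooling).
# Returns 0 only when the argument is non-empty and is not the literal
# string "null" that `jq -r` prints for a missing field.
# $1: the token value to check, e.g. "$NEW_TOKEN" from Step 3.
token_is_usable() {
    [ -n "$1" ] && [ "$1" != "null" ]
}

# Example usage between Step 3 and Step 4:
#   if ! token_is_usable "$NEW_TOKEN"; then
#       echo "Token creation failed; do not update the qbert database" >&2
#   fi
```

The guard only catches an empty or missing value. It does not confirm that Vault accepts the token, which can be verified with the token lookup command from the Validation section.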
Step 1: Exec into the pf9-vault pod in the management plane namespace.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
Step 2: Export the required details.
# CLUSTER_UUID=<CLUSTER_UUID>
# OLD_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# echo $OLD_VAULT_TOKEN
# echo $ROOT_VAULT_TOKEN
Step 3: Generate New Token.
# NEW_TOKEN_RESP=$(curl -X POST -H "X-Vault-Token: $ROOT_VAULT_TOKEN" --data '{"policies": ["'$CLUSTER_UUID'"], "ttl": "26280h"}' http://localhost:8200/v1/auth/token/create)
# NEW_TOKEN=$(echo $NEW_TOKEN_RESP | jq -r '.auth.client_token')
# echo "New Vault-Token generated - $NEW_TOKEN"
Step 4: Update the new token in qbert Database and exit from pf9-vault pod.
# mysql qbert -e "UPDATE clusters SET vaultToken='$NEW_TOKEN' WHERE uuid='$CLUSTER_UUID'"
# exit
Step 5: Verify if the new token is updated at the cluster and node levels.
$ kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/mysqld-exporter -- mysql qbert -e "select name,uuid,vaultToken from clusters where uuid='<CLUSTER_UUID>'"
$ kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl get hosts <HOST_ID> -o yaml | grep -i vault
Step 6: If the token in Sunpike does not match the token in Qbert, run the following commands to patch the Sunpike host objects.
$ export NEW_TOKEN=<NEW_TOKEN_GENERATED_IN_STEP_3>
$ export CLUSTER_UUID=<CLUSTER_UUID>
$ for i in $(kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl get hosts --no-headers | grep $CLUSTER_UUID | awk '{print $1}'); do kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl patch host $i -p '{"spec":{"pf9":{"vaultToken":"'${NEW_TOKEN}'"}}}'; done
Step 7: Perform full stack restart on nodes that got stuck at the Cert Generation phase (if any).
$ systemctl stop pf9-{hostagent,nodeletd}
$ /opt/pf9/nodelet/nodeletd phases stop
$ systemctl start pf9-hostagent
Step 8: (Optional) Revoke the old token only if all nodes are working fine.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# OLD_VAULT_TOKEN=<OLD_VAULT_TOKEN_FROM_STEP_2>
# curl -X POST -H "X-Vault-Token: $ROOT_VAULT_TOKEN" --data '{"token": "'$OLD_VAULT_TOKEN'"}' http://localhost:8200/v1/auth/token/revoke
Additional Information
- An internal bug, PMK-6602, has been filed to track this issue. For more details, contact the Platform9 Support Team and mention the bug ID.
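The creation_ttl and ttl fields in the token lookup output from the Validation section are reported in hours, which makes it hard to see at a glance how long a token has left. The sketch below converts such a value to whole days. For example, the 26280h creation TTL in the sample corresponds to 1095 days, or about three years. The function name ttl_to_days is hypothetical, and the helper assumes the duration begins with an hours component, as in the sample output. Values without one (such as 0s) are rejected.

```shell
# Hypothetical helper (not part of Vault or Platform9 tooling).
# Converts a duration such as "26215h49m50s" to whole days
# (hours divided by 24, rounded down). Minutes and seconds are ignored.
# $1: duration string as printed in the creation_ttl or ttl field.
ttl_to_days() {
    local hours
    hours=${1%%h*}    # keep only the text before the first "h"
    case "$hours" in
        ''|*[!0-9]*)
            echo "unsupported duration: $1" >&2
            return 1
            ;;
    esac
    echo $(( hours / 24 ))
}

# Example usage:
#   ttl_to_days 26280h          # prints 1095
#   ttl_to_days 26215h49m50s    # prints 1092
```

Comparing the remaining days with the expected validity period is a quick way to spot a token whose ttl is far lower than its creation_ttl suggests it should be.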