Vault Token Expired Prematurely Before Validity Period Ends
Problem
- Nodes in a cluster get stuck at the "Generate certs" nodeletd phase after a reboot.
$ /opt/pf9/nodelet/nodeletd phases status
...
INDEX NUMBER FILE NAME PHASE STATUS
1 Generate certs / Send signing request to CA failed
2 Prepare configuration
3 Configure Container Runtime
4 Start Container Runtime
5 Network configuration
6 Configure CNI plugin
7 Miscellaneous scripts and checks
8 Configure and start kubelet
9 Configure and start kube-proxy
10 Wait for k8s services and network to be up
11 Apply and validate node taints
12 Apply kubelet configuration
13 Uncordon node
14 Drain all pods (stop only operation)
15 Configure and start monitoring
Platform9 Kubernetes stack is not running
- The Certificate Signing Request for the needed certificates is failing due to a permission denied error.
$ cat /tmp/authbs-certs.***/admin/request.json
{"errors":["permission denied"]}
Environment
- Platform9 Managed Kubernetes - v5.7 and Higher
- Platform9 Self Managed Cloud Platform - v5.9 and Higher
- Vault
Cause
- The cluster's Vault Token has expired.
- This is a known issue, tracked under bug ID PMK-6602.
Vault Token: This token is issued to each workload cluster by the pf9-vault
service that operates on the management plane. It is utilized by the pf9-nodeletd
service running on nodes to request certificates from the Management Plane.
Validation
Steps to validate the token expiry:
- Exec into pf9-vault pod in Management Plane namespace.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
- Export the required details.
# export VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# export VAULT_ADDR=http://127.0.0.1:8200
# CLUSTER_UUID=<CLUSTER_UUID>
# OLD_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# CLUSTER_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
- Run the following command to view the token expiry details:
# /usr/local/bin/vault token lookup $CLUSTER_VAULT_TOKEN
Sample output:
# /usr/local/bin/vault token lookup $CLUSTER_VAULT_TOKEN
Key Value
--- -----
accessor [ACCESSOR-ID]
creation_time [CREATION TIMESTAMP]
creation_ttl 26280h
display_name token
entity_id n/a
expire_time [EXPIRY TIMESTAMP]
explicit_max_ttl 0s
id [ID]
issue_time [ISSUE TIMESTAMP]
meta <nil>
num_uses 0
orphan false
path auth/token/create
policies [POLICIES]
renewable true
ttl 26215h49m50s
type service
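The fields that matter for expiry are creation_ttl, expire_time, ttl and renewable. As a rough sketch, the two helpers below (hypothetical names, not part of any Platform9 or Vault tooling) filter those fields out of the table output and convert an hour-based duration to whole days. They assume the table layout shown in the sample above.

```shell
# Hypothetical helper: reads the table printed by `vault token lookup`
# on stdin and prints only the expiry-related fields as key=value.
summarize_lookup() {
  awk '$1 == "creation_ttl" || $1 == "expire_time" || $1 == "ttl" || $1 == "renewable" { print $1 "=" $2 }'
}

# Hypothetical helper: converts a duration such as 26280h or 26215h49m50s
# to whole days, using only the hours component (minutes/seconds ignored).
ttl_days() {
  case "$1" in
    *h*) echo $(( ${1%%h*} / 24 )) ;;
    *)   echo 0 ;;
  esac
}

# Example usage inside the pf9-vault pod:
#   /usr/local/bin/vault token lookup "$CLUSTER_VAULT_TOKEN" | summarize_lookup
#   ttl_days 26280h
```

With the sample values, a creation_ttl of 26280h corresponds to 1095 days (about three years), so a token whose expire_time falls well short of that relative to creation_time, or whose lookup fails outright, is a candidate for this issue.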
Workaround
To fix this issue, renew the Vault Token for the problematic cluster and update all hosts with the new Token.
Document each step as you execute it to keep a clear and complete record of the process.
- For PMK (SaaS), the Platform9 support team will apply the steps below. Please open a Support Ticket.
- For SMCP (air-gapped), perform the steps below from the management plane cluster.
Step 1: Exec into pf9-vault pod in Management Plane namespace.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
Step 2: Export the required details.
# CLUSTER_UUID=<CLUSTER_UUID>
# OLD_VAULT_TOKEN=$(mysql qbert -Bse "SELECT vaultToken FROM clusters WHERE uuid='$CLUSTER_UUID'")
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# echo $OLD_VAULT_TOKEN
# echo $ROOT_VAULT_TOKEN
Step 3: Generate New Token.
# NEW_TOKEN_RESP=$(curl -X POST -H "X-Vault-Token: $ROOT_VAULT_TOKEN" --data '{"policies": ["'$CLUSTER_UUID'"], "ttl": "26280h"}' http://localhost:8200/v1/auth/token/create)
# NEW_TOKEN=$(echo $NEW_TOKEN_RESP | jq -r '.auth.client_token')
# echo "New Vault-Token generated - $NEW_TOKEN"
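If the token creation request fails (for example, because ROOT_VAULT_TOKEN is empty), `jq -r` prints the literal string `null`, and Step 4 would then write that value into the database. A minimal guard, sketched here with a hypothetical function name, can be run before continuing:

```shell
# Hypothetical guard: succeeds only if the argument looks like a real token,
# i.e. it is neither empty nor the literal "null" printed by `jq -r`.
require_token() {
  case "$1" in
    ""|null)
      echo "Token creation failed; inspect \$NEW_TOKEN_RESP before continuing" >&2
      return 1
      ;;
  esac
  return 0
}

# Example usage inside the pf9-vault pod, before Step 4:
#   require_token "$NEW_TOKEN" && echo "Token looks usable"
```

This only checks that a value was extracted. It does not confirm that the token carries the expected policy or TTL, which `vault token lookup` can show.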
Step 4: Update the new token in the qbert database and exit from the pf9-vault pod.
# mysql qbert -e "UPDATE clusters SET vaultToken='$NEW_TOKEN' WHERE uuid='$CLUSTER_UUID'"
# exit
Step 5: Verify that the new token has been updated at the cluster and node levels.
$ kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/mysqld-exporter -- mysql qbert -e "select name,uuid,vaultToken from clusters where uuid='<Cluster_UUID>'"
$ kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl get hosts <Host ID> -o yaml | grep -i vault
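Comparing the two values by eye is error-prone with long tokens. The sketch below, using a hypothetical function name, takes the qbert token as an argument and the host YAML (or its `grep -i vault` output) on stdin, and succeeds only if the `vaultToken` field matches. It assumes the field appears as a `vaultToken: <value>` line, and it strips carriage returns and double quotes in case a TTY was allocated by `exec -it` or the value is quoted.

```shell
# Hypothetical helper: $1 is the token stored in qbert; stdin is the Sunpike
# host object YAML. Returns 0 only when the vaultToken field equals $1.
tokens_match() {
  sunpike_token=$(tr -d '\r"' | awk '$1 == "vaultToken:" { print $2; exit }')
  [ -n "$sunpike_token" ] && [ "$sunpike_token" = "$1" ]
}

# Example usage from the management plane cluster:
#   kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> \
#     deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- \
#     kubectl get hosts <Host ID> -o yaml | tokens_match "$NEW_TOKEN" \
#     && echo "match" || echo "mismatch"
```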
Step 6: If the token in Sunpike does not match the token in Qbert, execute the following command to patch the Sunpike host object.
$ export NEW_TOKEN=<NEW_TOKEN_GENERATED_IN_STEP_3>
$ export CLUSTER_UUID=<CLUSTER_UUID>
$ for i in $(kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl get hosts --no-headers | grep $CLUSTER_UUID | awk '{print $1}'); do kubectl -n <MANAGEMENT_PLANE_NAMESPACE> exec -it --kubeconfig <KUBECONFIG> deploy/sunpike-kube-apiserver -c sunpike-kube-apiserver -- kubectl patch host $i -p '{"spec":{"pf9":{"vaultToken":"'${NEW_TOKEN}'"}}}'; done
Step 7: Perform a full stack restart on nodes that got stuck at the Cert Generation phase (if any).
$ systemctl stop pf9-{hostagent,nodeletd}
$ /opt/pf9/nodelet/nodeletd phases stop
$ systemctl start pf9-hostagent
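After the restart, the phase status shown in the Problem section can be checked again to confirm that no phase is still failing. As a small sketch, the hypothetical filter below prints only the lines whose status column ends in "failed". It assumes the status output keeps the layout shown in the Problem section.

```shell
# Hypothetical filter: reads `nodeletd phases status` output on stdin and
# prints only the phases whose line ends with "failed" (case-insensitive).
# `|| true` keeps the exit status zero when no phase has failed.
failed_phases() {
  grep -iE 'failed[[:space:]]*$' || true
}

# Example usage on the node:
#   /opt/pf9/nodelet/nodeletd phases status | failed_phases
```

An empty result suggests that no phase reported a failure. It does not by itself confirm that all phases have completed.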
Step 8 (Optional): Revoke the old Token only after confirming that all nodes are working fine.
$ kubectl exec -it -n <MANAGEMENT_PLANE_NAMESPACE> --kubeconfig <KUBECONFIG> $(kubectl get pods -n <MANAGEMENT_PLANE_NAMESPACE> -l du-app=pf9-vault -o jsonpath="{.items[0].metadata.name}") -- /bin/bash
# ROOT_VAULT_TOKEN=$(mysql qbert -Bse "SELECT credential_value FROM qbert_secrets where credential_name='root_token'")
# OLD_VAULT_TOKEN=<OLD_VAULT_TOKEN_FROM_STEP_2>
# curl -X POST -H "X-Vault-Token: $ROOT_VAULT_TOKEN" --data '{"token": "'$OLD_VAULT_TOKEN'"}' http://localhost:8200/v1/auth/token/revoke
Additional Information
- An internal bug, PMK-6602, has been filed to track this issue. For more details, contact the Platform9 Support Team and mention the bug ID.