Vouch-Noauth And Vouch-Keystone Pods Are Not Ready Due To Token Expiry
Problem
The Vouch-Noauth and Vouch-Keystone pods are not in a ready state in both the Infra and Workload regions. This prevents the environments from becoming fully operational and has stalled the upgrade.
Environment
- Self-Hosted Private Cloud Director Virtualization - v2025.2 to v2025.6
Cause
- The Vouch token stored in Consul has expired and was not renewed automatically by the vouch-renew-token cronjob.
- The issue has been reported as a bug, tracked by the Platform Engineering team under ID PCD-1468; the fix was released in the July release.
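To confirm the renewal cronjob exists and see when it last ran, the standard kubectl views can be used (a quick check; the cronjob name is taken from the cause above, and <AFFECTED_NS> is a placeholder for the region namespace):
$ kubectl get cronjob vouch-renew-token -n <AFFECTED_NS>
$ kubectl get jobs -n <AFFECTED_NS> | grep vouch-renew-token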
Diagnostics
- The vouch-keystone and vouch-noauth pods become not ready.
$ kubectl get pods --all-namespaces | grep vouch
[INFRA_NS]      vouch-keystone-POD   1/2   Running   0   3h
[INFRA_NS]      vouch-noauth-POD     2/3   Running   0   3h
[WORKLOAD_NS]   vouch-keystone-POD   1/2   Running   0   3h
[WORKLOAD_NS]   vouch-noauth-POD     2/3   Running   0   3h
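For more detail on which container is failing its readiness probe, kubectl describe can be run against an affected pod (standard kubectl; the pod and namespace names are placeholders):
$ kubectl describe pod <vouch-keystone-POD> -n <AFFECTED_NS>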
- Perform the cURL test
Steps:
- Exec into the vouch-keystone pod and get the Vault token from vouch-keystone.conf
$ kubectl exec -it -n <AFFECTED_NS> <vouch-keystone-POD> -- bash
[vouch-keystone-POD]$ grep vault_token /etc/vouch/vouch-keystone.conf | awk '{ print $2 }'
- Run the cURL command after replacing <TOKEN> with the token from the above output
$ curl --header "X-Vault-Token: <TOKEN>" "http://decco-vault-active.default.svc.cluster.local:8200/v1/auth/token/lookup-self" -v
* Host decco-vault-active.default.svc.cluster.local:8200 was resolved.
...
{"errors":["permission denied"]}
* Connection #0 to host decco-vault-active.default.svc.cluster.local left intact
If the token has expired, the output will indicate "permission denied" as shown above.
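For comparison, a valid token returns an HTTP 200 with a JSON body describing the token. The exact fields vary by Vault version, but a trimmed response looks roughly like this (illustrative values only):
{"data":{"ttl":2764800,"renewable":true,"expire_time":"<TIMESTAMP>", ...}}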
Resolution
- Upgrade to the Self-Hosted Private Cloud Director July release or a later version.
Workaround
- Manually renew the expired token so that the vouch pods can communicate with Consul.
Steps:
- Get the CONSUL_HTTP_TOKEN from the Airctl host (the host where the airctl state file is present)
$ grep consulToken ${HOME}/.airctl/state.yaml | cut -d' ' -f2
- Exec into the decco-consul-consul-server pod in the default namespace
$ kubectl exec -it -n default decco-consul-consul-server-0 -- sh
- Export the CONSUL_HTTP_TOKEN from step 1 in the decco-consul-consul-server pod
[decco-consul-consul-server-0]$ export CONSUL_HTTP_TOKEN="<TOKEN>"
The following commands generate one output per region present in the environment.
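As an optional sanity check that the exported token is accepted before proceeding, the standard Consul CLI can be used; consul members should list the cluster nodes rather than return a permission error:
[decco-consul-consul-server-0]$ consul members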
- Retrieve the region UUIDs.
[decco-consul-consul-server-0]$ consul kv get -recurse | grep region_uuid
region_fqdns/example-infra.platform9.localnet/region_uuid:<REGION_UUID>
region_fqdns/example-workload.platform9.localnet/region_uuid:<REGION_UUID>
The <REGION_UUID> distinguishes between regions; use it to identify which keys belong to the affected region(s).
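If the region FQDN is already known, a single region's UUID can also be read directly, following the key pattern shown above (<REGION_FQDN> is a placeholder):
[decco-consul-consul-server-0]$ consul kv get region_fqdns/<REGION_FQDN>/region_uuid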
- Retrieve the existing tokens
[decco-consul-consul-server-0]$ consul kv get -recurse | grep host_signing_token
customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/host_signing_token:hvs.<TOKEN>
- Delete the existing token for the affected region(s).
[decco-consul-consul-server-0]$ consul kv delete customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/host_signing_token
Success! Deleted key: customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/host_signing_token
Exit from the decco-consul-consul-server pod.
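After exiting, the deletion can be confirmed in one shot from outside the pod (a sketch using standard kubectl exec; replace <TOKEN> with the Consul token from step 1). The affected region's key should no longer appear:
$ kubectl exec -n default decco-consul-consul-server-0 -- sh -c 'CONSUL_HTTP_TOKEN="<TOKEN>" consul kv get -recurse | grep host_signing_token'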
- Manually run the vouch-renew-token job.
Repeat this step for all affected regions by changing <AFFECTED_NS>.
$ kubectl create job --from=cronjob/vouch-renew-token vouch-renew-token-manual -n <AFFECTED_NS>
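To watch the manually created job complete and inspect its output, standard kubectl commands can be used (the job name matches the one created above):
$ kubectl wait --for=condition=complete job/vouch-renew-token-manual -n <AFFECTED_NS> --timeout=120s
$ kubectl logs job/vouch-renew-token-manual -n <AFFECTED_NS>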
- Check that the vouch-keystone and vouch-noauth pods are healthy again
$ kubectl get pods --all-namespaces | grep vouch
- If these steps do not resolve the issue, reach out to the Platform9 Support Team for additional assistance.
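Note: once the pods are Ready, the renewed token can also be re-validated with the same lookup-self call from the Diagnostics section; it should now return the token details instead of "permission denied":
$ curl --header "X-Vault-Token: <TOKEN>" "http://decco-vault-active.default.svc.cluster.local:8200/v1/auth/token/lookup-self"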