Vouch-Noauth And Vouch-Keystone Pods Are Not Ready Due To Token Expiry

Problem

The vouch-noauth and vouch-keystone pods are not ready in both the Infra and Workload regions. As a result, the environments are not fully operational and the upgrade is stalled.

Environment

  • Self-Hosted Private Cloud Director Virtualization - v2025.2 to v2025.6

Cause

  • The Vouch token stored in Consul has expired, and it was not renewed automatically by the vouch-renew-token cronjob.

  • The Platform Engineering team tracks this issue as bug PCD-1468. The fix is included in the July release (v2025.7-47) and later releases.

Diagnostics

  1. Check whether the vouch-keystone and vouch-noauth pods are not ready.

Control Plane Node
$ kubectl get pods --all-namespaces | grep vouch                                                                                  
[INFRA_NS]        vouch-keystone-POD     1/2      Running             0  3h
[INFRA_NS]        vouch-noauth-POD       2/3      Running             0  3h

[WORKLOAD_NS]     vouch-keystone-POD     1/2      Running             0  3h
[WORKLOAD_NS]     vouch-noauth-POD       2/3      Running             0  3h
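
To pick the affected pods out of a longer listing, the READY column can be compared programmatically. The following is a minimal sketch and not part of the original procedure: the function name not_ready_vouch_pods is hypothetical, and it assumes the default column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, and so on).

```shell
# Reads `kubectl get pods --all-namespaces` output on stdin and prints
# "namespace/pod ready/total" for every vouch pod whose READY column shows
# fewer ready containers than total containers.
# Assumption: column 1 is NAMESPACE, column 2 is NAME, column 3 is READY.
not_ready_vouch_pods() {
  awk '$2 ~ /^vouch-/ {
    split($3, r, "/")
    if (r[1] != r[2]) print $1 "/" $2, $3
  }'
}

# Usage on a control plane node:
#   kubectl get pods --all-namespaces --no-headers | not_ready_vouch_pods
```

An empty result suggests that every vouch pod reports all of its containers as ready.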

Method 1: cURL Test

  1. Exec into the vouch-keystone container of the vouch-keystone pod and get the Vault token from vouch-keystone.conf.

  2. Run the cURL command, replacing the placeholder with the actual token from the output of step 1.

  3. If the token has expired, the output indicates "Permission denied."
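
As an illustration of this kind of check, the sketch below queries Vault's standard token lookup-self HTTP endpoint and classifies the response. It is an assumption that this matches the cURL command intended by the steps above; VAULT_ADDR, <VAULT_TOKEN>, and the function names are placeholders that do not come from this article.

```shell
# vault_lookup_self TOKEN -> prints the raw response from Vault.
# Assumption: VAULT_ADDR holds a Vault endpoint reachable from the container.
vault_lookup_self() {
  curl -s -H "X-Vault-Token: $1" \
    "${VAULT_ADDR:?set VAULT_ADDR first}/v1/auth/token/lookup-self"
}

# token_state reads a Vault response on stdin and prints:
#   expired      when the body carries a permission denied error
#   no-response  when the body is empty (for example, Vault was unreachable)
#   not-denied   for any other body
token_state() {
  body=$(cat)
  case "$body" in
    *[Pp]ermission\ denied*) echo expired ;;
    "") echo no-response ;;
    *) echo not-denied ;;
  esac
}

# Usage:
#   vault_lookup_self '<VAULT_TOKEN>' | token_state
```

Separating the request from the classification keeps the check usable on a response that was captured earlier.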

Method 2: Verify Vault token using Consul

Use this method when the vouch-keystone container is in CrashLoopBackOff. It reads the token from Consul and does not require exec into the failing vouch pod.

  1. Get the Consul ACL token from the airctl state file on the control plane node.

  2. Open a shell inside the Consul server pod.

  3. Inside the Consul shell, export the ACL token (replace <CONSUL_TOKEN>) and read the host_signing_token. The output is a string that starts with "hvs." Copy it, then type exit to leave the Consul pod.

  4. Open a shell inside the Vault pod (replace <VAULT_POD> with the name from kubectl get pods -n default | grep vault).

  5. Inside the Vault shell, set the address and token (replace <HOST_SIGNING_TOKEN>) and run the lookup.

If the token has expired, the lookup fails with a permission denied error.
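
As a rough sketch of the final lookup, assuming the vault CLI is available inside the Vault pod and VAULT_ADDR is already exported there, the copied value can be sanity-checked before it is used. The function names are hypothetical and are not part of the original procedure.

```shell
# is_hvs_token VALUE -> succeeds when VALUE starts with "hvs."
is_hvs_token() {
  case "$1" in
    hvs.*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_host_signing_token TOKEN
# Rejects values that do not look like the string read from Consul, then asks
# Vault to look up the token. An expired token typically makes the lookup
# fail with a 403 permission denied error.
check_host_signing_token() {
  if ! is_hvs_token "$1"; then
    echo "value does not start with hvs.: copy it from Consul again" >&2
    return 2
  fi
  VAULT_TOKEN="$1" vault token lookup
}

# Usage inside the Vault pod:
#   check_host_signing_token '<HOST_SIGNING_TOKEN>'
```

The prefix check only catches copy-and-paste mistakes; whether the token is still valid is decided by the lookup itself.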

Resolution

  • Upgrade to Self-Hosted Private Cloud Director v2025.7-47 or later.

Workaround

  • Manually renew the expired token so that the vouch pods can communicate with Consul.

Steps:

  1. Get the CONSUL_HTTP_TOKEN from the airctl host (the host where the airctl state file is present).

  2. Exec into the decco-consul-consul-server pod in the default namespace.

  3. Export the CONSUL_HTTP_TOKEN from step 1 inside the decco-consul-consul-server pod.

The following commands return one result per region present in the environment.

  4. Retrieve the region UUIDs.

  • The <REGION_UUID> uniquely identifies each region, which lets you tell multiple regions apart.

  5. Retrieve the existing tokens.

  6. Delete the existing token for each affected region.

Exit from the decco-consul-consul-server pod.

  7. Manually run the vouch-renew-token job.

Repeat this step for all affected regions by changing the <AFFECTED_NS>.

  8. Check that the vouch-keystone and vouch-noauth pods are healthy again.
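
The manual run of the renewal job can be sketched with kubectl's ability to create a one-off Job from a CronJob. This is an illustration under assumptions: it presumes the CronJob object is named vouch-renew-token in each affected namespace, and the Job name vouch-renew-token-manual and the function name are hypothetical.

```shell
# renew_job_cmd NAMESPACE -> prints the kubectl command that creates a one-off
# Job from the vouch-renew-token CronJob in that namespace.
# The command is printed rather than executed so it can be reviewed first.
renew_job_cmd() {
  printf 'kubectl create job vouch-renew-token-manual --from=cronjob/vouch-renew-token -n %s\n' "$1"
}

# Usage, once per affected region namespace:
#   renew_job_cmd '<AFFECTED_NS>'        # review the printed command
#   renew_job_cmd '<AFFECTED_NS>' | sh   # then run it
#   kubectl get pods -n '<AFFECTED_NS>' | grep vouch
```

Because the Job name is fixed in this sketch, a Job left over from an earlier manual run in the same namespace has to be deleted before the command can be repeated.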

  • If these steps prove insufficient to resolve the issue, reach out to the Platform9 Support Team for additional assistance.
