# Vouch-Renew-Token Job Fails with Vault Permission Denied – Renewing Expired Vault admin\_token

## Problem

The Vouch authentication services in a PCD environment are stuck in a failing state because the admin\_token used by the automatic renewal process has expired. Without a valid admin\_token, the renewer cannot refresh Vouch's other credentials, and the services cannot recover on their own.

This article describes how to manually replace the expired admin\_token so that automatic renewal can resume.

## Environment

* Self-Hosted Private Cloud Director Virtualization - v2025.7-47 and later
* Private Cloud Director Virtualization - v2025.7-47 and later
* Component: Vault

## Cause

Vouch authenticates to Vault using a chain of two tokens:

| Token                       | Purpose                                                                                                                                | Stored in Consul at                                                                              |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| `host_signing_token` (leaf) | Used by the running vouch service to sign certificates and validate authentication requests at runtime.                                | `customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/host_signing_token`          |
| `admin_token` (parent)      | Used by the `vouch-renew-token` CronJob to authenticate to Vault and mint a fresh `host_signing_token` when the leaf one nears expiry. | `customers/<CUSTOMER_ID>/vault_servers/dev/admin_token` (and per-region copies — see Workaround) |

When the parent `admin_token` expires, the CronJob's first authenticated call to Vault - the request to create the host signing policy - is rejected with a *permission-denied* error. The renewer can no longer mint a new `host_signing_token`. The leaf token then ages out, and the vouch pods enter `CrashLoopBackOff` because every authenticated call upstream is rejected with 403.

The auto-renewal protection delivered by KB [Vouch-Noauth And Vouch-Keystone Pods Are Not Ready Due To Token Expiry](/kb/pcd/self-hosted/vouch-noauth-and-vouch-keystone-pods-are-not-ready-due-to-token-expiry.md) only covers the leaf `host_signing_token`. Rotation/generation of the parent `admin_token` is not currently covered, which is what causes this chained-expiry condition.

## Diagnostics

### Pre-requisites

{% hint style="info" %}
For SaaS customers, reach out to Platform9 Support.

For Self-Hosted PCD customers, follow the steps below:
{% endhint %}

* Shell access to a PCD control plane node, with `kubectl` and `airctl` already configured.
* Get **CUSTOMER\_ID** of the deployment by running the command below from the control plane node:

  <pre class="language-bash" data-title="Control Plane Node"><code class="lang-bash">$ kubectl get pod -A -l du-app=vouch-keystone \
    -o jsonpath='{.items[0].spec.containers[?(@.name=="vouch-keystone")].env[?(@.name=="CUSTOMER_ID")].value}{"\n"}'
  </code></pre>
* Get **REGION\_UUID** for each affected region using:

  <pre class="language-bash" data-title="Control Plane Node"><code class="lang-bash">$ kubectl get pod -n &#x3C;REGION_NS> -l du-app=vouch-keystone \
    -o jsonpath='{.items[0].spec.containers[?(@.name=="vouch-keystone")].env[?(@.name=="REGION_ID")].value}{"\n"}'
  </code></pre>
* To list the REGION\_UUID of every affected region in one go:

  <pre class="language-bash" data-title="Control Plane Node" data-overflow="wrap"><code class="lang-bash">$ kubectl get pods -A -l du-app=vouch-keystone \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"  "}{.spec.containers[?(@.name=="vouch-keystone")].env[?(@.name=="REGION_ID")].value}{"\n"}{end}' 
  </code></pre>

The first column is the **REGION\_NS** (region namespace), the second is the **REGION\_UUID**. Keep these values handy to substitute into the commands below wherever `<CUSTOMER_ID>`, `<REGION_UUID>`, or `<REGION_NS>` is referenced.
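If several regions are affected, the namespace/UUID pairs from the one-shot command above can be captured into shell arrays for reuse in later steps. This is a bash sketch, assuming the two-column `namespace  region-uuid` output format shown above; the sample values are placeholders:

```bash
# Parse "<REGION_NS>  <REGION_UUID>" lines into parallel bash arrays.
parse_regions() {
  REGION_NS=(); REGION_UUID=()
  local ns uuid
  while read -r ns uuid; do
    if [ -n "$ns" ] && [ -n "$uuid" ]; then
      REGION_NS+=("$ns"); REGION_UUID+=("$uuid")
    fi
  done
}

# In practice, feed in the kubectl one-liner's output via process substitution:
#   parse_regions < <(kubectl get pods -A -l du-app=vouch-keystone -o jsonpath='...')
# Sample input with placeholder values:
parse_regions <<'EOF'
region-one  11111111-2222-3333-4444-555555555555
region-two  66666666-7777-8888-9999-000000000000
EOF
echo "${#REGION_NS[@]} regions captured"
```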

Run the steps below in order. Each one is a separate check; together they confirm the issue is the chained admin\_token expiry.

{% stepper %}
{% step %}

### Check the overall region health

{% code title="Control Plane Node" %}

```bash
$ sudo airctl status
```

{% endcode %}

If this issue applies, you will see `region health: ⚠️ Not Ready` and the ready-services count below the desired count (for example `28/30`, `83/85`):

{% code title="Sample Output" %}

```bash
------------- deployment details ---------------
fqdn:                [REGION_FQDN]
region:              [REGION_FQDN]
deployment status:   ready
region health:       ⚠️ Not Ready
version:              PCD <version>
-------- region service status ----------
desired services:     30
ready services:       28
```

{% endcode %}
{% endstep %}

{% step %}

### Check the state of the vouch pods

List the vouch pods across all namespaces:

{% code title="Control Plane Node" %}

```bash
$ kubectl get pods --all-namespaces | grep vouch
```

{% endcode %}

The `vouch-keystone` and `vouch-noauth` pods will be stuck with a partial Ready ratio (`1/2` and `2/3`), in `CrashLoopBackOff`, with a high `RESTARTS` count:

{% code title="Sample Output" %}

```bash
NAMESPACE     NAME                                READY   STATUS             RESTARTS         AGE
[REGION_NS]   vouch-keystone-5594456b98-n5v2q     1/2     CrashLoopBackOff   6960 (19s ago)   45d
[REGION_NS]   vouch-noauth-5d557b8d76-vhsh4       2/3     CrashLoopBackOff   6908 (9s ago)    45d
```

{% endcode %}
{% endstep %}

{% step %}

### Check the status of the vouch-renew-token job

List the renewal jobs in the affected region's namespace:

{% code title="Control Plane Node" %}

```bash
$ kubectl get jobs -n <REGION_NS> | grep vouch-renew-token
```

{% endcode %}

If the auto-renewer ran recently (or was manually triggered as part of the standard `host_signing_token` recovery procedure), you will see one or more rows with status `Failed 0/1`:

{% code title="Sample Output" %}

```bash
NAME                                  STATUS    COMPLETIONS   DURATION   AGE
vouch-renew-token-29623018            Failed    0/1           42m        65m
vouch-renew-token-manual              Failed    0/1           38m        57m
```

{% endcode %}
{% endstep %}

{% step %}

### Review the renewal job logs for the Vault 403 error

Find the failed renewal job's pod:

{% code title="Control Plane Node" %}

```bash
$ kubectl get pods -n <REGION_NS> | grep vouch-renew-token
```

{% endcode %}

View the pod's logs (replace `<RENEWER_POD_NAME>` with the pod name from the previous command):

{% code title="Control Plane Node" %}

```bash
$ kubectl logs -n <REGION_NS> <RENEWER_POD_NAME>
```

{% endcode %}

The log will end with Vault rejecting a policy-creation request:

{% code title="Sample Output" %}

```py
"PUT /v1/sys/policy/hosts-<region-id> HTTP/1.1" 403 33
Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.9/site-packages/vaultlib/ca.py", line 261, in create_vouch_token_policy
    return self.rq('PUT', 'sys/policy/%s' % policy_name, body)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url:
   http://decco-vault-active.default.svc.cluster.local:8200/v1/sys/policy/hosts-<region-id>
```

{% endcode %}

If all four checks above match, the vouch services cannot recover on their own because the credential the renewer needs to perform the recovery has itself expired. Continue with the additional diagnostic steps below to confirm the admin\_token has expired before applying the workaround.
{% endstep %}

{% step %}

### Read the admin\_token from Consul

The renewer reads its admin\_token from the Consul key-value store. Inspecting that value requires the Consul ACL token, which is stored in the `airctl` state file on the control plane node:

{% code title="Control Plane Node" overflow="wrap" %}

```bash
$ CONSUL_TOKEN=$(grep consulToken ${HOME}/.airctl/state.yaml | cut -d' ' -f2)
```

{% endcode %}

Verify the token was captured by printing its length with `echo ${#CONSUL_TOKEN}` (typical Consul tokens are 36 characters). If the length is 0, the file path is wrong or the key isn't present - check `${HOME}/.airctl/state.yaml` manually before continuing.
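The extraction can be sanity-checked offline. A minimal sketch against a sample `state.yaml` line (the token value below is a placeholder):

```bash
# state.yaml holds a line of the form `consulToken: <value>`; the grep/cut
# pipeline above takes the second space-separated field of that line.
sample_line="consulToken: 12345678-1234-1234-1234-123456789012"
CONSUL_TOKEN=$(printf '%s\n' "$sample_line" | cut -d' ' -f2)
echo "${#CONSUL_TOKEN}"   # a UUID-style token is 36 characters
```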

Now read the `admin_token` from inside the Consul server pod using the `consul kv get` command:

{% code title="Control Plane Node" overflow="wrap" %}

```bash
$ kubectl exec -it -n default decco-consul-consul-server-0 -- sh
$ export CONSUL_HTTP_TOKEN="<CONSUL_TOKEN>"
$ consul kv get customers/<CUSTOMER_ID>/vault_servers/dev/admin_token
```

{% endcode %}

The output is a single `hvs.[..]` string - that is the admin\_token. Type `exit` to leave the Consul pod when done.
{% endstep %}

{% step %}

### Validate the admin\_token against Vault

Login to the vault pod that is in Running state:

{% code title="Control Plane Node" overflow="wrap" %}

```bash
##Get vault pod
$ kubectl get pods -n default | grep vault
##Login to vault pod
$ kubectl exec -it -n default <VAULT_POD> -- sh
```

{% endcode %}

Set the Vault address and token, then look up the token:

{% code title="Vault Pod:" overflow="wrap" %}

```bash
$ export VAULT_ADDR=http://localhost:8200
$ export VAULT_TOKEN="<ADMIN_TOKEN>"
$ vault token lookup
```

{% endcode %}

If `Error looking up token: ... Code: 403. Errors: * permission denied`\
is printed then the admin\_token is **expired or revoked**. Proceed to [#workaround](#workaround "mention")

A table with a positive `ttl` and a future `expire_time` means the admin\_token is **valid**; the renewer's 403 is a different problem (most likely the policy attached to the token has been narrowed). Open a Platform9 support ticket; do **not** run the workaround.
{% endstep %}
{% endstepper %}

The goal of the diagnostics above is to confirm the admin\_token has actually expired before doing anything destructive.

## Workaround

{% stepper %}
{% step %}
Perform the [#pre-requisites](#pre-requisites "mention")
{% endstep %}

{% step %}

### Mint a new admin\_token in Vault

Generate a fresh admin\_token in Vault with a long lifetime using `vault token create` from inside the Vault pod. The flags request a 768-hour (32-day) TTL and a renewable period.

{% code title="Control Plane Node" overflow="wrap" %}

```bash
$ kubectl exec -it -n default "<VAULT_POD>" -- vault token create \
  -policy kplane -ttl 768h -period 768h -format json
```

{% endcode %}

The output is a JSON block. The value of the `auth.client_token` field, starting with `hvs.`, is the new admin\_token; it is referred to as `<NEW_ADMIN_TOKEN>` in the steps below.

{% code title="Sample Output" overflow="wrap" %}

```json
{
  ...
  "auth": {
    "client_token": "hvs.CAESI...",
    "ttl": 2764800,
    ...
  }
}
```

{% endcode %}
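If `jq` is available on the control plane node (an assumption - it is not guaranteed to be installed), the token can be pulled out of the JSON mechanically instead of copied by hand. A sketch against a sample document; pipe the real `vault token create -format json` output in place of the heredoc:

```bash
# jq filter that selects auth.client_token; the heredoc stands in for the
# real command output, and its token value is a placeholder.
NEW_ADMIN_TOKEN=$(jq -r '.auth.client_token' <<'EOF'
{"auth": {"client_token": "hvs.CAESIexample", "ttl": 2764800}}
EOF
)
echo "$NEW_ADMIN_TOKEN"
```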

{% endstep %}

{% step %}

### Write the new token to all required Consul KV paths

The new admin\_token has to be written to **multiple Consul paths**, and all of them must be updated. Partial writes will leave the environment in a worse state than before.

Exec into the Consul server pod:

{% code title="Control Plane Node" overflow="wrap" %}

```bash
$ kubectl exec -it -n default decco-consul-consul-server-0 -- sh
```

{% endcode %}

Inside the Consul server pod, export the Consul ACL token:

{% code title="Consul Pod" overflow="wrap" %}

```bash
$ export CONSUL_HTTP_TOKEN="<paste-the-CONSUL_TOKEN-value-from-airctl-state>"
```

{% endcode %}
{% endstep %}

{% step %}
Write the new token. The first path is **customer-level** (one entry per customer ID). The next two are **per-region** - repeat them for every affected region under this customer:

{% code title="Consul Pod" %}

```bash
# Customer-level (one entry, NOT per region)
consul kv put customers/<CUSTOMER_ID>/vault_servers/dev/admin_token "<NEW_ADMIN_TOKEN>"

# Per-region — repeat for EVERY affected region
consul kv put customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault_servers/dev/admin_token "<NEW_ADMIN_TOKEN>"
consul kv put customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/server_key/admin_token "<NEW_ADMIN_TOKEN>"
```

{% endcode %}

Each successful `put` prints `Success! Data written to: <path>`.

Type `exit` to leave the Consul pod.
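With several regions, composing the full command list up front reduces the chance of a partial write. The `emit_kv_puts` helper below is illustrative, not part of the product; it only prints the `consul kv put` commands - run them inside the Consul pod, or pipe the output to `sh` there:

```bash
# Print one customer-level and two per-region `consul kv put` commands
# for each region UUID passed in.
emit_kv_puts() {
  local customer="$1" token="$2" uuid
  shift 2
  echo "consul kv put customers/${customer}/vault_servers/dev/admin_token ${token}"
  for uuid in "$@"; do
    echo "consul kv put customers/${customer}/regions/${uuid}/services/vouch/vault_servers/dev/admin_token ${token}"
    echo "consul kv put customers/${customer}/regions/${uuid}/services/vouch/vault/server_key/admin_token ${token}"
  done
}

# Placeholder arguments - substitute the real customer ID, token, and UUIDs.
emit_kv_puts "<CUSTOMER_ID>" "<NEW_ADMIN_TOKEN>" "<REGION_UUID_1>" "<REGION_UUID_2>"
```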
{% endstep %}

{% step %}

### Update the deccaxon Kubernetes secret in every relevant namespace

The new token also has to be written into the `VaultToken` field of a Kubernetes secret called `deccaxon`. This secret exists in several namespaces and each copy must be updated.

The value stored in the secret must be **base64-encoded**, because Kubernetes Secrets store values in base64. Generate the encoded string:

{% code title="Control Plane Node" %}

```bash
$ echo -n "<NEW_ADMIN_TOKEN>" | base64
```

{% endcode %}

The output is a single line of base64 text. Copy it.

List the namespaces where the `deccaxon` secret exists; it is typically present in the `kplane` namespace, the customer's infra namespace, and every region namespace:

{% code title="Control Plane Node" %}

```bash
$ kubectl get secret deccaxon -A
```

{% endcode %}

For each namespace listed above, edit the secret:

{% code title="Control Plane Node" %}

```bash
$ kubectl edit secret deccaxon -n <NS>
```

{% endcode %}

Find the line that reads `VaultToken: <SOME_BASE64_ENCODED_STRING>`. Replace its value with the base64-encoded string from earlier.

Repeat for every namespace where the secret exists.
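A non-interactive alternative to `kubectl edit` is `kubectl patch` against the secret's `data.VaultToken` key. This sketch assumes the data key is named exactly `VaultToken`; the namespace names are placeholders. It prints the commands rather than running them - drop the leading `echo` to apply:

```bash
# Base64-encode the new token once, then emit one patch command per namespace.
B64_TOKEN=$(printf '%s' "<NEW_ADMIN_TOKEN>" | base64)
for ns in "kplane" "<INFRA_NS>" "<REGION_NS>"; do
  echo kubectl patch secret deccaxon -n "$ns" \
    -p "{\"data\":{\"VaultToken\":\"${B64_TOKEN}\"}}"
done
```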
{% endstep %}

{% step %}

### Re-run the vouch-renew-token job in each affected region

Now that the admin\_token is valid, trigger the renewal job again. Run this for **every** affected region:

{% code title="Control Plane Node" %}

```bash
$ kubectl create job --from=cronjob/vouch-renew-token vouch-renew-token-manual-$(date +%s) -n <REGION_NS>
```

{% endcode %}

The `$(date +%s)` suffix ensures each manual job has a unique name (Kubernetes will not let you create two jobs with the same name). Watch the job until it reaches `Complete 1/1`. Press `Ctrl+C` once you see it complete:

{% code title="Control Plane Node" %}

```bash
$ kubectl get jobs -n <REGION_NS> -w | grep vouch-renew-token-manual
```

{% endcode %}

If the job stays in `Failed 0/1` or never reaches Complete, capture the renewer pod logs and contact Platform9 support before proceeding to the next step.

{% code title="Control Plane Node" %}

```bash
$ kubectl logs -n <REGION_NS> -l job-name=vouch-renew-token-manual-<your-timestamp> --tail=200
```

{% endcode %}
{% endstep %}

{% step %}

### Validate the renewed host\_signing\_token

Confirm the renewer wrote a valid `host_signing_token` by reading it from Consul and looking it up against Vault:

{% code title="Control Plane Node" %}

```bash
$ HVS=$(kubectl exec -n default decco-consul-consul-server-0 -- \
  sh -c "CONSUL_HTTP_TOKEN='$CONSUL_TOKEN' consul kv get \
    customers/<CUSTOMER_ID>/regions/<REGION_UUID>/services/vouch/vault/host_signing_token")

$ kubectl exec -n default "<VAULT_POD>" -- env \
  VAULT_ADDR=http://localhost:8200 VAULT_TOKEN="$HVS" vault token lookup
```

{% endcode %}

Look for positive `ttl` and a future `expire_time`. Repeat this for every affected region. If any region's lookup still returns `permission denied`, stop and contact Platform9 support.
{% endstep %}

{% step %}

### Restart the vouch deployments to pick up the new tokens

The running vouch pods read their tokens at startup, so they need to be restarted to pick up the renewed values:

{% code title="Control Plane Node" %}

```bash
$ kubectl rollout restart deployment/vouch-keystone -n <REGION_NS>
$ kubectl rollout restart deployment/vouch-noauth   -n <REGION_NS>
```

{% endcode %}

Run this for every affected region.
{% endstep %}

{% step %}

### Verify recovery

Watch for the vouch pods to come back to a healthy state:

{% code title="Control Plane Node" %}

```bash
$ kubectl get pods -A -w | grep vouch
```

{% endcode %}

Then confirm the overall region health:

{% code title="Control Plane Node" %}

```bash
$ sudo airctl status
```

{% endcode %}

Ensure full service counts for every region (e.g. `30/30`, `85/85` ready) and `region health: Ready`.
{% endstep %}
{% endstepper %}

