# VMHA is Stuck in ErrorRemoving state in the PCD GUI

## Problem

* After a host is added to a PCD cluster, the VMHA status in the cluster section changes to `Error Removing`.
* VMHA stops working for that cluster once the new node has been added.

## Environment

* Private Cloud Director Virtualization - v2025.4 and Higher
* Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
* Component: VMHA

## Cause

* A stale `nova-compute` service entry for a decommissioned host remained in Nova's service records, and the availability zone listing still returned that same host. VMHA therefore tried to use the stale host during setup. The request failed and left VMHA stuck in the `ErrorRemoving` state.

## Diagnostics

{% hint style="info" %}
**Info**

SaaS customers should contact the Platform9 Support team to confirm whether they are hitting the issue described in this article.
{% endhint %}

* Review the VMHA logs for errors logged while VMHA is disabled and re-enabled from the PCD UI. The VMHA server (`hamgr`) logs are stored under `/var/log/pf9/hamgr/` in the `hamgr` pod of the affected region. Open a shell in the pod:

```bash
$ kubectl exec -it deploy/hamgr -n <REGION_NAMESPACE> -- bash
```
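To pull out only the relevant lines without an interactive shell, the log files can be piped through a filter on your workstation. This is a minimal sketch, assuming the log files under `/var/log/pf9/hamgr/` end in `.log` and that `bash` is available in the `hamgr` container; `filter_vmha_errors` is a hypothetical helper name, not a Platform9 tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only hamgr log lines related to this issue
# (ERROR entries, 404 responses, and error-removing task transitions).
filter_vmha_errors() {
  grep -E 'ERROR|404 Client Error|error-removing'
}

# Usage (assumes *.log files exist under /var/log/pf9/hamgr/ in the container):
#   kubectl exec deploy/hamgr -n <REGION_NAMESPACE> -- \
#     bash -c 'cat /var/log/pf9/hamgr/*.log' | filter_vmha_errors
```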

* Look for errors similar to the following in the `hamgr` logs:

{% code title="hamgr.log" overflow="wrap" %}

```text
vmha.hamgr.providers.nova ERROR Disable HA request failed for cluster cluster-name
Traceback (most recent call last):
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://bbmaster.region-name.svc.cluster.local:8082/v1/hosts/[HOST_UUID]/apps
vmha.hamgr.db.api WARNING Task state being updated from removing to error-removing
vmha.hamgr.providers.nova INFO process enable request for Availability zone REGION_NAME
vmha.hamgr.providers.nova WARNING Cluster REGION_NAME is running task error-removing, cannot enable
vmha.__main__ INFO Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/eventlet/wsgi.py", line 614, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 129, in __call__
```

{% endcode %}
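The host that VMHA failed on is the path segment between `/v1/hosts/` and `/apps` in the 404 URL above. A minimal sketch for extracting it, assuming the log lines follow the format shown; `extract_failed_host_ids` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each unique host ID found in a failed
# ".../v1/hosts/<HOST_ID>/apps" URL, as in the 404 error above.
extract_failed_host_ids() {
  grep -oE '/v1/hosts/[^/[:space:]]+/apps' \
    | sed -E 's#^/v1/hosts/([^/]+)/apps$#\1#' \
    | sort -u
}

# Usage (same log-file assumption as above):
#   kubectl exec deploy/hamgr -n <REGION_NAMESPACE> -- \
#     bash -c 'cat /var/log/pf9/hamgr/*.log' | extract_failed_host_ids
```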

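* List the compute services and availability zones, and check whether either list contains the `HOST_UUID` captured in the `hamgr` error logs. \
  Review that entry's `State` (compute service list) and `Service Status` (availability zone list). In the sample output, `HOST2.EXAMPLE.COM` is the decommissioned node that was not properly removed: its service is `disabled` and `down`, and the availability zone list marks it `XXX`.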

<pre class="language-bash"><code class="lang-bash">$ openstack compute service list --service nova-compute
#sample output
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| ID                 | Binary      | Host               | Zone  | Status  | State | Updated At   |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| [HOST1_SERVICE_ID] | nova-compute| [HOST1.EXAMPLE.COM]| [zone]| enabled | up    | [TIMESTAMP]  |
<strong>| [HOST2_SERVICE_ID] | nova-compute| [HOST2.EXAMPLE.COM]| [zone]| disabled| down  | [TIMESTAMP]  |
</strong>| [HOST3_SERVICE_ID] | nova-compute| [HOST3.EXAMPLE.COM]| [zone]| enabled | up    | [TIMESTAMP]  |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
</code></pre>
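On a cluster with many hosts, the compute service list can be narrowed to the `down` entries. This sketch assumes the standard OpenStackClient output options (`-f value -c ...`), which print one space-separated row per service; `list_down_compute_services` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: from "ID Host State" rows, print the ID and host of
# every nova-compute service whose State is "down" (stale candidates).
list_down_compute_services() {
  awk '$3 == "down" { print $1, $2 }'
}

# Usage:
#   openstack compute service list --service nova-compute \
#     -f value -c ID -c Host -c State | list_down_compute_services
```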

<pre class="language-bash"><code class="lang-bash">$ openstack availability zone list --compute --long
#sample output
+-----------+-------------+---------------+---------------------+--------------+----------------+
| Zone Name | Zone Status | Zone Resource | Host Name           | Service Name | Service Status |
+-----------+-------------+---------------+---------------------+--------------+----------------+
| [zone]    | available   |               | [HOST1.EXAMPLE.COM] | nova-compute | enabled :-)    |
<strong>| [zone]    | available   |               | [HOST2.EXAMPLE.COM] | nova-compute | enabled XXX    |
</strong>| [zone]    | available   |               | [HOST3.EXAMPLE.COM] | nova-compute | enabled :-)    |
+-----------+-------------+---------------+---------------------+--------------+----------------+
</code></pre>
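Hosts that the availability zone listing marks `XXX` can be filtered out the same way. This is a sketch that assumes OpenStackClient renders `Service Status` as `<enabled|disabled> <:-)|XXX>` followed by a timestamp; `list_unavailable_az_hosts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print hosts whose availability-zone "Service Status"
# is marked XXX. Assumes rows of the form
# "<Host Name> <enabled|disabled> <:-)|XXX> [timestamp]".
list_unavailable_az_hosts() {
  awk '$3 == "XXX" { print $1 }'
}

# Usage:
#   openstack availability zone list --compute --long \
#     -f value -c "Host Name" -c "Service Status" | list_unavailable_az_hosts
```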

## Resolution

1. Identify the **stale** compute service entry in the output of the command below. In the sample output, the `nova-compute` service on `HOST2.EXAMPLE.COM` is `disabled` and `down`.

   <pre class="language-bash"><code class="lang-bash">$ openstack compute service list

   #sample output
   +--------------------+-------------+--------------------+-------+---------+-------+--------------+
   | ID                 | Binary      | Host               | Zone  | Status  | State | Updated At   |
   +--------------------+-------------+--------------------+-------+---------+-------+--------------+
   | [HOST1_SERVICE_ID] | nova-compute| [HOST1.EXAMPLE.COM]| [zone]| enabled | up    | [TIMESTAMP]  |
   <strong>| [HOST2_SERVICE_ID] | nova-compute| [HOST2.EXAMPLE.COM]| [zone]| disabled| down  | [TIMESTAMP]  |
   </strong>| [HOST3_SERVICE_ID] | nova-compute| [HOST3.EXAMPLE.COM]| [zone]| enabled | up    | [TIMESTAMP]  |
   +--------------------+-------------+--------------------+-------+---------+-------+--------------+
   </code></pre>

2. Delete the **stale** service by its ID:

   ```bash
   openstack compute service delete <HOST2_SERVICE_ID>
   ```
3. Wait for VMHA to retry the operation automatically, or disable and re-enable VMHA for the cluster in the PCD UI to trigger a fresh reconcile attempt.
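To avoid deleting a healthy service by mistake, the service ID can be looked up by host name and accepted only when that service is `down`. This is a sketch, not a Platform9-provided procedure: `stale_service_id_for_host` is a hypothetical helper, and the `openstack server list --host` check (admin only) is a suggested precaution to confirm no instances are still recorded on the stale host before deletion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the nova-compute service ID for host "$1",
# but only if that service is reported "down"; return 1 otherwise so a
# healthy service is never selected for deletion.
stale_service_id_for_host() {
  awk -v host="$1" '
    $2 == host && $3 == "down" { print $1; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   # Confirm no instances are still recorded on the stale host (admin only):
#   openstack server list --all-projects --host <HOST2.EXAMPLE.COM>
#   # Look up the ID and delete the service only if it is down:
#   id="$(openstack compute service list --service nova-compute \
#          -f value -c ID -c Host -c State \
#        | stale_service_id_for_host <HOST2.EXAMPLE.COM>)" \
#     && openstack compute service delete "$id"
```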

## Validation

* Ensure the VMHA state transitions from `ErrorRemoving` to `Enabled`.
* Confirm that no other stale hosts remain in the compute service and availability zone lists.
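To confirm that the deleted host no longer appears in either listing, the host columns of both commands can be combined and checked for an exact match. This is a minimal sketch; `host_is_gone` is a hypothetical helper, and it relies on the same OpenStackClient `-f value -c ...` output options as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if host "$1" no longer appears as any
# whitespace-separated field in the rows read from stdin.
host_is_gone() {
  awk -v host="$1" '
    { for (i = 1; i <= NF; i++) if ($i == host) found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Usage:
#   { openstack compute service list --service nova-compute -f value -c Host
#     openstack availability zone list --compute --long -f value -c "Host Name"
#   } | host_is_gone <HOST2.EXAMPLE.COM> && echo "stale host removed"
```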


