VMHA Stuck in "Waiting"
Problem
VMHA remains stuck in the "waiting" state during enablement.
Environment
- Private Cloud Director Virtualization – v2025.4 and Higher
- Self-Hosted Private Cloud Director Virtualization – v2025.4 and Higher
Cause
A decommissioned host was still listed in Nova's service records. Because of this, VMHA attempted to use that host during enablement, which raised an error and left VMHA stuck in the "waiting" state.
Diagnostics
For SaaS customers, contact the Platform9 Support Team to validate whether you are hitting the issue described in this article.
- Check VMHA logs:
$ kubectl exec deploy/hamgr -n <REGION_NAMESPACE> -- cat /var/log/pf9/hamgr/hamgr.log | grep -A1 'Enabling HA'
Look for log entries like the following (a single-pass filter is sketched after the excerpt):
Enabling HA on some of the hosts [...] including host '[HOST-ID]'
WARNING Role status of host [HOST-ID] is not ok
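If the log is large, the relevant entries can be pulled in a single pass. A minimal sketch, reusing the <REGION_NAMESPACE> placeholder and log path from the command above:
#Surface only the HA-enablement entries and the host-status warnings.
$ kubectl exec deploy/hamgr -n <REGION_NAMESPACE> -- grep -E "Enabling HA|Role status of host" /var/log/pf9/hamgr/hamgr.log | tail -20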
- List the compute services and check whether any hypervisor shows "Status" as "disabled" and "State" as "down". Identify services that are down, disabled, or associated with non-existent or decommissioned hosts. In the sample output below, [HOST2.EXAMPLE.COM] is the decommissioned node; a one-line filter for this check follows the sample output.
$ openstack compute service list
#sample output
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| [HOST1_SERVICE_ID] | nova-compute| [HOST1.EXAMPLE.COM]| [zone]| enabled | up | [TIMESTAMP] |
| [HOST2_SERVICE_ID] | nova-compute| [HOST2.EXAMPLE.COM]| [zone]| disabled| down | [TIMESTAMP] |
| [HOST3_SERVICE_ID] | nova-compute| [HOST3.EXAMPLE.COM]| [zone]| enabled | up | [TIMESTAMP] |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
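On deployments with many hosts, the stale entries can be filtered in one step. A minimal sketch using the standard openstackclient output flags (-f value, -c); the awk filter keeps rows whose State column reads "down" (adjust the field index if your client orders columns differently):
#List nova-compute services as plain values and keep only the "down" rows.
$ openstack compute service list --service nova-compute -f value -c ID -c Host -c Status -c State | awk '$4 == "down"'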
- List the hypervisors and validate the host mapping. In the sample output, the node [HOST2.EXAMPLE.COM] is in a "down" state; its associated "service_id" can be checked to validate the host mapping (a value-only sketch follows the hypervisor show output).
$ openstack hypervisor list
#sample output:
+----------------+---------------------+-----------------+-------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+----------------+---------------------+-----------------+-------------+-------+
| [HOST1_UUID] | [HOST1.EXAMPLE.COM] | QEMU | [IP-ADDR-1] | up |
| [HOST2_UUID] | [HOST2.EXAMPLE.COM] | QEMU | [IP-ADDR-2] | down |
+----------------+---------------------+-----------------+-------------+-------+
$ openstack hypervisor show <HYPERVISOR_ID>
#sample output for "openstack hypervisor show [HOST2_UUID]"
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| aggregates | [] |
| cpu_info | None |
| host_ip | [IP-ADDR-2] |
| hypervisor_hostname | [HOST2.EXAMPLE.COM] |
| hypervisor_type | QEMU |
| hypervisor_version | [HYPERVISOR_VERSION] |
| id | [HOST2_UUID] |
| service_host | [SERVICE_HOST_UUID] |
| service_id | [HOST2_SERVICE_ID] |
| state | down |
| status | disabled |
+---------------------+--------------------------------------+
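To confirm the mapping without scanning the full table, the two relevant fields can be printed directly. A minimal sketch; the field names (state, service_id) match the sample output above:
#Print just the state and service_id for the suspect hypervisor.
$ openstack hypervisor show [HOST2_UUID] -f value -c state -c service_id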
Resolution
- Identify the stale compute service entry from the output of the command below. In the sample output, the node [HOST2.EXAMPLE.COM] is down.
$ openstack compute service list
#sample output
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
| [HOST1_SERVICE_ID] | nova-compute| [HOST1.EXAMPLE.COM]| [zone]| enabled | up | [TIMESTAMP] |
| [HOST2_SERVICE_ID] | nova-compute| [HOST2.EXAMPLE.COM]| [zone]| disabled| down | [TIMESTAMP] |
| [HOST3_SERVICE_ID] | nova-compute| [HOST3.EXAMPLE.COM]| [zone]| enabled | up | [TIMESTAMP] |
+--------------------+-------------+--------------------+-------+---------+-------+--------------+
- Delete the stale service using the command below. After the stale entry is deleted, at least two working hypervisors remain, which satisfies the minimum requirement for enabling VMHA. A pre-deletion count check is sketched after the command.
$ openstack compute service delete <HOST2_SERVICE_ID>
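Before (and after) deleting, a quick count confirms that at least two healthy hypervisors are present. A minimal sketch reusing the service list from the previous step:
#Count nova-compute services that are both enabled and up; expect 2 or more.
$ openstack compute service list --service nova-compute -f value -c Status -c State | grep -c "enabled up"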
- Wait for VMHA to retry the operation automatically, or disable and re-enable VMHA to trigger a fresh attempt.
Validation:
- Ensure the VMHA state transitions from "waiting" to "enabled" (a polling sketch follows this list).
- Confirm no additional stale hosts remain.
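To watch for the transition without re-running the log check by hand, the hamgr log can be polled. A sketch that assumes the state change is written to the same log inspected in the Diagnostics section:
#Re-run the log tail every 30 seconds and watch for HA being reported enabled.
$ watch -n 30 "kubectl exec deploy/hamgr -n <REGION_NAMESPACE> -- tail -5 /var/log/pf9/hamgr/hamgr.log"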
Additional Information:
- A minimum of two working hypervisors is required to enable VMHA.