Virtual Machine Deletion Issues

Problem

This guide provides step-by-step instructions for troubleshooting and resolving issues when deleting a virtual machine (VM) fails or hangs in Private Cloud Director.

Environment

  • Private Cloud Director Virtualization - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher

Key Concept: Ephemeral vs. Volume-Booted Instances

Before troubleshooting a deletion failure, it is critical to identify how the VM's storage was deployed, as this dictates the deletion workflow:

  • Ephemeral Instances: Booted from an image directly onto the compute node's local disk. The OS disk and local data are automatically destroyed by the hypervisor during a standard deletion.

  • Volume-Booted Instances: Booted from a persistent Cinder volume. The volume's fate depends on the "Delete on Terminate" flag set during creation. If checked, PCD destroys the volume; if unchecked, the volume is simply detached and marked as available.

Various VM Deletion Methods & Examples

Understanding how a VM was requested to be deleted is crucial for tracing where the failure occurred.

1. Standard Delete

Gracefully shuts down and removes the instance, releasing its ephemeral storage and network ports.
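As a sketch, a standard delete issued through the OpenStack CLI (the server name is a placeholder):

```shell
# Gracefully delete the instance; its ephemeral disk and network ports are released
openstack server delete my-vm

# Or block until the deletion completes before proceeding
openstack server delete --wait my-vm
```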

2. Force Delete (For Stuck Instances)

Bypasses standard graceful shutdown procedures to forcefully terminate an instance that is stuck in a locked task state (e.g., deleting or migrating).
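A hedged example using the OpenStack CLI (the `--force` flag requires a reasonably recent client; the UUID is a placeholder):

```shell
# Skip the graceful shutdown path and terminate the stuck instance outright
openstack server delete --force <VM_UUID>
```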

3. Handling Persistent Volumes During Deletion

If the VM has attached Cinder volumes, you must verify the requested outcome for the data (Retain vs. Destroy).

Execution (If data MUST be destroyed):
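One possible sequence, assuming the data must be destroyed (all UUIDs are placeholders): delete the server first, then remove any volumes that were not flagged "Delete on Terminate" once they report `available`:

```shell
# Identify the volumes attached to the instance
openstack server show <VM_UUID> -c volumes_attached

# Delete the instance and wait for it to disappear
openstack server delete --wait <VM_UUID>

# Destroy each surviving volume once it is detached and "available"
openstack volume delete <VOLUME_UUID>
```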

Deep Dive

The Private Cloud Director VM deletion process requires tight coordination between the Compute (Nova), Networking (Neutron), and Storage (Cinder) services. A failure in any of these handoffs can result in a VM getting stuck in a deleting task state or leaving behind "orphaned" resources.

Note: The logs can only be reviewed in Self-Hosted Private Cloud Director.

Step 1: User Request & API Validation

The deletion request is authenticated by Keystone and received by the Nova API.

  • State Check: Nova API verifies the VM exists and updates the database to mark the task_state as deleting.

Note: At this point a unique request ID (REQ_ID) is generated; it is used to trace the request through the logs of the other components.

Step 2: Compute Node & Hypervisor Cleanup

The pf9-ostackhost service on the host running the VM receives the RPC message to destroy the instance.

  • Libvirt Teardown: nova-compute instructs Libvirt to power off the VM (if running) and undefine the XML domain.

  • Ephemeral Storage Purge: The instance directory (/opt/pf9/data/instances/[VM_UUID]) containing the disk is permanently deleted.
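On the compute node, this teardown can be observed, and if necessary reproduced, with libvirt's own tooling. The domain name below is a placeholder; the actual name can be found via the UUID:

```shell
# Confirm whether the domain still exists on the hypervisor
virsh list --all | grep <VM_UUID>

# Forcefully power off and remove the domain definition
virsh destroy instance-0000abcd
virsh undefine instance-0000abcd

# Verify the ephemeral instance directory has been removed
ls /opt/pf9/data/instances/<VM_UUID>
```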

Step 3: Storage (Cinder) Detachment

If the VM has attached Cinder volumes, Nova requests Cinder to detach them.

  • If the volume was configured to "Delete on Terminate", Cinder proceeds to delete the volume from the storage backend.

  • If not (which is standard for data retention), the volume status simply changes back to available for future use.


Step 4: Network (Neutron) Cleanup

Nova signals Neutron to unbind and delete the virtual network interface (VIF) and associated ports. Neutron removes the OVS/OVN rules and releases the IP address back to the IPAM pool.

Step 5: Final Database Purge

Once all resources report successful deletion, nova-conductor marks the instance as deleted in the database, effectively removing it from the user's view and freeing up project quota.

Procedure

1. Get the VM Status

Check the current state of the VM. If it is stuck, it will typically show a status of ERROR or a task state of deleting. Look specifically at the status, OS-EXT-STS:task_state, and fault fields.
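For example, the relevant fields can be pulled directly with the OpenStack CLI (UUID is a placeholder):

```shell
# Show only the status, task state, and any recorded fault
openstack server show <VM_UUID> -c status -c OS-EXT-STS:task_state -c fault
```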

2. Attempt a Force Delete (If stuck)

If the standard delete has hung for more than 10-15 minutes, attempt to force the deletion via the API by resetting its state.
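A hedged sketch of the reset-then-retry sequence (the UUID is a placeholder; resetting state should only be done on an instance you intend to delete):

```shell
# Clear the stuck "deleting" task state by resetting the instance to ERROR
openstack server set --state error <VM_UUID>

# Retry the deletion; fall back to --force if it hangs again
openstack server delete <VM_UUID>
openstack server delete --force <VM_UUID>
```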

3. Trace the VM Events

Retrieve the Request ID (REQ_ID) from the server event list to find exactly which component (Compute, Network, or Storage) is holding up the deletion.
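The event trail can be inspected as follows (UUID and request ID are placeholders):

```shell
# List lifecycle events and their request IDs for the instance
openstack server event list <VM_UUID>

# Drill into the failed delete event to see per-component results
openstack server event show <VM_UUID> <REQ_ID>
```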

4. Verify Compute Service Health

A VM cannot be cleanly deleted if the compute node hosting it is offline or the pf9-ostackhost service is dead. Check if the host is up.
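A minimal check, assuming CLI access to the control plane and shell access to the node (hostname is a placeholder; the service unit name follows the pf9-ostackhost naming used in this guide):

```shell
# Confirm the compute service on the host is up and enabled
openstack compute service list --host <HOSTNAME>

# On the compute node itself, verify the Platform9 host agent is running
systemctl status pf9-ostackhost
```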

5. Check for Stuck Cinder Volumes & Verify Retention

Identify if the VM has attached volumes that are failing to detach, and confirm whether those volumes are meant to be kept or purged.
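A sketch of the volume checks (UUIDs are placeholders; resetting volume state should only be done after confirming the backend attachment is actually gone):

```shell
# Find volumes wedged mid-detach
openstack volume list --status detaching

# Cross-check which volumes the VM still claims to hold
openstack server show <VM_UUID> -c volumes_attached

# If a volume is stuck, reset its state before retrying the detach/delete
openstack volume set --state available <VOLUME_UUID>
```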

6. Check for Orphaned Neutron Ports

Sometimes the network port gets locked by another service (like a stale floating IP or router interface), preventing Nova from deleting it.
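The ports can be inspected and, once nothing references them, removed (UUIDs are placeholders):

```shell
# List ports still bound to the VM
openstack port list --device-id <VM_UUID>

# Check whether a floating IP is still holding a port
openstack floating ip list --port <PORT_UUID>

# Once nothing references it, remove the orphaned port
openstack port delete <PORT_UUID>
```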

7. Review the Logs on the Compute Node

If the VM is still stuck, review the compute node's logs to see if Libvirt is failing to destroy the domain (e.g., due to a hung QEMU process). Search for the REQ_ID or VM_UUID.
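A hedged example of the log search; the log path shown is an assumption based on the Platform9 convention of logging under /var/log/pf9 and may differ by release:

```shell
# Search the compute node's logs for the failed request
grep <REQ_ID> /var/log/pf9/ostackhost.log

# Look for libvirt/QEMU errors tied to the instance
grep -i libvirt /var/log/pf9/ostackhost.log | grep <VM_UUID>
```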

8. Last Resort: Manual Database Update (Hard DB Delete)

If all API methods fail and the underlying hypervisor, storage, and network resources have been manually verified as destroyed or safely detached, you can forcefully remove the VM from the database to clear it from the UI.
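A heavily hedged sketch of the hard delete, following the upstream Nova soft-delete convention (deleted = id). Column names vary between releases, so verify the schema against your deployment and take a backup first:

```shell
# !! Back up the database before any manual edit !!
mysqldump nova > nova-backup.sql

# Mark the instance deleted directly in the Nova database
# (column names follow the upstream Nova schema; verify against your release)
mysql nova -e "UPDATE instances SET deleted = id, deleted_at = NOW(), \
  vm_state = 'deleted', task_state = NULL WHERE uuid = '<VM_UUID>';"
```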


Most common causes

  • Dead Compute Node: The host machine is offline, down, or isolated from the management plane, meaning it never receives the RPC message to delete the VM.

  • Stuck Volume Detachment: Cinder cannot terminate the iSCSI session or Ceph RBD lock, causing the volume to freeze in a detaching state, which holds the Nova deletion task hostage.

  • Hung QEMU/KVM Process: The hypervisor process for the VM has become unresponsive to standard ACPI shutdown signals or libvirt destroy commands.

  • Locked Network Ports: A Neutron port fails to delete because it is still incorrectly bound to a floating IP or a stale security group rule.

  • Database Desynchronization: A temporary network blip caused pf9-ostackhost to successfully delete the VM locally, but it failed to update nova-conductor, leaving a "ghost" VM in the dashboard.
