VM stuck in "spawning" or "rebooting" task state in Nova database
Error in /var/log/pf9/ostackhost.log:
Cannot reboot instance: Command: multipath -f <DEVICE_ID> Exit code: 1 Stderr: '<DEVICE_ID>: map in use'
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Private Cloud Director Kubernetes - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Kubernetes - v2025.4 and Higher
Component - iSCSI storage backend
Cause
The multipath device lock occurs when the device-mapper holds active references to a storage volume that cannot be released during VM reboot or shutdown operations. This is typically caused by accumulated state from weeks of failed operations, which leave behind:
Stale device-mapper references
Unclosed iSCSI sessions
Incomplete volume detach operations
Orphaned multipath devices
When a reboot is initiated, Nova attempts to disconnect volumes by flushing the multipath device. However, stale references accumulated from previous failed operations prevent the flush, causing the 'map in use' error even though no active process is using the device.
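The "in use, but no visible users" condition can be narrowed down by cross-checking device-mapper open counts against `fuser`. A minimal sketch, assuming standard `dmsetup` columnar output; the helper only parses text, so it can be tried on captured output, and the privileged invocation is shown as a comment:

```shell
# List device-mapper maps the kernel still counts as open. If fuser/lsof show
# no users for one of these maps, the reference is likely stale -- the
# condition described above. Input: "name:open" pairs, as produced by:
#   sudo dmsetup info -c --noheadings --separator : -o name,open
maps_still_open() {
    awk -F: '$2 > 0 { print $1 }'
}

# Sample run on captured output (mpathb has 2 open references):
printf 'mpatha:0\nmpathb:2\n' | maps_still_open
# prints: mpathb
```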
Diagnostics
Step 1: Verify the Multipath Device Lock
Expected output:
multipath -f returns exit code 1 with error: <DEVICE_ID>: map in use
lsof and fuser may show no processes, indicating stale kernel references
Step 2: Check VM State in Nova
Expected output:
task_state may show: rebooting, powering-on, or similar
vm_state may show: error, stopped, or mismatch with actual hypervisor state
Step 3: Check iSCSI Sessions
Step 4: Review Compute Logs
Resolution
Method 1: Quick Recovery - VM Rebuild
Use when: service restoration is the priority, time is limited, or the environment is in production
Time: 15-20 minutes
Steps:
Document current VM configuration:
Verify volume preservation flag:
Create a volume snapshot through the UI or using the following commands:
# Step 1: On the compute node, check multipath status
$ sudo multipath -ll
# Attempt to flush the specific device (will fail with "map in use")
$ sudo multipath -f <DEVICE_ID>
# Check device-mapper status
$ sudo dmsetup ls
$ sudo dmsetup info <DEVICE_ID>
# Check for processes using the device (may show none)
$ sudo lsof /dev/mapper/<DEVICE_ID>
$ sudo fuser -vm /dev/mapper/<DEVICE_ID>
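The flush attempt above can be wrapped so the "map in use" case is reported explicitly. A sketch; the multipath command is held in a variable purely so the branch logic can be exercised without root:

```shell
# Wrapper around the Step 1 flush attempt: distinguishes the "map in use"
# failure from other errors. MULTIPATH defaults to the real command and is a
# variable only so the logic can be tried without root.
MULTIPATH=${MULTIPATH:-"sudo multipath"}

flush_or_report() {
    dev=$1
    if err=$($MULTIPATH -f "$dev" 2>&1); then
        echo "$dev: flushed"
    elif printf '%s\n' "$err" | grep -q 'map in use'; then
        echo "$dev: map in use (stale references still hold the map)"
    else
        echo "$dev: flush failed: $err"
    fi
}
```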
# Step 2: From the control plane, or any host with OpenStack credentials
$ openstack server show <VM_ID>
# Check task state (look for stuck states)
$ openstack server show <VM_ID> -f value -c OS-EXT-STS:task_state
# Check VM state
$ openstack server show <VM_ID> -f value -c OS-EXT-STS:vm_state
# Step 3: On the compute node
$ sudo iscsiadm -m session
# Check for stale sessions related to the volume
$ sudo iscsiadm -m session -P 3 | grep -A 20 <DEVICE_ID>
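Duplicate sessions to the same target are one sign of the stale sessions this step looks for. A small counting helper, assuming the standard `tcp: [N] ip:port,tpgt iqn (type)` line format from `iscsiadm -m session`:

```shell
# Count sessions per target IQN; more than one session to the same target on
# a single node can point at the stale sessions described above.
session_targets() {
    awk '{ print $4 }' | sort | uniq -c | sort -rn
}
# On the compute node: sudo iscsiadm -m session | session_targets
```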
# Step 4: Check for the specific error
$ sudo grep -A 5 "Cannot reboot instance" /var/log/pf9/ostackhost.log
# Check for device cleanup warnings
$ sudo grep "leftovers may remain" /var/log/pf9/ostackhost.log
# Step 1: Document the current VM configuration
$ openstack server show <VM_ID> -f json > vm-config-backup.json
$ openstack server volume list <VM_ID> > vm-volumes-backup.txt
$ openstack port list --server <VM_ID> > vm-ports-backup.txt
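The three backup commands can be bundled so everything lands in one timestamped directory before any destructive step (a convenience sketch; the directory layout is an assumption, the commands are the ones above):

```shell
# Capture VM config, volumes, and ports into a single timestamped directory
# (hypothetical layout; requires a configured openstack CLI).
backup_vm_state() {
    vm=$1
    dir="vm-backup-$vm-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dir" || return 1
    openstack server show "$vm" -f json > "$dir/vm-config-backup.json"
    openstack server volume list "$vm" > "$dir/vm-volumes-backup.txt"
    openstack port list --server "$vm" > "$dir/vm-ports-backup.txt"
    echo "$dir"
}
```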
# Step 2: Check if delete_on_termination is false for attached volumes
$ openstack server show <VM_ID>
# If needed, update the flag
$ openstack server volume set --preserve-on-termination <VM_ID> <VOLUME_ID>
# Step 3: For each attached volume, create a snapshot
$ openstack volume snapshot create --volume <VOLUME_ID> \
--name "snapshot-<vm-name>-$(date +%Y%m%d-%H%M%S)"
# Wait for snapshot to complete
$ openstack volume snapshot list --volume <VOLUME_ID>
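The "wait for snapshot to complete" step can be made a bounded poll instead of manual re-checking. A sketch; the status command and interval are parameters only so the loop can be exercised with a stub, and real use passes the openstack CLI invocation shown in the comment:

```shell
# Poll until a snapshot reports "available", giving up after ~5 minutes.
# $1: command printing the snapshot status; $2: poll interval in seconds.
wait_for_snapshot() {
    status_cmd=$1 interval=${2:-5} tries=0
    while [ "$($status_cmd)" != "available" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            return 1
        fi
        sleep "$interval"
    done
}
# Real use:
# wait_for_snapshot "openstack volume snapshot show <SNAPSHOT_ID> -f value -c status"
```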
# Step 4: Delete the VM (attached volumes are preserved)
$ openstack server delete <VM_ID>
# Wait for deletion to complete (30-60 seconds)
$ openstack server list | grep <VM_ID>
# Verify volumes still exist
$ openstack volume list | grep <VOLUME_ID>
# Step 5: SSH to the compute node and flush the now-unreferenced multipath device
$ sudo multipath -f <DEVICE_ID>
# Verify the device is removed
$ sudo multipath -ll | grep <DEVICE_ID>
# Step 6: Recreate the VM using the configuration saved in Step 1
$ openstack server create \
--flavor <FLAVOR_ID> \
--volume <VOLUME_ID> \
--nic port-id=<PORT_ID> \
--availability-zone <AZ> \
<VM_NAME>
# Monitor creation
$ openstack server show <NEW_VM_ID>
# Step 7: Verify the new VM reaches ACTIVE
$ openstack server show <NEW_VM_ID>
# Test connectivity
$ ping <vm-ip>
$ ssh <vm-ip>
Method 2: In-Place Cleanup
# Stop the VM
$ openstack server stop <VM_ID>
# On the compute node, check remaining device state
$ sudo multipath -ll
$ sudo ls -la /dev/disk/by-id/ | grep <VOLUME_WWN>
$ sudo dmsetup ls
# Rescan SCSI hosts
for host in /sys/class/scsi_host/host*; do
echo "- - -" | sudo tee $host/scan > /dev/null
done
# Rescan iSCSI sessions
sudo iscsiadm -m session --rescan
# Restart the pf9-ostackhost service
$ sudo systemctl restart pf9-ostackhost
# Verify service is running
$ sudo systemctl status pf9-ostackhost
# Start the VM
$ openstack server start <VM_ID>
# Monitor startup
$ openstack server show <VM_ID>
Verification
# On the compute node
$ sudo multipath -ll | grep <DEVICE_ID>
# Should return no results
$ sudo dmsetup ls | grep <DEVICE_ID>
# Should return no results
$ sudo lsof | grep <DEVICE_ID>
# Should return no results
$ openstack server show <VM_ID>
# Should show: status=ACTIVE, power_state=Running, task_state=None
$ openstack server list --all-projects | grep <VM_NAME>
# Should show: Status=ACTIVE
# Test connectivity
$ ping <VM_IP>
$ ssh <VM_IP>
$ openstack compute service list
# Should show: State=up, Status=enabled
# On compute node
$ sudo tail -100 /var/log/pf9/ostackhost.log | grep -i error
# Should show no recent multipath or device-mapper errors
# Check for VMs in ERROR state
$ openstack server list --status ERROR
# Check multipath health on compute nodes
$ sudo multipath -ll | grep -i fail
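The last check can be tightened into a per-map summary. A sketch that assumes the usual `multipath -ll` layout, where map header lines contain ` dm-` and down paths contain `failed`:

```shell
# Summarize failed paths per multipath map, for the periodic health check
# above. Parses text only, so it can be run on captured output.
failed_path_summary() {
    awk '/ dm-/ { map=$1 } /failed/ { n[map]++ } END { for (m in n) print m, n[m] }'
}
# On each compute node: sudo multipath -ll | failed_path_summary
```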