Unable to Start VM due to Multipath issue

Problem

  • VM fails to reboot with error:

multipath -f [DEVICE_ID]: map in use

  • VM is stuck in the "spawning" or "rebooting" task state in the Nova database

  • Error in /var/log/pf9/ostackhost.log:

Cannot reboot instance: Command: multipath -f [DEVICE_ID] Exit code: 1 Stderr: '[DEVICE_ID]: map in use'

Environment

  • Private Cloud Director Virtualization - v2025.4 and Higher

  • Private Cloud Director Kubernetes - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Kubernetes - v2025.4 and Higher

  • Component - iSCSI storage backend

Cause

The multipath device lock occurs when device-mapper holds active references to a storage volume that cannot be released during VM reboot or shutdown operations. It is typically caused by accumulated state corruption, where weeks of failed operations leave behind:

  • Stale device-mapper references

  • Unclosed iSCSI sessions

  • Incomplete volume detach operations

  • Orphaned multipath devices

When a reboot is initiated, Nova attempts to disconnect volumes by flushing the multipath device. However, accumulated stale references from previous failed operations prevent the flush, causing the map in use error even though no active processes are using the device.

Diagnostics

Step 1: Verify the Multipath Device Lock
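A minimal sketch of this check, assuming root access on the affected compute node and that the device-mapper map name equals the device ID from the error message. The function name `check_multipath_lock` is hypothetical; it only wraps the standard `multipath`, `dmsetup`, `lsof`, and `fuser` commands.

```shell
# Hypothetical helper: run as root on the affected compute node.
# $1 is the multipath device ID taken from the "map in use" error.
check_multipath_lock() {
    dev_id="$1"
    # Show the multipath topology for this device.
    multipath -ll "$dev_id"
    # Show device-mapper details; a non-zero "Open count" means the map is still held.
    dmsetup info "$dev_id"
    # List userspace processes holding the device (often none for stale kernel references).
    lsof "/dev/mapper/$dev_id" || echo "lsof: no open handles reported"
    fuser -v "/dev/mapper/$dev_id" || echo "fuser: no processes reported"
    # Attempt the same flush that Nova runs and report its exit code.
    rc=0
    multipath -f "$dev_id" || rc=$?
    echo "multipath -f exit code: $rc"
    return "$rc"
}

# Usage (placeholder ID):
# check_multipath_lock "[DEVICE_ID]"
```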

Expected output:

  • multipath -f returns exit code 1 with error: [DEVICE_ID]: map in use

  • lsof and fuser may show no processes, indicating stale kernel references

Step 2: Check VM State in Nova
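One way to read these fields, sketched under the assumption that the OpenStack CLI is installed and admin credentials are loaded. The function name `show_vm_state` is hypothetical; the column names are the standard extended server attributes.

```shell
# Hypothetical helper: requires the OpenStack CLI with admin credentials.
# $1 is the VM name or UUID.
show_vm_state() {
    openstack server show "$1" \
        -c status \
        -c OS-EXT-STS:task_state \
        -c OS-EXT-STS:vm_state \
        -c OS-EXT-STS:power_state \
        -c OS-EXT-SRV-ATTR:host
}

# Usage (placeholder UUID):
# show_vm_state "[VM_UUID]"
```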

Expected output:

  • task_state may show: rebooting, powering-on, or similar

  • vm_state may show: error, stopped, or a value that does not match the actual hypervisor state

Step 3: Check iSCSI Sessions
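A possible way to list sessions on the compute node, assuming open-iscsi's `iscsiadm` is installed and the commands run as root. The function name `list_iscsi_sessions` is hypothetical.

```shell
# Hypothetical helper: run as root on the compute node.
list_iscsi_sessions() {
    # Summary: one line per session (transport, session ID, portal, target IQN).
    # iscsiadm exits non-zero when there are no sessions at all.
    iscsiadm -m session || { echo "no active iSCSI sessions reported"; return 0; }
    # Detailed view, including session state and attached SCSI devices.
    iscsiadm -m session -P 3
}
```

Sessions whose target belongs to a volume that is no longer attached to any VM are candidates for the unclosed sessions described under Cause.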

Step 4: Review Compute Logs
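A small sketch for pulling the relevant entries from the compute log named in the Problem section. The function name `find_multipath_errors` and the 20-line limit are illustrative choices.

```shell
# Hypothetical helper: print the most recent multipath-related log entries.
# $1 is the log file; defaults to the path given in the Problem section.
find_multipath_errors() {
    log_file="${1:-/var/log/pf9/ostackhost.log}"
    grep -E "map in use|multipath -f" "$log_file" | tail -n 20
}

# Usage:
# find_multipath_errors
```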

Resolution

Method 1: Quick Recovery - VM Rebuild

Use when: service restoration is the priority, time is limited, or the VM is in a production environment

Time: 15-20 minutes

Steps:

  1. Document current VM configuration:

  2. Verify volume preservation flag:

  3. Create a volume snapshot through the UI or using the following commands:

  4. Delete the VM (preserving volumes):

  5. Clean up multipath device on compute node:

  6. Recreate VM with same configuration:

  7. Verify VM is running:
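The steps above can be sketched as the following helpers. All function names are hypothetical. The sketch assumes the OpenStack CLI with admin credentials for steps 1-4 and 6-7, root access on the compute node for step 5, and a client recent enough to provide `openstack server volume list`.

```shell
# Steps 1-2: save the VM definition and list its volume attachments.
# With compute API 2.79 or later the list includes delete_on_termination.
document_vm() {            # $1: VM name or UUID
    openstack server show "$1" -f yaml > "vm-$1.yaml"
    openstack --os-compute-api-version 2.79 server volume list "$1"
}

# Step 3: snapshot a volume; --force allows a snapshot of an in-use volume.
snapshot_volume() {        # $1: volume ID, $2: snapshot name
    openstack volume snapshot create --volume "$1" --force "$2"
}

# Step 4: delete the VM only after confirming delete_on_termination is False.
delete_vm() {              # $1: VM name or UUID
    openstack server delete --wait "$1"
}

# Step 5 (compute node, as root): flush the stale map and confirm it is gone.
flush_multipath() {        # $1: multipath device ID
    multipath -f "$1" && ! multipath -ll | grep -q "$1"
}

# Steps 6-7: boot a new VM from the preserved volume and check its status.
recreate_vm() {            # $1: name, $2: flavor, $3: boot volume ID, $4: network ID
    openstack server create --flavor "$2" --volume "$3" \
        --nic "net-id=$4" --wait "$1"
    openstack server show "$1" -c status -c OS-EXT-STS:power_state
}
```

Take the flavor, network, and volume values for `recreate_vm` from the file written by `document_vm`, so the new VM matches the original configuration.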

Method 2: Manual Recovery - Multipath Cleanup (For Investigation/Non-Production)

Use when: time is available for investigation and the VM ID must be preserved

Time: 45-60 minutes

Steps:

  1. Stop the VM:

  2. On compute node, identify the stuck device:

  3. Check what's using the device:

  4. Force stop VM in libvirt (if running):

  5. Remove device-mapper device:

  6. Flush multipath device:

  7. Rescan SCSI/iSCSI:

  8. Restart compute service:

  9. Start the VM:
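A sketch of the compute-node portion of these steps (2 through 8), to be run as root. The function names are hypothetical, and the service unit name `pf9-ostackhost` is an assumption inferred from the log path /var/log/pf9/ostackhost.log. Steps 1 and 9 use `openstack server stop` and `openstack server start` from a host with the OpenStack CLI.

```shell
# Steps 2-3: locate the stuck map and see what still references it.
inspect_device() {         # $1: multipath device ID
    multipath -ll "$1"
    dmsetup info "$1"      # check the "Open count" field
    lsof "/dev/mapper/$1" || echo "lsof: no open handles reported"
}

# Step 4: hard power-off the libvirt domain if it is still running.
force_stop_domain() {      # $1: libvirt domain name
    if virsh domstate "$1" | grep -q running; then
        virsh destroy "$1"
    fi
}

# Steps 5-7: remove the device-mapper map, flush multipath, rescan iSCSI.
cleanup_device() {         # $1: multipath device ID
    dmsetup remove --force "$1" || echo "dmsetup remove failed; continuing with multipath flush"
    multipath -f "$1" || echo "multipath -f reported an error (the map may already be gone)"
    iscsiadm -m session --rescan
}

# Step 8: restart the compute service (unit name is an assumption, see above).
restart_compute() {
    systemctl restart pf9-ostackhost
}

# Usage order (placeholders):
# openstack server stop "[VM_UUID]"
# inspect_device "[DEVICE_ID]"; force_stop_domain "[DOMAIN_NAME]"
# cleanup_device "[DEVICE_ID]"; restart_compute
# openstack server start "[VM_UUID]"
```

Note that `dmsetup remove --force` replaces the map's table with one that fails all I/O, so it should only be used once the VM is confirmed stopped.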

Validation

  • Before starting the VM, verify multipath cleanup:

  • Once the VM is started, verify VM status:

  • Verify Compute Service:

  • Check for Errors:
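The checks above could look like the following sketch. Function names are hypothetical, the unit name `pf9-ostackhost` is an assumption inferred from the log path, and the 200-line window is an arbitrary choice.

```shell
# Multipath cleanup (compute node, as root): succeeds when no map or
# /dev/mapper node remains for the device ID.
multipath_is_clean() {     # $1: multipath device ID
    if multipath -ll | grep -q "$1"; then
        echo "multipath map still present: $1"
        return 1
    fi
    if [ -e "/dev/mapper/$1" ]; then
        echo "device node still present: /dev/mapper/$1"
        return 1
    fi
    echo "no multipath map found for $1"
}

# VM status and compute service state (OpenStack CLI with admin credentials).
validate_vm() {            # $1: VM name or UUID
    openstack server show "$1" -c status -c OS-EXT-STS:task_state -c OS-EXT-STS:power_state
    openstack compute service list --service nova-compute
}

# Service state and recent errors (compute node).
check_compute_node() {
    systemctl is-active pf9-ostackhost
    tail -n 200 /var/log/pf9/ostackhost.log | grep "map in use" \
        || echo "no recent 'map in use' errors"
}
```

A healthy result is a clean multipath check, a status of ACTIVE with an empty task_state, and a compute service reported as up.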

Additional Information

Regular Health Checks:
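As one illustrative health check (an assumption, not a documented procedure), the number of failed or faulty paths reported by `multipath -ll` can be tracked on each compute node so that stale devices are noticed before they accumulate. The function name `count_failed_paths` is hypothetical.

```shell
# Hypothetical periodic check: run as root on each compute node.
# Prints the number of multipath path lines reported as failed or faulty.
count_failed_paths() {
    multipath -ll | grep -c -E "failed|faulty" || true
}

# Usage:
# count_failed_paths
```

A count that stays above zero across several runs suggests paths that need cleanup.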
