Troubleshooting Volume Attachment & Detachment Issues

Problem

This guide provides step-by-step instructions for troubleshooting and resolving issues that arise when a volume attachment or detachment fails. Symptoms typically include volumes stuck in attaching or detaching states, or failing with specific hypervisor faults.

Environment

  • Private Cloud Director Virtualization - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher

  • Component - PCD Storage Service

Volume Attachment Protocols

The method by which a volume is attached depends entirely on your backend storage configuration. For the list of supported backend volume drivers, refer to the product documentation.

Deep Dive

The Private Cloud Director volume attach/detach process is a highly coordinated handshake between the Compute service (Nova) and the Block Storage service (Cinder). Nova acts as the primary orchestrator, asking Cinder for connection details, physically mapping the storage to the compute node using a library called os-brick, and finally hot-plugging it into the running VM via Libvirt.

Note: The kubectl logs shown in the API steps below can only be reviewed in Self-Hosted Private Cloud Director environments.

Step 1: User Request & API Validation

A user submits a request to attach a volume to a specific VM. The nova-api service validates the request, checks that the VM is in a valid state (ACTIVE or SHUTOFF), and checks that the volume is available. Cinder then changes the volume status to attaching, which locks the volume against other operations.

Sample Log (nova-api-osapi)
$ kubectl logs deployment/nova-api-osapi -n <WORKLOAD_REGION> | grep "POST /v2.1" 
INFO nova.osapi_compute.wsgi.server [None [REQ_ID] [USER_ID] [TENANT_ID] - - default default] [IP] "POST /v2.1/[tenant_id]/servers/[VM_UUID]/os-volume_attachments HTTP/1.1" status: 202 len: [.] time: [.]
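
The pre-checks described in this step can be approximated from the CLI before retrying a request. The following is a minimal sketch, not the validation code that nova-api runs: can_attach is a hypothetical helper name, VM_UUID and VOLUME_UUID are placeholders, and the check ignores multi-attach volumes, which may legitimately be in-use.

```shell
# Hypothetical helper: mirror the Step 1 pre-checks described above.
# $1 = VM status, $2 = volume status
can_attach() {
    case "$1" in
        ACTIVE|SHUTOFF) ;;
        *) echo "VM status '$1' is not ACTIVE or SHUTOFF"; return 1 ;;
    esac
    if [ "$2" != "available" ]; then
        echo "volume status '$2' is not available"
        return 1
    fi
    echo "ok"
}

# Usage (assumes the openstack CLI is installed and credentials are sourced):
#   can_attach "$(openstack server show "$VM_UUID" -f value -c status)" \
#              "$(openstack volume show "$VOLUME_UUID" -f value -c status)"
```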

Step 2: Connection Initialization (Cinder)

Nova Compute asks Cinder to "initialize the connection." Cinder communicates with the storage backend to allow the compute node to access the volume, and returns the connection details to Nova.

Step 3: Host-Level Storage Mapping

The pf9-ostackhost service on the target compute node receives the connection details. Using the os-brick library, the compute node discovers the physical volume, for example by logging in to the iSCSI target and scanning for new SCSI devices.

Step 4: Hypervisor Hot-Plug & Status Update

Nova generates the XML for the disk and calls Libvirt to dynamically hot-plug the block device to the running QEMU process. Once confirmed, Nova tells Cinder to finalize the attachment, moving the volume status to in-use.

Procedure: Diagnostic Commands

1. Get the Volume and Attachment Status

Check if the volume is stuck in a transitional state or if it threw a specific fault.
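
As a starting point, the status and attachment records can be read with the standard OpenStack CLI. The sketch below assumes the openstack client is installed and admin credentials are sourced; volume_status and is_transitional are hypothetical helper names, and VOLUME_UUID and VM_UUID are placeholders.

```shell
# Hypothetical helper: print the Cinder status of a volume.
volume_status() {
    openstack volume show "$1" -f value -c status
}

# Return 0 when a status is one of the transitional states named in this guide.
is_transitional() {
    case "$1" in
        attaching|detaching) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   status=$(volume_status "$VOLUME_UUID")
#   if is_transitional "$status"; then
#       echo "Volume is still in '$status'"
#   fi
#   openstack volume show "$VOLUME_UUID" -c status -c attachments
#   openstack server show "$VM_UUID"          # includes a fault field if the VM errored
#   openstack server event list "$VM_UUID"    # attach/detach actions recorded by Nova
```

A volume that stays in attaching or detaching across repeated checks points to a stalled operation rather than one that is still in progress.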

2. Validate Compute and Volume Service Health

Since attach/detach operations depend on the compute node physically mapping the storage, ensure that the compute service hosting the VM is healthy and that the block storage service running on the storage node is also healthy.
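
A quick way to spot unhealthy services is to list them and keep only the rows whose state is not up. This is a sketch that assumes the openstack CLI is available with admin credentials; down_services is a hypothetical filter, not a Private Cloud Director tool.

```shell
# Print only rows whose last column (State) is not "up".
# Expects "<binary> <host> <state>" rows on stdin.
down_services() {
    awk 'NF > 0 && $NF != "up"'
}

# Usage (State is the last of the selected columns in the CLI output):
#   openstack compute service list --service nova-compute \
#       -f value -c Binary -c Host -c State | down_services
#   openstack volume service list \
#       -f value -c Binary -c Host -c State | down_services
```

Empty output means every listed service reports up; any printed row identifies a host to investigate before retrying the attach or detach.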

3. Review API Rejections (Management Plane)

Check whether the Nova or Cinder API rejected the request before it reached the compute node.
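
In Self-Hosted environments, rejected requests appear in the API logs with a 4xx or 5xx status. The sketch below reuses the nova-api-osapi deployment and the log format from the Step 1 sample. The cinder-api deployment name is an assumption and may differ in your region, and api_rejections is a hypothetical helper.

```shell
# Keep only log lines whose HTTP status is 4xx or 5xx.
api_rejections() {
    grep -E 'status: [45][0-9][0-9]'
}

# Usage (Self-Hosted only; <WORKLOAD_REGION> is a placeholder):
#   kubectl logs deployment/nova-api-osapi -n <WORKLOAD_REGION> \
#       | grep "os-volume_attachments" | api_rejections
#   # Assumption: the Cinder API runs as deployment/cinder-api.
#   kubectl logs deployment/cinder-api -n <WORKLOAD_REGION> | api_rejections
```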

4. Check the Hypervisor (Libvirt) State

Verify if the hypervisor actually sees the disk attached to the VM at the system level.
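
On the compute node, virsh domblklist lists the block devices that libvirt has attached to a domain. The sketch below assumes root or sudo access and relies on Nova using the instance UUID as the libvirt domain UUID; attached_targets is a hypothetical helper that extracts the target device names from that output.

```shell
# Print the target device names (vda, vdb, ...) from `virsh domblklist` output.
# The first two lines of that output are the column header and a separator.
attached_targets() {
    awk 'NR > 2 && NF >= 2 { print $1 }'
}

# Usage on the compute node hosting the VM (VM_UUID is a placeholder):
#   sudo virsh domblklist "$VM_UUID" | attached_targets
#   sudo virsh domblklist "$VM_UUID" --details
```

If Cinder reports the volume as in-use but no matching device appears here, or the reverse, the two services disagree about the attachment.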

5. Validate OS-Level Block Devices

If using iSCSI or NFS, verify that the compute node's operating system successfully discovered the disk.
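
These host-level checks use standard Linux utilities. The sketch assumes lsblk, iscsiadm, multipath, and findmnt are installed on the compute node; iscsi_disks is a hypothetical helper that picks iSCSI-backed disks out of lsblk output.

```shell
# Print names of disks whose transport (TRAN column) is iscsi.
# Expects "<name> <tran>" rows, as produced by: lsblk -dn -o NAME,TRAN
iscsi_disks() {
    awk '$2 == "iscsi" { print $1 }'
}

# Usage on the compute node:
#   lsblk -dn -o NAME,TRAN | iscsi_disks   # iSCSI-backed disks
#   sudo iscsiadm -m session               # active iSCSI sessions
#   sudo multipath -ll                     # multipath maps, if multipath is in use
#   findmnt -t nfs,nfs4                    # mounted NFS shares
```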

6. Check the Compute Node Logs

Most attach/detach failures occur during the os-brick mapping phase on the compute host, so review the pf9-ostackhost service log on the node that hosts the VM.
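
A simple filter helps narrow the compute log to relevant entries. This is a sketch only: the path /var/log/pf9/ostackhost.log is an assumption about where pf9-ostackhost writes its log, so adjust it if your hosts differ, and attach_errors is a hypothetical helper.

```shell
# Keep ERROR/WARNING lines that mention os_brick or the given volume UUID.
# $1 = volume UUID; log text is read from stdin.
attach_errors() {
    if [ -z "$1" ]; then
        echo "usage: attach_errors <VOLUME_UUID>" >&2
        return 2
    fi
    grep -E 'ERROR|WARNING' | grep -E "os_brick|$1"
}

# Usage on the compute node (log path is an assumption, see above):
#   sudo cat /var/log/pf9/ostackhost.log | attach_errors "$VOLUME_UUID"
```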

7. Resetting a Stuck Volume State (Admin Only)

If a process completely times out and the volume is permanently stuck in attaching or detaching despite no active tasks running, you may need to manually reset the state.
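
The reset itself is a single admin CLI call. The sketch wraps it in a hypothetical helper, reset_volume_state, which defaults to a dry run and accepts only the two end states discussed in this guide. The reset changes only the volume record in the Cinder database. It does not clean up host-side sessions or the libvirt domain, so confirm the real state with the earlier checks first.

```shell
# Hypothetical helper: reset a stuck volume to a known end state.
# $1 = volume UUID, $2 = target state (available or in-use)
# Set DRY_RUN=0 to actually run the command; the default only prints it.
reset_volume_state() {
    case "$2" in
        available|in-use) ;;
        *) echo "refusing unexpected target state: $2" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "openstack volume set --state $2 $1"
    else
        openstack volume set --state "$2" "$1"
    fi
}

# Usage (admin credentials required; VOLUME_UUID is a placeholder):
#   reset_volume_state "$VOLUME_UUID" available            # preview only
#   DRY_RUN=0 reset_volume_state "$VOLUME_UUID" available  # apply
```

Choose available when the hypervisor does not show the disk attached to the VM, and in-use when it does.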

Most Common Causes

  • Guest OS Holding the Disk (Detach Failure): The most common detach issue. The virtual machine's operating system is actively reading from or writing to the disk, or still has it mounted (for example, through an /etc/fstab entry). Libvirt will refuse to hot-unplug the disk to prevent data corruption.

  • Stale iSCSI Sessions / Multipath Issues: The compute node has leftover, broken iSCSI sessions from previous operations, causing os-brick to hang or map the wrong device path during the attachment phase.

  • Storage Network Connectivity: The compute node lacks the network routing required to reach the storage data network (e.g., the iSCSI target IPs or the Ceph public network).

  • Libvirt/QEMU Timeout: The storage backend responds too slowly, causing the hypervisor's hot-plug operation to time out, leaving the volume locked in an attaching state.

  • Missing Packages: The compute node is missing required backend tools (like sysfsutils, multipath-tools, open-iscsi, or ceph-common), preventing it from successfully mapping the volume to the host.
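
For the missing-packages case, a quick check is to confirm that the command each package provides is on the PATH. The mapping used below (sysfsutils provides systool, multipath-tools provides multipath, open-iscsi provides iscsiadm, ceph-common provides rbd) reflects common Linux packaging and should be verified for your distribution; missing_tools is a hypothetical helper.

```shell
# Print each named command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage on the compute node (run as root so that /sbin and /usr/sbin are
# on the PATH, and check only the tools your backend needs):
#   missing_tools systool multipath iscsiadm rbd
```

Empty output means every listed command was found; any printed name indicates a package to install before retrying the attachment.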
