Troubleshooting Volume Creation Issues
Problem
This guide provides step-by-step instructions for troubleshooting and resolving issues that arise when volume creation fails.
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Component - PCD Storage Service
Various Volume Creation Methods
Blank Volume: The storage driver allocates a sparse file or raw block device on the backend. Minimal data movement is required; only metadata initialization occurs.
From Image: The driver creates a volume shell, then streams the image data from the Image service (Glance) to the storage host, converting it if necessary before writing to the backend.
From Snapshot: The driver creates a "Clone" or "Linked Clone" of a parent snapshot. This typically involves little data movement, because the storage array (such as Ceph or Tintri) handles it through metadata pointers.
From Volume: The driver creates a bit-for-bit clone of an existing source volume. Data movement depends on the backend's cloning capabilities.
Deep Dive
The Private Cloud Director volume creation process is orchestrated primarily by the Block Storage service (Cinder). This flow involves a complex interaction between the API, the Scheduler, and the storage node drivers to ensure space is allocated, formatted, and registered correctly on the target backend.
NOTE: The kubectl logs referenced below can be reviewed only in Self-Hosted Private Cloud Director.
Step 1: User Request & API Validation
This is the initial stage, where the volume creation request is received and validated.
User Request: A user submits a request to create a volume. The cinder-api pod validates the authentication token with Keystone.
State Check: The API checks the project's volume and gigabyte quotas to ensure the request stays within its limits. If the check succeeds, the API generates a unique Request ID (REQ_ID) and updates the database to set the volume status to creating.
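The quota check described above can be illustrated with a minimal sketch. The ProjectQuota class and its field names are hypothetical and are not part of the PCD or Cinder API. The sketch also ignores unlimited quotas, so treat it as an illustration of the arithmetic rather than the actual validation code.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hypothetical view of a project's limits and current usage."""
    max_volumes: int
    max_gigabytes: int
    used_volumes: int
    used_gigabytes: int


def quota_violations(quota: ProjectQuota, requested_gb: int) -> list[str]:
    """Return the names of the quotas a new volume would exceed."""
    violations = []
    # One more volume must still fit under the volume-count limit.
    if quota.used_volumes + 1 > quota.max_volumes:
        violations.append("volumes")
    # The requested size must still fit under the gigabyte limit.
    if quota.used_gigabytes + requested_gb > quota.max_gigabytes:
        violations.append("gigabytes")
    return violations


# Illustrative numbers only: 950 GB already used out of 1000 GB.
quota = ProjectQuota(max_volumes=10, max_gigabytes=1000,
                     used_volumes=4, used_gigabytes=950)
print(quota_violations(quota, requested_gb=100))  # ['gigabytes']
```

An empty list means the request would pass this stage. Any entry corresponds to a failure that is reported immediately at the API level.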
Step 2: Scheduling & Destination Selection
The request is handed off to the Cinder Scheduler to find an appropriate backend.
Host Filtering: The scheduler evaluates the available storage pools, filtering hosts based on the requested Availability Zone, the Volume Type, and sufficient free_capacity_gb.
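To reason about why a pool was rejected, the filtering described above can be sketched as follows. The Pool class and the filter_pools function are hypothetical stand-ins, not the scheduler's real data model. The sketch covers only the three criteria named in this guide.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical summary of one storage pool as seen by the scheduler."""
    name: str
    availability_zone: str
    volume_types: set[str]
    free_capacity_gb: float


def filter_pools(pools, size_gb, availability_zone, volume_type):
    """Map each pool name to its rejection reasons (empty list = eligible)."""
    results = {}
    for pool in pools:
        reasons = []
        if pool.availability_zone != availability_zone:
            reasons.append("availability zone mismatch")
        if volume_type not in pool.volume_types:
            reasons.append("volume type not supported")
        if pool.free_capacity_gb < size_gb:
            reasons.append("insufficient free_capacity_gb")
        results[pool.name] = reasons
    return results


# Illustrative pools only; names and sizes are made up.
pools = [
    Pool("pool-a", "az1", {"standard"}, 500.0),
    Pool("pool-b", "az2", {"standard", "encrypted"}, 50.0),
]
for name, reasons in filter_pools(pools, 100, "az1", "standard").items():
    print(name, "eligible" if not reasons else reasons)
```

If every pool ends up with at least one reason, the outcome corresponds to a "No Valid Host Found" failure.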
Step 3: Driver Provisioning & Backend Setup
The request is routed to the specific storage host selected by the scheduler.
Destination Setup: The pf9-cindervolume-base service on the target host translates the request into driver-specific commands (e.g., NFS, Ceph RBD, or LVM) to physically allocate the space on the storage array.
Step 4: Data Conversion (If Applicable) & Cutover
If the volume is being created from an image, the storage host must fetch and write the data.
Data Transfer & Conversion: The storage host downloads the image into a local staging directory (e.g., /var/lib/cinder/conversion), runs qemu-img convert to match the backend's format, and writes it to the volume.
Cleanup: The temporary files are deleted, and the volume status is updated to available.
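Because the conversion step needs local disk space, a quick free-space check on the staging directory can rule out exhaustion. The sketch below is a rough estimate under two assumptions: the converted image may need up to its virtual size, and 10% headroom is enough. Neither figure comes from the product, and staging_has_room is a hypothetical helper.

```python
import shutil
import tempfile


def staging_has_room(staging_dir: str, image_virtual_size_bytes: int,
                     headroom: float = 1.1) -> bool:
    """Return True if the filesystem holding staging_dir can fit the image.

    Assumes the converted image may occupy up to its virtual size and adds
    an arbitrary headroom factor on top.
    """
    free_bytes = shutil.disk_usage(staging_dir).free
    return free_bytes >= image_virtual_size_bytes * headroom


# The system temp directory stands in for the real staging directory here.
# On a storage host, pass the staging path instead.
forty_gib = 40 * 1024**3
print(staging_has_room(tempfile.gettempdir(), forty_gib))
```

A False result before a large image-backed creation suggests that the conversion is likely to fail partway through.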
Procedure
Get the Volume Status: Use the CLI to check the status and error details of the volume. Look for specific fault messages if the volume is in an error state.
Validate Block Storage Service Status: Ensure the volume services across the cluster (API, Scheduler, and Volume workers) are up and enabled.
Trace the Volume Messages: Cinder stores user-facing error messages in the database. Retrieve these to quickly identify scheduling or backend failures without digging into raw logs.
Review the Pods and Their Logs on the Management Plane: This step applies only to Self-Hosted Private Cloud Director. Check the management plane pods to see whether the Scheduler failed to find a host or the API rejected the request. Review the pod logs using the REQ_ID or VOLUME_UUID.
Validate Target Backend Capacity: Check whether the storage pools actually have the required space. Pay attention to free_capacity_gb and max_over_subscription_ratio.
Validate Host Capabilities & Volume Types: Ensure the target host supports the capabilities of the requested Volume Type (e.g., thin provisioning, encryption).
Validate the Service Status on the Target Storage Node: Validate that the Cinder volume daemon is running on the compute/storage node designated to handle the backend.
Check the Logs on the Storage Node: Review the Cinder logs on the target storage node. Search for the REQ_ID or VOLUME_UUID to identify driver-level rejections or conversion timeouts.
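The log searches in the steps above come down to filtering lines by the REQ_ID or VOLUME_UUID. The sketch below shows one way to do that against a log file that has already been saved locally. The find_matching_lines helper is hypothetical, and the sample log lines are invented for the demonstration, so they do not reflect the real Cinder log format.

```python
import tempfile
from pathlib import Path


def find_matching_lines(log_path, identifiers):
    """Yield (line_number, line) for lines containing any identifier."""
    with Path(log_path).open(errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(identifier in line for identifier in identifiers):
                yield number, line.rstrip("\n")


# Made-up sample data, not real service output.
sample = "INFO req-1111 creating volume\nERROR req-2222 driver failed\n"
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "sample.log"
    log_file.write_text(sample)
    for number, line in find_matching_lines(log_file, ["req-2222"]):
        print(number, line)
```

Passing both the REQ_ID and the VOLUME_UUID in the identifiers list catches lines that mention only one of them.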
Most Common Causes
No Valid Host Found: The Cinder scheduler rejected all available storage hosts because they lacked sufficient free_capacity_gb, did not match the requested Volume Type, or belonged to the wrong Availability Zone.
Oversubscription Limits: The max_over_subscription_ratio has been reached. The scheduler will refuse to provision new volumes, even if physical disk space is still available on the array, because the allocated virtual space exceeds the allowed ratio.
Staging Area Exhaustion: When creating a volume from a large image, the local filesystem on the storage host (often /var/lib/cinder/conversion) runs out of disk space during the qemu-img convert process.
Storage Backend Disconnect: The pf9-cindervolume-base service cannot communicate with the physical storage array (e.g., an NFS mount point dropped, or Ceph monitors are unreachable), causing the driver creation command to fail.
Quota Exceeded: The project has reached its administrative limit for total volumes or total gigabytes. This usually fails immediately at the API level (Step 1).
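The oversubscription case is easier to see with numbers. The sketch below is a simplified model of the ratio check, not the scheduler's exact logic. The total_capacity_gb and provisioned_capacity_gb inputs are assumed to be available from the pool statistics, and the model ignores reserved space.

```python
def oversubscription_check(total_capacity_gb, provisioned_capacity_gb,
                           requested_gb, max_over_subscription_ratio):
    """Return (ratio_after_request, allowed) for a thin-provisioned pool."""
    # Virtual space that would be allocated, relative to physical capacity.
    ratio = (provisioned_capacity_gb + requested_gb) / total_capacity_gb
    return ratio, ratio <= max_over_subscription_ratio


# Illustrative values: a 1000 GB pool with 19950 GB already provisioned.
ratio, allowed = oversubscription_check(
    total_capacity_gb=1000, provisioned_capacity_gb=19950,
    requested_gb=100, max_over_subscription_ratio=20.0)
print(f"ratio after request: {ratio:.2f}, allowed: {allowed}")
```

In this example the request is refused no matter how much physical space remains, because the provisioned total would rise to 20.05 times the pool's capacity against a limit of 20.0.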
