# Storage Service Troubleshooting Guide

## Problem

This troubleshooting guide provides clear, actionable steps, explanations of common errors, and best practices so that users can quickly and independently resolve problems with the Storage service (Cinder) in <code class="expression">space.vars.product\_name</code>.

## Environment

* Private Cloud Director Virtualization - v2025.4 and Higher
* Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
* Component - PCD Storage Service

## Deep Dive

### Volume Creation Flow

This is the process of provisioning a new block storage device from a storage backend.

1. **API Request:** A user sends a request to create a volume via the OpenStack CLI, <code class="expression">space.vars.product\_name</code> dashboard, or a direct API call. The **`cinder-api`** service receives this request, authenticates the user with **Keystone**, handles the **`POST`** `https://<FQDN>/v3/<TENANT_UUID>/volumes` call, and creates a Cinder database entry for the volume with a status of `creating`. The `cinder-api` pod log sample below shows the POST request, the volume size, and the successful issue of the volume creation request.

   <pre class="language-bash" data-title="Sample logs:"><code class="lang-bash">INFO cinder.api.openstack.wsgi [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] POST https://&#x3C;FQDN>/v3/&#x3C;TENANT_UUID>/volumes
   INFO cinder.api.v3.volumes [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Create volume of 2 GB
   INFO cinder.volume.api [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Availability Zones retrieved successfully.
   INFO cinder.volume.api [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Create volume request issued successfully.
   </code></pre>
2. **Cinder-Scheduler:** The request is passed to the **`cinder-scheduler`**. This component decides which storage backend (e.g., Ceph, LVM) is the best place to create the volume based on size, type, and availability, using filters such as the capacity filter and the availability zone filter; the full list of scheduler filters is documented [here](https://docs.openstack.org/cinder/train/configuration/block-storage/scheduler-filters.html).
3. **Volume Service Action:** The scheduler sends the request to the `pf9-cindervolume-base` service responsible for the chosen backend. This service is the worker that uses a specific storage driver to command the backend.
4. **Backend Provisioning:** The storage backend (the actual storage system) receives the commands and provisions the physical or logical block device. Here, on the underlying Persistent Storage hosts, the `/var/log/pf9/cindervolume-base.log` will show the requested raw volume specifications, which include Volume name, Volume UUID and Volume size.

   <pre data-title="Sample logs:"><code>INFO cinder.volume.flows.manager.create_volume [[REQ-ID] None service] Volume [VOLUME_UUID]: being created as raw with specification: {'status': 'creating', 'volume_name': 'volume-[VOLUME_UUID]', 'volume_size': 2}
   </code></pre>
5. **Status Update:** Once the backend confirms the volume is created, the `pf9-cindervolume-base` service sends an update request to the Cinder database, changing the volume's status to `available`. Here, on the underlying Persistent Storage hosts, the `/var/log/pf9/cindervolume-base.log` shows the final status confirming that the volume was created. A CLI example that exercises this flow end to end follows this list.

   <pre data-title="Sample logs:"><code>INFO cinder.volume.flows.manager.create_volume [[REQ-ID] None service] Volume volume-[VOLUME_UUID] ([VOLUME_UUID]): created successfully
   INFO cinder.volume.manager [[REQ-ID] None service] Created volume successfully.
   </code></pre>
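
As a quick end-to-end check of this flow, the whole sequence can be driven from the CLI. This is a minimal sketch, assuming the `openstack` client is configured for the target cloud; the volume name `test-volume` and the 2 GB size are arbitrary examples:

```bash
# Hypothetical example: create a 2 GB volume, then confirm it moves from
# 'creating' to 'available'.
openstack volume create --size 2 test-volume
openstack volume show test-volume -c id -c size -c status
```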

### Attaching a Volume to VM Flow

This process is a collaboration, primarily between **Compute** and **Block Storage**.

1. **User Request (via Nova):** A user requests to attach an existing, `available` volume to a specific VM. This request goes to the `nova-api-osapi` service, not the Cinder API.

   <pre data-title="Sample logs:"><code>INFO nova.osapi_compute.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] [IP],[IP] "POST /v2.1/[PROJECT_UUID]/servers/[VM_UUID]/os-volume_attachments HTTP/1.1" status: 200 len: 569 time: 0.8244848
   </code></pre>
2. **Nova to Cinder Communication:** The `pf9-ostackhost` service on the host where the VM is running calls the `cinder-api` to get the connection information for the volume. Once the volume information is received, it proceeds to attach the volume, as shown in the `/var/log/pf9/ostackhost.log` logs.

   <pre data-title="Sample logs:"><code>INFO nova.compute.manager [[REQ-ID] [USER_NAME] [TENANT_NAME]] [instance: [VM_UUID]] Attaching volume [VOLUME_UUID] to /dev/vdx
   </code></pre>
3. **Cinder Prepares the Attachment:** The `cinder-api` passes the request to the `pf9-cindervolume-base` service. Cinder performs the actions needed to "reserve" the volume and prepare it for attachment, then generates the required connection details (e.g., the iSCSI target or Ceph RBD path). Once that succeeds, the `/var/log/pf9/cindervolume-base.log` logs show the attachment success messages. The volume status will be "`reserved`".

   <pre data-title="Sample logs:"><code>INFO cinder.volume.manager [[REQ-ID] None service] attachment_update completed successfully.
   INFO cinder.volume.manager [[REQ-ID] None service] Volume connection completed successfully.
   </code></pre>
4. **Cinder Responds to Nova:** Cinder sends these connection details back to the `pf9-ostackhost` (nova-compute) service on the host.
5. **Nova Makes the Connection:** Once `pf9-ostackhost` receives the connection info, it uses the host's operating system and hypervisor (e.g., QEMU/KVM) to connect the VM to the storage volume. The volume status will be "`attaching`".
6. **Final Status Update:** Once the connection is successful, the `pf9-ostackhost` service informs Cinder, and Cinder updates the volume's status in its database to `in-use` and records which VM it is attached to. A CLI example of the attach operation follows this list.
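
The attach flow can likewise be driven and verified from the CLI. A minimal sketch, with placeholder names for the server and volume:

```bash
# Hypothetical example: attach an available volume to a VM, then confirm
# the volume status transitions to 'in-use' and the attachment is recorded.
openstack server add volume <VM_NAME_OR_UUID> <VOLUME_NAME_OR_UUID>
openstack volume show <VOLUME_NAME_OR_UUID> -c status -c attachments
```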

### Volume Deletion Flow

This process is a collaboration, primarily on **Block Storage**.

1. **User Request (via Cinder):** A user requests to delete an existing volume via the CLI, <code class="expression">space.vars.product\_name</code> dashboard, or a direct API call. The request `DELETE /v3/{project_id}/volumes/{volume_id}` goes to the `cinder-api` service, which validates the user's authentication token with **Keystone**, performs a permission check, and changes the volume status in the database to `deleting`.
2. **Further Validation:** The Cinder service checks the volume state. If it is *available*, *error*, *error\_restoring*, or *error\_extending*, a normal delete operation is performed. If the volume state is *in-use (attached)*, a normal delete is rejected unless the force delete option is used.

   <pre data-title="Sample logs:"><code>INFO cinder.api.openstack.wsgi [None [REQ-ID] [USER_NAME] [TENANT_NAME] - - default default] DELETE https://[FQDN]/v3/[PROJECT_UUID]/volumes/[VOLUME_UUID]
   INFO cinder.api.v3.volumes [None [REQ-ID] [USER_NAME] [TENANT_NAME] - - default default] Delete volume with id: [VOLUME_UUID]
   INFO cinder.volume.api [None [REQ-ID] [USER_NAME] [TENANT_NAME] - - default default] Volume info retrieved successfully.
   INFO cinder.volume.api [None [REQ-ID] [USER_NAME] [TENANT_NAME] - - default default] Delete volume request issued successfully.
   INFO cinder.api.openstack.wsgi [None [REQ-ID] [USER_NAME] [TENANT_NAME] - - default default] https://[FQDN]/v3/[PROJECT_UUID]/volumes/[VOLUME_UUID] returned with HTTP 202
   </code></pre>
3. **Cinder Prepares for Delete:** The RPC request is routed to the `pf9-cindervolume-base` service hosting the volume (no scheduler step is needed for delete). The backend driver/manager attempts to terminate connections and detach the volume (best effort); if connector cleanup fails, the delete may fail with `error_deleting`. The driver's `delete_volume()` removes the LUN/target/extent from the storage backend. The `/var/log/pf9/cindervolume-base.log` then shows the volume device mapper being deleted.

   <pre data-title="Sample logs:"><code>INFO cinder.volume.volume_utils [[REQ-ID] None service] Performing secure delete on volume: /dev/mapper/cinder--volumes-volume--[VOLUME_UUID]
   </code></pre>
4. **Cinder Volume Deletion Confirmation:** On a successful backend delete, quotas for volumes and gigabytes are decremented, and the `/var/log/pf9/cindervolume-base.log` shows that the volume was successfully deleted.

   <pre data-title="Sample logs:"><code>INFO cinder.volume.drivers.lvm [[REQ-ID] None service] Successfully deleted volume: [VOLUME_UUID]
   </code></pre>
5. **Final Status Update:** The Persistent Storage service `pf9-cindervolume-base` sends the database update request to the Cinder DB. A CLI example of the delete operation follows this list.
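
The deletion flow can be triggered and confirmed from the CLI as well. A minimal sketch; `--force` should only be used when a normal delete is rejected and the backend state has been verified:

```bash
# Hypothetical example: delete a volume, then confirm it no longer appears.
openstack volume delete <VOLUME_NAME_OR_UUID>
# Forced removal regardless of state (use with caution):
# openstack volume delete --force <VOLUME_NAME_OR_UUID>
openstack volume list | grep -i "<VOLUME_NAME_OR_UUID>"
```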

### Detaching a Volume from a VM Flow

This process is a collaboration, primarily between **Compute** and **Block Storage**.

1. **User Request (via Nova):** A user requests to detach a volume from a VM. This request goes to the **nova-api-osapi** pod, not directly to Cinder. Extract the volume ID from the pod logs and note the `request-id`:

{% code overflow="wrap" %}

```
INFO nova.osapi_compute.wsgi.server [None [REQ_ID] [USER_ID] [TENANT_ID] - - default default] [IP],[IP] "DELETE /v2.1/[PROJECT_UUID]/servers/[VM_UUID]/os-volume_attachments/[VOLUME_UUID] HTTP/1.1" status: 202 len: 379 time: 0.2343676
```

{% endcode %}

2. **Nova Initiates Detach Operation:** The `pf9-ostackhost` service on the compute node (where the VM is running) initiates the detach process. It first interacts with the hypervisor to safely remove the disk from the VM. Search the logs for the `request-id` captured in the previous step:

{% code overflow="wrap" %}

```
$ grep [REQ_ID] ostackhost.log
INFO nova.compute.manager [[REQ_ID] [USER_NAME] [TENANT_NAME] Pod] [instance: [VM_UUID] ] Detaching volume [VOLUME_UUID]
```

{% endcode %}

3. **Nova to Cinder Communication:** After initiating the detach, `pf9-ostackhost` calls the `cinder-api` to terminate the volume connection. Extract the `request-id` from the `cinder-api` pod logs:

{% code overflow="wrap" %}

```
INFO cinder.volume.api [ [REQ_ID] [INTERNAL_REQ_ID] [TENANT_ID] [USER_ID] - - default default] Begin detaching volume completed successfully.
INFO cinder.api.openstack.wsgi [[REQ_ID] [INTERNAL_REQ_ID] [TENANT_ID] [USER_ID] - - default default] DELETE http://<CINDER_API_ENDPOINT>/v3/<TENANT_ID>/attachments/<ATTACHMENT_ID>
```

{% endcode %}

4. **Nova Finalizes Detach on Hypervisor:** After receiving confirmation, `pf9-ostackhost` ensures the disk is fully removed from the VM via the hypervisor (QEMU/KVM).

   If not already done earlier, this step ensures:

   * Device is no longer visible inside VM
   * Libvirt/QEMU mapping is removed

{% code overflow="wrap" %}

```
INFO nova.virt.block_device [[REQ_ID] [USER_NAME] [TENANT_NAME] Pod] [instance: [VM_UUID]] Attempting to driver detach volume [VOLUME_UUID] from mountpoint /dev/vdb
INFO nova.virt.libvirt.driver [[REQ_ID] [USER_NAME] [TENANT_NAME] Pod] Successfully detached device vdb from instance [VM_UUID] from the live domain config.
```

{% endcode %}

5. **Final Status Update:** Once the detachment is fully completed, `pf9-ostackhost` informs Cinder. A CLI example of the detach operation is shown below.
   * Cinder updates:
     * Volume status → available
     * Attachment entry → removed

{% hint style="info" %}
Run `openstack volume show <volume-id>` to validate the status.
{% endhint %}
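
A minimal CLI sketch of the detach flow, with placeholder names for the server and volume:

```bash
# Hypothetical example: detach a volume from a VM, then confirm the status
# returns to 'available' with no remaining attachments.
openstack server remove volume <VM_NAME_OR_UUID> <VOLUME_NAME_OR_UUID>
openstack volume show <VOLUME_NAME_OR_UUID> -c status -c attachments
```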

### Extending a Volume Flow

This process is primarily handled by Block Storage, with optional collaboration from Compute when the volume is attached.

1. **User Request (via Cinder):** A user requests to extend an existing volume. This request goes to the cinder-api service. Capture the `REQ_ID` from the `cinder-api` pod logs:

{% code overflow="wrap" %}

```
INFO cinder.api.openstack.wsgi [None [REQ_ID] [INTERNAL_REQ_ID] [TENANT_ID] [USER_ID] - - default default] POST https://[FQDN]/v3/[PROJECT_UUID]/volumes/[VOLUME_UUID]/action
INFO cinder.volume.api [None [REQ_ID] [INTERNAL_REQ_ID] [TENANT_ID] [USER_ID] - - default default] Extend volume request issued successfully.
cinder.api.openstack.wsgi [None [REQ_ID] [INTERNAL_REQ_ID] [TENANT_ID] [USER_ID] - - default default] https://[FQDN]/v3/[PROJECT_UUID]/volumes/[VOLUME_UUID]/action returned with HTTP 202
```

{% endcode %}

2. **Cinder Validates the Request:** The `cinder-api` validates that:

   * The new size is greater than the current size
   * The volume is in a valid state (`available`, or `in-use` where supported)

   Once validated:

   * Volume status → extending

{% hint style="info" %}
Run `openstack volume show <volume-id>` to validate the status. Also note the hosting Cinder node.
{% endhint %}

3. **Cinder Volume Service Processes Extend:** The `cinder-api` forwards the request to the `pf9-cindervolume-base` service on the Cinder host where the volume is placed. Here, on the underlying Cinder host, the `/var/log/pf9/cindervolume-base.log` will show the volume being resized. A CLI example of the extend operation follows this list.

{% code overflow="wrap" %}

```
INFO cinder.volume.drivers.nfs [[REQ-ID] None [TENANT_NAME] Pod] Extending volume [VOLUME_UUID].
INFO cinder.volume.drivers.nfs [[REQ-ID] None [TENANT_NAME] Pod] Resizing file to 60G...
INFO cinder.volume.manager [[REQ-ID] None [TENANT_NAME] Pod] Extend volume completed successfully.
```

{% endcode %}
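
A minimal CLI sketch of the extend flow; the new size (in GB, 60 here as an arbitrary example) must be larger than the current size:

```bash
# Hypothetical example: extend a volume to 60 GB, then confirm the new size.
openstack volume set --size 60 <VOLUME_NAME_OR_UUID>
openstack volume show <VOLUME_NAME_OR_UUID> -c size -c status
```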

## Procedure

{% hint style="info" %}
**Info**

Ensure that `openstack` and `cinder` binaries are present on the system.
{% endhint %}

1. Check that all Cinder volume hosts are enabled and up:

   ```
   $ openstack volume service list
   ```
2. List all volumes, grep for the affected volume, and get its details (host information, status, errors) using the commands below:

   ```
   $ openstack volume list | grep -i "<AFFECTED_VOLUME_NAME_OR_UUID>"
   $ openstack volume show <VOLUME_UUID>
   ```
3. The management plane runs **cinder-api** and **cinder-scheduler** pods to provide the volume service. Check that the **cinder-api** and **cinder-scheduler** pods are running in the workload region namespace, and review them as follows:

{% hint style="info" %}
**Info**

Step 3 is applicable only for Self-Hosted Private Cloud Director
{% endhint %}

* Check if they are in `CrashLoopBackOff/OOMkilled/Pending/Error/Init` state.
* Verify that all containers in the pods are Running.
* See the events section in the pod describe output.
* Review pod logs using `REQ_ID` or `VM_UUID` for relevant details.

```
$ kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i "cinder"

$ kubectl describe -n <WORKLOAD_REGION> <CINDER_API_POD>
$ kubectl describe -n <WORKLOAD_REGION> <CINDER_SCHEDULER_POD>

$ kubectl logs -n <WORKLOAD_REGION> <CINDER_API_POD>
$ kubectl logs -n <WORKLOAD_REGION> <CINDER_SCHEDULER_POD>
```

4. Once the underlying Cinder host is identified, check the `pf9-cindervolume-base` service status; it should be up and running.

   ```
   $ sudo systemctl status pf9-cindervolume-base
   ```
5. Review the `/var/log/pf9/cindervolume-base.log` logs and check whether there are any errors related to the volume UUID (a grep example follows this list).
6. If these steps prove insufficient to resolve the issue, kindly reach out to the [Platform9 Support Team](https://support.platform9.com/hc/en-us) for additional assistance.
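
For step 5, a simple way to narrow down the backend log on the Cinder host is to grep for the affected volume UUID (placeholder shown):

```bash
# Hypothetical example: search the backend log for entries about one volume.
sudo grep -i "<VOLUME_UUID>" /var/log/pf9/cindervolume-base.log
# Optionally follow the log live while reproducing the issue:
sudo tail -f /var/log/pf9/cindervolume-base.log | grep -i "<VOLUME_UUID>"
```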

### Most Common Causes

1. Volume Stuck in Creating / Deleting / Detaching State
2. Volume Attach Failure
3. Cinder Scheduler Can’t Place Volume
4. Incorrect storage backend configuration
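
For the first cause, once the underlying backend problem has been resolved, an administrator can reset a volume that is stuck in a transient state. This is a hedged sketch, not a root-cause fix; confirm with the backend that no operation is still in progress before resetting:

```bash
# Hypothetical example: reset a stuck volume's state back to 'available'
# (admin-only operation; verify backend consistency first).
openstack volume set --state available <VOLUME_NAME_OR_UUID>
openstack volume show <VOLUME_NAME_OR_UUID> -c status
```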
