Volume Stuck in "error_deleting" State Due to iSCSI Errors
Problem
In OpenStack environments that use iSCSI as the backend storage solution, volume deletion fails and the volume remains stuck in the "error_deleting" state.
The volume status:
# openstack volume show <Volume_UUID> -c attachments -c id -c name -c os-vol-host-attr:host -c status -c type
+-----------------------+-------------------------+
| Field                 | Value                   |
+-----------------------+-------------------------+
| attachments           | []                      |
| id                    | [Volume-UUID]           |
| name                  | [Volume_Name]           |
| os-vol-host-attr:host | [Host-UUID]@lvm-backend |
| status                | error_deleting          |
| type                  | None                    |
+-----------------------+-------------------------+
Diagnostics
On the Cinder host, the following errors appear in cindervolume-base.log under /var/log/pf9/:
cinderhost:~$ zgrep -i '<Volume-UUID>' /var/log/pf9/cindervolume-base.log*
INFO cinder.volume.targets.tgt [Request_ID] None c114] Removing iscsi_target for Volume ID: [Volume-UUID]
ERROR cinder.volume.targets.tgt [[Request_ID] None c114] Failed to remove iscsi target for Volume ID: [Volume-UUID] : Unexpected error while running command.
tgt-admin --delete iqn.org.openstack:volume-[Volume-UUID] -f
ERROR oslo_messaging.rpc.server [Request_ID] None c114] Exception during message handling: cinder.exception.ISCSITargetRemoveFailed: Failed to remove iscsi target for volume [Volume-UUID].
TRACE oslo_messaging.rpc.server Command: tgt-admin --delete iqn.org.openstack:volume-[Volume-UUID] -f
TRACE oslo_messaging.rpc.server cinder.exception.ISCSITargetRemoveFailed: Failed to remove iscsi target for volume [Volume-UUID] .
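When several volumes are affected, their IDs can be pulled out of the log in one pass. The following is a minimal sketch, assuming the log lines have the shape shown above with a real 36-character volume UUID in place of [Volume-UUID]. The function name failed_target_volumes is hypothetical and not part of any Platform9 or OpenStack tooling.

```shell
# Hypothetical helper: read cinder-volume log lines on stdin and print the
# unique volume UUIDs for which iSCSI target removal failed.
# Assumes the message format "... ISCSITargetRemoveFailed: Failed to remove
# iscsi target for volume <uuid>" seen in the log excerpt above.
failed_target_volumes() {
    sed -n 's/.*ISCSITargetRemoveFailed: Failed to remove iscsi target for volume \([0-9a-f-]\{36\}\).*/\1/p' \
        | sort -u
}

# Usage on the Cinder host (not executed here):
#   zcat -f /var/log/pf9/cindervolume-base.log* | failed_target_volumes
```

The pattern anchors on the text after "for volume" so that UUID-shaped request IDs earlier in the line are not picked up by mistake.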
On the Cinder host, the fdisk output contains an osprober entry for the affected volume:
$ sudo fdisk -l | grep -i osprob | grep '<Volume-UUID>'
Disk /dev/mapper/osprober-linux-cinder--volumes-volume--[Volume-UUID1] : 10 GiB, 10737418240 bytes, 20971520 sectors
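Device-mapper doubles every hyphen inside an LVM volume group or logical volume name, which is why the entry above reads cinder--volumes-volume--. A real volume UUID therefore also appears with doubled hyphens in the osprober device name, and a grep for the plain UUID may find nothing. A minimal sketch of the conversion, where dm_escape is a hypothetical helper name:

```shell
# Hypothetical helper: convert a name to the form device-mapper uses in
# /dev/mapper entries, where each "-" inside a VG or LV name is doubled.
dm_escape() {
    printf '%s\n' "$1" | sed 's/-/--/g'
}

# Usage on the Cinder host (not executed here):
#   sudo fdisk -l | grep -i osprober | grep "$(dm_escape '<Volume-UUID>')"
```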
Environment
- Platform9 Managed OpenStack - v5.10 and higher
Cause
When os-prober is enabled, the update-grub process uses grub-mount to scan devices for operating systems and, where it finds one, adds it to the boot menu. In some cases update-grub fails to unmount these devices afterwards. A device left behind this way appears as the osprober-linux-* entry in the fdisk output and can keep the volume's backing logical volume in use, which is consistent with the target removal failure seen in the logs.
The update-grub process is triggered automatically whenever a new kernel is installed, which can happen during routine or automatic security updates. As long as os-prober remains enabled, each such run can leave devices mounted again.
Solution
Steps to delete volumes in "error_deleting" status:
- Set GRUB_DISABLE_OS_PROBER=true in the file /etc/default/grub.
- Regenerate the GRUB configuration using the command:
# update-grub
- Reset the volume state to available and its attach status to detached in Cinder:
# cinder reset-state <volume ID> --state available --attach-status detached
- Now delete the volume using:
# cinder delete <volume ID>
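The GRUB setting can be applied with a small script that is safe to run repeatedly. The sketch below assumes GNU sed and a Debian/Ubuntu-style /etc/default/grub. The name disable_os_prober is a hypothetical helper, not an existing tool. The function only edits the file passed to it, so update-grub still has to be run afterwards.

```shell
# Hypothetical helper: ensure GRUB_DISABLE_OS_PROBER=true is set in a GRUB
# defaults file. An existing (possibly commented-out) setting is rewritten
# in place; if none exists, the line is appended.
disable_os_prober() {
    grub_file="${1:-/etc/default/grub}"
    if grep -Eq '^[#[:space:]]*GRUB_DISABLE_OS_PROBER=' "$grub_file"; then
        sed -i -E 's/^[#[:space:]]*GRUB_DISABLE_OS_PROBER=.*/GRUB_DISABLE_OS_PROBER=true/' "$grub_file"
    else
        printf '%s\n' 'GRUB_DISABLE_OS_PROBER=true' >> "$grub_file"
    fi
}

# Usage as root on the Cinder host (not executed here):
#   cp /etc/default/grub /etc/default/grub.bak   # keep a backup first
#   disable_os_prober /etc/default/grub
#   update-grub
```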
Validation
Once the volume is deleted, the volume show output should report that the volume does not exist:
$ openstack volume show <Volume UUID>
No volume with a name or ID of '[Volume UUID]' exists.
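For scripted cleanups, the same check can be made against the command output. A minimal sketch, assuming the CLI prints the message shown above; volume_is_gone is a hypothetical helper name:

```shell
# Hypothetical helper: succeed (exit status 0) when the given
# `openstack volume show` output contains the "does not exist" message
# from the example above, and fail otherwise.
volume_is_gone() {
    printf '%s\n' "$1" | grep -q "No volume with a name or ID of"
}

# Usage (not executed here):
#   out=$(openstack volume show '<Volume UUID>' 2>&1)
#   volume_is_gone "$out" && echo "volume deleted"
```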
Additional Information
The errors above indicate that the Cinder service is unable to delete the volume because removal of the iSCSI target fails.
If manual deletion of the iSCSI target also fails, the issue is specific to the underlying storage. It is recommended to resolve the underlying storage issues with the iSCSI vendor to prevent recurring volume deletion failures on the OpenStack side.
As a workaround to unblock volume deletion within OpenStack, reach out to the Platform9 support team, who will manually update the affected volumes in the backend Cinder database using the command below.
$ mysql cinder -e "UPDATE volumes SET host = NULL where id='<Volume-UUID>';"
After the above change, the user should be able to delete the volume directly.