Multiple Instance Creation Failed Due to Max Scheduling Attempts Exceeded for Cinder Volume
Problem
When creating multiple VM instances, some instances deploy successfully while a subset fails with the following error from the Cinder scheduler:
Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid backend was found. Exceeded max scheduling attempts 3 for resource <VOLUME_UUID>: cinder.exception.NoValidBackend: No valid backend was found. Exceeded max scheduling attempts 3 for resource <VOLUME_UUID>
Environment
- Self-Hosted Private Cloud Director Virtualisation – v2025.4
- Private Cloud Director Virtualisation – v2025.4
Cause
The Cinder scheduler becomes overloaded when it receives many concurrent volume creation requests for the backend.
Resolution
The resolution for this issue is a two-part process, involving changes on the Management Plane and on the hosts.
SaaS customers should reach out to the Platform9 Support Team to implement PART-1 of the resolution.
PART-1
- Cinder pods read cinder.conf from the cinder-etc secret, so the cinder-etc secret must be updated. Verify that the secret is available in the namespace.
$ kubectl get secrets -n <REGION_NAMESPACE> | grep -i cinder-etc
- Take a backup of the secret.
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o yaml > cinder-etc.secret.bk
- Get the cinder.conf contents from the secret.
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o json | jq -r '.data."cinder.conf"' | base64 -d > cinder.conf
- Open the cinder.conf file in an editor and make the following changes in the [DEFAULT] section:
[DEFAULT]
scheduler_max_attempts = 10
osapi_volume_workers = 8
service_down_time = 180
report_interval = 60
It is not recommended to increase scheduler_max_attempts beyond 10, since factors such as storage backend network latency and storage IOPS also affect scheduling.
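As a quick sanity check after editing, you can confirm all four options made it into the file. The helper below is a sketch (the function name is ours, not part of the product tooling):

```shell
# verify_tuning FILE — check that the four tuned options are present in
# FILE; prints the first missing option and returns non-zero if any is
# absent, otherwise confirms all four are set.
verify_tuning() {
  local opt
  for opt in scheduler_max_attempts osapi_volume_workers \
             service_down_time report_interval; do
    grep -q "^${opt}[[:space:]]*=" "$1" || { echo "missing: $opt"; return 1; }
  done
  echo "all four options set"
}
```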
- Save cinder.conf and encode the file using base64.
$ cat cinder.conf | base64 -w0
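Before pasting the encoded value, it can be worth verifying that it decodes back to the exact file contents. The helper below is a sketch (its name is ours):

```shell
# roundtrip_check FILE — encode FILE with base64 -w0, decode the result,
# and diff it against the original; prints "round-trip OK" only when the
# decoded bytes match the file exactly.
roundtrip_check() {
  base64 -w0 < "$1" | base64 -d | diff -q - "$1" > /dev/null && echo "round-trip OK"
}
```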
- Copy the encoded value from the above command, then edit the secret and replace the old cinder.conf content with the new encoded value.
$ kubectl edit secrets -n <REGION_NAMESPACE> cinder-etc
- Save the secret and verify that the new cinder.conf values are reflected.
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o json | jq -r '.data."cinder.conf"' | base64 -d
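As an alternative to hand-editing the secret, the re-encode and update can be combined into one step with kubectl patch. This is a sketch (the function name and the KUBECTL override are ours), assuming the edited cinder.conf is in the current directory:

```shell
# patch_cinder_etc NAMESPACE — re-encode ./cinder.conf and write it back
# into the cinder-etc secret in a single kubectl patch call. KUBECTL can
# be overridden (e.g. KUBECTL=echo) for a dry run.
patch_cinder_etc() {
  local ns=$1 encoded
  encoded=$(base64 -w0 < cinder.conf)
  ${KUBECTL:-kubectl} patch secret cinder-etc -n "$ns" --type merge \
    -p "{\"data\":{\"cinder.conf\":\"${encoded}\"}}"
}
```

Because the patch replaces only the cinder.conf key, other keys in the secret are left untouched.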
- Now restart the cinder-api and cinder-scheduler pods so that they pick up the updated cinder.conf file.
$ kubectl get pods -n <REGION_NAMESPACE> | grep -i cinder
cinder-api-xxxxx-xxxxx 2/2 Running 0 10s
cinder-scheduler-xxxxxxx-xxxxx 1/1 Running 0 10s
PART-2
- On every host that has the Persistent Storage role, open the /opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf file in an editor and make the following changes in the [DEFAULT] section:
[DEFAULT]
scheduler_max_attempts = 10
osapi_volume_workers = 8
service_down_time = 180
report_interval = 60
- Restart the following service on every host that has the Persistent Storage role:
$ systemctl restart pf9-cindervolume-base.service
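With many hosts, editing each file by hand is error-prone. The sed-based helper below is a sketch (its name is ours) for setting one option in the host's cinder.conf; note that it appends missing keys to the end of the file, so it is only safe when [DEFAULT] is the last (or only) section:

```shell
# set_conf_opt FILE KEY VALUE — update KEY in place if it already exists
# in FILE, otherwise append "KEY = VALUE" to the end of FILE.
set_conf_opt() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}[[:space:]]*=" "$file"; then
    sed -i "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, running set_conf_opt /opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf scheduler_max_attempts 10 on each host, followed by the service restart above, applies the same change everywhere.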
These changes do not persist through an upgrade; reapply these steps immediately after upgrading to ensure continued functionality.
Additional Information
Execute these commands carefully, ensuring no unintended characters are added to cinder.conf. If the cinder.conf file becomes corrupted, restore the cinder-etc secret from the backup file and restart the cinder-api and cinder-scheduler pods.