Multiple Instance Creation Failed Due to Max Scheduling Attempts Exceeded for Cinder Volume
Problem
When multiple VM instances are created at once, some instances deploy successfully while others fail with the following error reported by the Cinder scheduler:
Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid backend was found. Exceeded max scheduling attempts 3 for resource <VOLUME_UUID>: cinder.exception.NoValidBackend: No valid backend was found. Exceeded max scheduling attempts 3 for resource <VOLUME_UUID>
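The same error can be confirmed from the cinder-scheduler logs on the Management Plane. A minimal sketch, assuming a single scheduler pod whose name starts with cinder-scheduler runs in the region namespace:

# Locate the cinder-scheduler pod and search its logs for the scheduling error.
$ SCHED_POD=$(kubectl get pods -n <REGION_NAMESPACE> -o name | grep cinder-scheduler)
$ kubectl logs -n <REGION_NAMESPACE> "$SCHED_POD" | grep -i "Exceeded max scheduling attempts"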
Environment
- Self-Hosted Private Cloud Director Virtualisation – v2025.4
- Private Cloud Director Virtualisation – v2025.4
Cause
The Cinder scheduler becomes overloaded when multiple concurrent volume creation requests arrive for the backend. The configuration changes described below are part of the product starting with the PCD June release.
Workaround
The resolution for this issue is a two-part process: PART-1 makes changes on the Management Plane, and PART-2 makes changes on the hosts.
SaaS customers should reach out to the Platform9 Support team to implement PART-1 of the resolution.
PART-1
- Cinder pods use cinder.conf from the cinder-etc secret, so the cinder-etc secret needs to be updated. Verify that the secret is available in the namespace:
$ kubectl get secrets -n <REGION_NAMESPACE> | grep -i cinder-etc
- Take a backup of the secret:
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o yaml > cinder-etc.secret.bk
- Extract the cinder.conf contents from the secret:
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o json | jq -r '.data."cinder.conf"' | base64 -d > cinder.conf
- Open the cinder.conf file in a file editor and make the below changes in the [DEFAULT] section, as shown below:
[DEFAULT]
scheduler_max_attempts = 10
osapi_volume_workers = 8
service_down_time = 180
report_interval = 60
It is not recommended to increase scheduler_max_attempts beyond 10, because factors such as storage backend network latency and storage IOPS make additional retries unlikely to help.
- Save cinder.conf and encode the file using base64:
$ cat cinder.conf | base64 -w0
- Copy the encoded value from the above command, then edit the secret and replace the old cinder.conf content with the new encoded value:
$ kubectl edit secrets -n <REGION_NAMESPACE> cinder-etc
- Save the secret and verify that the new cinder.conf values are reflected:
$ kubectl get secrets -n <REGION_NAMESPACE> cinder-etc -o json | jq -r '.data."cinder.conf"' | base64 -d
- Now restart the cinder-api and cinder-scheduler pods so that they start using the updated cinder.conf file; a scripted alternative for this step and the secret update is sketched after the pod listing below:
$ kubectl get pods -n <REGION_NAMESPACE> | grep -i cinder
cinder-api-xxxxx-xxxxx           2/2   Running   0   10s
cinder-scheduler-xxxxxxx-xxxxx   1/1   Running   0   10s
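As an alternative to the manual edit-and-encode steps above, the secret update and pod restart can be scripted end to end. A minimal sketch, assuming the edited cinder.conf is in the current directory and that cinder-api and cinder-scheduler are the Deployment names (verify the actual resource names in your environment before running it):

# Re-encode the edited cinder.conf and patch it into the cinder-etc secret.
$ kubectl patch secret cinder-etc -n <REGION_NAMESPACE> --type merge \
    -p "{\"data\":{\"cinder.conf\":\"$(base64 -w0 cinder.conf)\"}}"
# Restart the pods so they pick up the new configuration, then verify.
$ kubectl rollout restart deployment/cinder-api deployment/cinder-scheduler -n <REGION_NAMESPACE>
$ kubectl get pods -n <REGION_NAMESPACE> | grep -i cinder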
PART-2
- On every host that has the Persistent Storage role, open the /opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf file in a file editor and make the below changes in the [DEFAULT] section, as shown below:
[DEFAULT]
scheduler_max_attempts = 10
osapi_volume_workers = 8
service_down_time = 180
report_interval = 60
- Restart the below service on every host that has the Persistent Storage role; a non-interactive sketch of these host changes follows this section:
$ systemctl restart pf9-cindervolume-base.service
The changes made will not persist through an upgrade. Therefore, it is important to reapply these steps immediately after the upgrade to ensure continued functionality.
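The host-side changes can also be applied non-interactively. A minimal sketch, assuming the crudini utility is available on the host (if it is not, edit the file manually as described above):

# Set the required options in the [DEFAULT] section of the host cinder.conf.
$ CONF=/opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf
$ sudo crudini --set "$CONF" DEFAULT scheduler_max_attempts 10
$ sudo crudini --set "$CONF" DEFAULT osapi_volume_workers 8
$ sudo crudini --set "$CONF" DEFAULT service_down_time 180
$ sudo crudini --set "$CONF" DEFAULT report_interval 60
# Restart the Cinder volume service so the changes take effect.
$ sudo systemctl restart pf9-cindervolume-base.service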
Additional Information
Execute these commands carefully, ensuring no unintended characters are added to cinder.conf. If the cinder.conf file is corrupted, restore the cinder-etc secret from the backup file and restart the cinder-api and cinder-scheduler pods.
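A minimal sketch of the restore, using the backup taken in PART-1 (the --force flag deletes and recreates the secret; the Deployment names are assumptions, as noted above):

# Recreate the cinder-etc secret from the backup file.
$ kubectl replace --force -f cinder-etc.secret.bk
# Restart the pods so they load the restored configuration.
$ kubectl rollout restart deployment/cinder-api deployment/cinder-scheduler -n <REGION_NAMESPACE>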