VM Build Failure Due to Nova Timeout

Problem

VM builds fail, and the following messages appear in the ostackhost log on the compute host:

INFO nova.compute.resource_tracker [[REQ_UUID] None None] Instance [VM_UUID] has allocations against this compute host but is not found in the database.
ERROR nova.compute.manager [[REQ_UUID] None None] Error updating resources for node [HOSTNAME]: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID [MSG_UUID] 

Environment

  • Self-Hosted Private Cloud Director Virtualization - v2025.10 and Higher

  • Component: Compute Service

Cause

RPC calls from Nova Compute to Nova Conductor, which travel over RabbitMQ, are not answered within the RPC response timeout (rpc_response_timeout). Nova Compute stops waiting for the reply and raises MessagingTimeout.

In environments with slower networks or other sources of delay, the default timeout can be too short. Raising it in the Nova configuration gives requests enough time to complete.

Workaround

Increase the following parameters in the Nova configuration to allow more time for RPC communication and service reporting:

Parameter              Default Value   Updated Value
rpc_response_timeout   120s            180s
report_interval        60s             120s

1. On the Management Cluster (Control Plane)

  • Update the nova-etc Kubernetes secret with the new values.

  • Restart the nova-api-osapi pods to pick up the changes.

  • Verify the updated values are reflected in:

2. On Each Compute Host

  • Update the following file with the new parameter values:

    Add or modify under the [DEFAULT] section:

  • Restart the pf9-ostackhost service:

Note: This change must be applied on all compute hosts.
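Because the same two values have to match on every host, a small check can help catch a host that was missed. The following is a minimal sketch, not an official tool: it assumes the options live under [DEFAULT] as described above and are written as plain integers in seconds (180 and 120, per the table). The function name check_nova_timeouts and the SAMPLE text are hypothetical; in practice you would pass in the contents of the Nova configuration file from each host.

```python
import configparser

# Target values from the Workaround table, in seconds.
EXPECTED = {"rpc_response_timeout": 180, "report_interval": 120}


def check_nova_timeouts(conf_text, expected=EXPECTED):
    """Return {option: (found, wanted)} for every option that does not match."""
    # interpolation=None keeps literal '%' characters in other options intact;
    # strict=False tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    mismatches = {}
    for option, wanted in expected.items():
        # [DEFAULT] is exposed through defaults(); values are strings.
        found = parser.defaults().get(option)
        if found is None or found.strip() != str(wanted):
            mismatches[option] = (found, wanted)
    return mismatches


# Hypothetical file contents: one option updated, one still at the old value.
SAMPLE = """\
[DEFAULT]
rpc_response_timeout = 180
report_interval = 60
"""

if __name__ == "__main__":
    print(check_nova_timeouts(SAMPLE))
```

An empty result means both options already carry the updated values; any entry in the result names an option that is missing or still set to a different value.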

Validation

After applying the changes to both the control plane and all compute hosts:

  1. Attempt to provision multiple VMs concurrently (burst provisioning).

  2. Monitor the ostackhost.log on the compute hosts for any recurrence of MessagingTimeout errors.

  3. Check the RabbitMQ pod logs for a reduction in "missed heartbeats from client" errors.

  4. Confirm VMs are successfully built without entering ERROR or stuck BUILD state.
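Steps 2 and 3 amount to counting two kinds of log lines before and after the change. The sketch below shows one way to do that, assuming the logs have already been collected as text. The name count_markers is hypothetical, the Nova line in SAMPLE_LINES is taken from the Problem section, and the RabbitMQ line is an illustrative stand-in rather than output captured from a real pod.

```python
# Substrings to look for: the Nova exception named in the Problem section
# and the RabbitMQ heartbeat message named in validation step 3.
MARKERS = {
    "messaging_timeout": "oslo_messaging.exceptions.MessagingTimeout",
    "missed_heartbeats": "missed heartbeats from client",
}


def count_markers(log_lines, markers=MARKERS):
    """Count how many lines contain each marker substring."""
    counts = {name: 0 for name in markers}
    for line in log_lines:
        for name, needle in markers.items():
            if needle in line:
                counts[name] += 1
    return counts


SAMPLE_LINES = [
    "INFO nova.compute.resource_tracker [[REQ_UUID] None None] Instance [VM_UUID] "
    "has allocations against this compute host but is not found in the database.",
    "ERROR nova.compute.manager [[REQ_UUID] None None] Error updating resources for "
    "node [HOSTNAME]: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting "
    "for a reply to message ID [MSG_UUID]",
    # Illustrative RabbitMQ line (assumed wording, not captured output).
    "closing AMQP connection: missed heartbeats from client, timeout: 60s",
]

if __name__ == "__main__":
    print(count_markers(SAMPLE_LINES))
```

Running the same count over logs from a comparable burst-provisioning window before and after the change gives a rough comparison; the counts should drop toward zero if the new timeouts are sufficient.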
