Troubleshooting Heat Stack Issues

Problem

This guide provides step-by-step instructions for troubleshooting and resolving stack issues in Private Cloud Director.

Environment

Private Cloud Director Virtualization – v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization – v2025.4 and Higher

Procedure

When troubleshooting stack issues, follow these steps:

1. Identify the Stack Status

command
    
 
$ openstack stack list

Look for statuses like CREATE_IN_PROGRESS, CREATE_FAILED, or ROLLBACK_IN_PROGRESS.
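When a project contains many stacks, it can help to filter the list down to the ones that need attention. The following is a minimal sketch, assuming the list is produced with `-f value -c "Stack Name" -c "Stack Status"` so that each line holds a stack name followed by its status. The function name `problem_stacks` is hypothetical and not part of the OpenStack CLI.

```shell
# Hypothetical helper: print only stacks whose status ends in
# _FAILED or _IN_PROGRESS. Expects "name status" pairs on stdin,
# one pair per line.
problem_stacks() {
    awk '$2 ~ /_FAILED$/ || $2 ~ /_IN_PROGRESS$/ { print $1 ": " $2 }'
}

# Intended usage (requires the openstack CLI and valid credentials):
#   openstack stack list -f value -c "Stack Name" -c "Stack Status" | problem_stacks
```

A stack that stays in an `_IN_PROGRESS` state across repeated runs is a candidate for the event and resource checks in the following steps.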

2. Get Stack Information

command
    
 
$ openstack stack show <stack_name or id>

Review stack parameters, outputs, and overall status.

Parameters: Ensure required inputs like image name, flavor, or network ID are correct and exist.

Outputs: Confirm expected outputs (like IP addresses or resource IDs) are present. Missing outputs may indicate failed resource creation.

3. Check Stack Events for Failures

command
    
 
$ openstack stack event list <stack_name or id>

4. Inspect Individual Resource Status

Identify which resource caused the failure. Check whether any resource is stuck in CREATE_IN_PROGRESS or has ended in CREATE_FAILED.

command
    
 
$ openstack stack resource list <stack_name>
$ openstack stack resource show <stack_name> <failed_resource_name>

Example

$ openstack stack resource list <stack_name>
+----------------+------------------+------------------+---------------------+
| resource_name  | resource_type    | resource_status  | updated_time        |
+----------------+------------------+------------------+---------------------+
| my_instance    | OS::Nova::Server | CREATE_FAILED    |  [TIMESTAMP]        |
| my_network     | OS::Neutron::Net | CREATE_COMPLETE  |  [TIMESTAMP]        |
+----------------+------------------+------------------+---------------------+

$ openstack stack resource show <stack_name> my_instance
attributes: null
creation_time: '[TIMESTAMP]'
logical_resource_id: my_instance
physical_resource_id: [RESOURCE_ID]
resource_action: CREATE
resource_name: my_instance
resource_status: CREATE_FAILED
resource_status_reason: >
  Resource creation failed: Quota exceeded for cores: Requested 4, but available 2.
resource_type: OS::Nova::Server
required_by:
- my_instance_floating_ip
updated_time: '[TIMESTAMP]'

5. Check Where the Stack Failed

Identify the exact resource(s) and reason(s) for failure during stack creation.

command
    
 
$ openstack stack failures list <stack_id>

// Sample output
Resource: my_instance
  Status: CREATE_FAILED
  Reason: Image <ubuntu-20.04> could not be found.

6. Check Heat Component Pod Status and Logs

This step is applicable only for self-hosted PCD environments.

Check if the Heat components are running:

command
    
 
$ kubectl get pods -n <workload-region> | grep heat
heat-api-xxxxxxxxxx-xxxxx      1/1     Running     0     <age>
heat-cfn-xxxxxxxxxx-xxxxx      1/1     Running     0     <age>
heat-engine-xxxxxxxxxx-xxxxx   1/1     Running     0     <age>

To check for errors related to the stack or resource ID in the logs, run:

command
    
 
$ kubectl logs -n <workload-region> <heat-engine-pod-name> | grep -i <stack-id or resource-id>
$ kubectl logs -n <workload-region> <heat-api-pod-name> | grep -i <stack-id or resource-id>

Look for stack tracebacks or API-related errors.
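The pod status check can also be scripted so that only unhealthy Heat pods are reported. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) and pod names that start with `heat-` as in the sample output of this step. The function name `unhealthy_heat_pods` is hypothetical.

```shell
# Hypothetical helper: report Heat pods that are not fully ready.
# Reads `kubectl get pods` rows (NAME READY STATUS RESTARTS AGE) on stdin.
# Prints each unhealthy pod and returns 1 if any were found, 0 otherwise.
unhealthy_heat_pods() {
    awk '
        $1 ~ /^heat-/ {
            split($2, ready, "/")    # READY looks like "1/1"
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1 " " $2 " " $3
                bad = 1
            }
        }
        END { exit (bad + 0) }
    '
}

# Intended usage (requires kubectl access to the cluster):
#   kubectl get pods -n <workload-region> | unhealthy_heat_pods
```

Any pod name this prints is a good first target for the `kubectl logs` commands in this step.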

7. Validate Stack Template

command
    
 
$ openstack orchestration template validate -t <template_file.yaml>

Ensure the template syntax is correct before deployment.

8. Check Quotas

command
    
 
$ openstack quota show <project_id>

Verify if quotas are causing resource creation failures.

Check both compute and network quotas for the project, and compare them against the requested values in the stack.

Key quotas to check:

Compute:
- vCPUs: Requested vCPUs ≤ Available vCPUs
- RAM: Requested RAM ≤ Available RAM
- Instances: Total number of VMs within allowed limit
Network:
- Ports: Requested number of ports ≤ Available quota
- Security Groups: Total security groups ≤ quota
- Floating IPs: Requested number ≤ quota
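Each comparison in the list above comes down to the same arithmetic: the requested amount must not exceed the quota limit minus what the project already uses. The following is a minimal sketch of that check. The function name `fits_quota` and its argument order are hypothetical, and the numbers must be read manually from the quota output and the project's current usage.

```shell
# Hypothetical helper: succeed when a request fits within the remaining quota.
# Arguments: requested amount, quota limit, amount already in use.
# A limit of -1 is treated as unlimited, which is how OpenStack reports
# an unrestricted quota.
fits_quota() {
    requested=$1 limit=$2 in_use=$3
    [ "$limit" -eq -1 ] && return 0
    [ "$requested" -le $((limit - in_use)) ]
}

# Example: the stack needs 4 cores, the limit is 20, and 18 are in use.
#   fits_quota 4 20 18 || echo "cores quota would be exceeded"
```

With a limit of 20 and 18 cores in use, only 2 remain, so a request for 4 fails. This matches the "Requested 4, but available 2" reason shown in the example for step 4.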

9. Confirm Resource Availability

command
    
 
$ openstack image list
$ openstack flavor list

Ensure the referenced images and flavors exist.
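Scanning long lists by eye is error-prone, so an exact-match lookup can be useful. The following is a minimal sketch, assuming the names are listed one per line with `-f value -c Name`. The function name `name_in_list` is hypothetical, and the image and flavor names in the usage comments are placeholders for the values in your template.

```shell
# Hypothetical helper: succeed when the exact name appears in a list on stdin.
# -F matches the name literally, -x requires a whole-line match,
# -q suppresses output so only the exit status is used.
name_in_list() {
    grep -Fxq -- "$1"
}

# Intended usage (requires the openstack CLI and valid credentials):
#   openstack image list -f value -c Name | name_in_list "ubuntu-20.04"
#   openstack flavor list -f value -c Name | name_in_list "m1.small"
```

The whole-line match matters here: a template that references `ubuntu-20.04` is not satisfied by an image named `ubuntu-20.04-minimal`.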

10. Check Network Connectivity

command
    
 
$ openstack network agent list
$ openstack network show <network_id>

Ensure network components are operational and properly configured.

11. Mark Failed Resource as Unhealthy and Attempt Stack Update

✅ Update the stack only if the issue is minor, such as a typo or a missing value, and the rest of the stack is working correctly.

❌ Avoid updating if the stack has critical resource failures or dependencies that may cause cascading issues.

If applicable, update the stack with corrected parameters. If the issue persists, consider deleting and redeploying the stack.

command
    
 
$ openstack stack resource mark unhealthy <stack_name> <resource_name>
$ openstack stack update --existing --template <template_file.yaml> <stack_name>

If these steps do not resolve the issue, please contact the Platform9 Support Team for further assistance.

Most common causes:

Template syntax errors (YAML/JSON issues, missing parameters)
Resource conflicts (duplicate names, unavailable images)
Quota limits exceeded (compute, network, or storage)
Networking issues (missing subnets, no floating IPs)
Heat engine service failures
API rate limits exceeded
Delays in dependent resource creation
Authentication failures (expired tokens, invalid credentials)
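The failure reason reported by `openstack stack resource show` or `openstack stack failures list` usually points to one of these causes. The following is a minimal sketch of a first-pass triage, assuming the reason text resembles the two sample messages in this guide. The function name `classify_reason` and the category labels are hypothetical, and real reason strings may be worded differently.

```shell
# Hypothetical helper: map a failure reason string to a likely cause.
# Only the two reason formats shown in this guide are recognized; the
# category labels are illustrative, not Heat terminology.
classify_reason() {
    case $1 in
        *"Quota exceeded"*)     echo "quota" ;;
        *"could not be found"*) echo "missing-resource" ;;
        *)                      echo "unknown" ;;
    esac
}

# Example:
#   classify_reason "Image <ubuntu-20.04> could not be found."
```

A result of `quota` leads back to step 8 and `missing-resource` to step 9. Anything reported as `unknown` needs the full event and log review.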
