Troubleshooting Heat Stack Issues

Problem

This guide provides step-by-step instructions for troubleshooting and resolving Heat stack issues in Private Cloud Director.

Environment

  • Private Cloud Director Virtualization – v2025.4 and higher
  • Self-Hosted Private Cloud Director Virtualization – v2025.4 and higher

Procedure

When troubleshooting stack issues, follow these steps:

1. Identify the Stack Status

command

Look for statuses like CREATE_IN_PROGRESS, CREATE_FAILED, or ROLLBACK_IN_PROGRESS.
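If you have many stacks, a small filter can pick out the ones in the statuses listed above. This is a minimal sketch. It assumes the input is the JSON printed by "openstack stack list -f json", with each entry carrying "Stack Name" and "Stack Status" keys. Check those key names against your client version.

```python
import json

# Statuses called out in step 1. Extend the set (e.g. UPDATE_FAILED,
# DELETE_FAILED) if you also want to catch failed updates or deletions.
STATUSES_OF_INTEREST = {"CREATE_IN_PROGRESS", "CREATE_FAILED", "ROLLBACK_IN_PROGRESS"}


def stacks_needing_attention(stack_list_json: str) -> list[tuple[str, str]]:
    """Return (stack name, status) pairs whose status needs investigation.

    Assumes the JSON shape of `openstack stack list -f json`: a list of
    objects with "Stack Name" and "Stack Status" keys.
    """
    stacks = json.loads(stack_list_json)
    return [
        (s["Stack Name"], s["Stack Status"])
        for s in stacks
        if s["Stack Status"] in STATUSES_OF_INTEREST
    ]
```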

2. Get Stack Information

command

Review stack parameters, outputs, and overall status.

Parameters: Ensure that required inputs, such as the image name, flavor, or network ID, are correct and refer to resources that exist.

Outputs: Confirm expected outputs (like IP addresses or resource IDs) are present. Missing outputs may indicate failed resource creation.
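A quick way to spot missing outputs is to compare the outputs you expect against those the stack actually reports. This is a sketch under these assumptions:

- The input is the JSON from "openstack stack show <stack> -f json".
- "outputs" is a list of objects with "output_key" and "output_value" fields.
- Output names such as "server_ip" are hypothetical and depend on your template.

```python
import json


def missing_outputs(stack_show_json: str, expected_keys: set[str]) -> set[str]:
    """Return expected output keys that are absent or have a null value.

    Assumes the JSON shape of `openstack stack show <stack> -f json`, where
    "outputs" is a list of {"output_key": ..., "output_value": ...} objects.
    """
    stack = json.loads(stack_show_json)
    present = {
        o["output_key"]
        for o in stack.get("outputs") or []  # "outputs" may be missing or null
        if o.get("output_value") is not None
    }
    return expected_keys - present
```

A null output value usually points back to a resource that was never created. That resource is the one to follow up in the event and resource checks that follow.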

3. Check Stack Events for Failures

command
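The event list can be long. The earliest failure is usually the root cause, and later failures are often consequences of it. The following sketch lists only failed events in time order. It assumes the JSON from "openstack stack event list <stack> -f json", with "resource_name", "resource_status", "resource_status_reason" and "event_time" on each event. Treat those key names as an assumption to confirm locally.

```python
import json


def failed_events(event_list_json: str) -> list[str]:
    """Summarise events whose status ends in _FAILED, oldest first.

    Assumes the JSON shape of `openstack stack event list <stack> -f json`.
    """
    events = json.loads(event_list_json)
    failures = [e for e in events if e["resource_status"].endswith("_FAILED")]
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    failures.sort(key=lambda e: e["event_time"])
    return [
        f'{e["event_time"]} {e["resource_name"]}: {e["resource_status_reason"]}'
        for e in failures
    ]
```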

4. Inspect Individual Resource Status

Identify which resource caused the failure, and check whether any resource is stuck in CREATE_IN_PROGRESS or has reached CREATE_FAILED.

command
Example

5. Check Where the Stack Failed

Identify the exact resource(s) and reason(s) for the failure during stack creation.

command
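The resource checks above can be partly scripted by grouping resources into "stuck" and "failed". This is a minimal sketch. It assumes the JSON from "openstack stack resource list <stack> -f json", with "resource_name" and "resource_status" keys. The resource names in the usage data are hypothetical. The failure reasons themselves still come from the event list or, if your client provides it, "openstack stack failures list <stack>".

```python
import json

STUCK = "CREATE_IN_PROGRESS"
FAILED = "CREATE_FAILED"


def classify_resources(resource_list_json: str) -> dict[str, list[str]]:
    """Group resource names by the two states worth investigating.

    Assumes the JSON shape of `openstack stack resource list <stack> -f json`.
    Resources in any other state are ignored.
    """
    resources = json.loads(resource_list_json)
    groups: dict[str, list[str]] = {STUCK: [], FAILED: []}
    for r in resources:
        if r["resource_status"] in groups:
            groups[r["resource_status"]].append(r["resource_name"])
    return groups
```

A resource that stays in CREATE_IN_PROGRESS for a long time is often waiting on another resource or on a backend service. That is why the Heat pod status and logs are the next thing to check.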

6. Check Heat Component Pod Status and Logs

This step is applicable only for self-hosted PCD environments.

Check if the Heat components are running:

command

To check for errors related to the stack or resource ID in the logs, run:

command

Look for stack tracebacks or API-related errors.
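On a self-hosted deployment, the pod listing can be screened for Heat pods that are not healthy. This is a sketch under these assumptions:

- The input is the default table printed by "kubectl get pods -n <namespace>", where the first three columns are NAME, READY and STATUS.
- The namespace is not known here.
- Heat pods are identified by "heat" appearing in the pod name, which is an assumption about naming. The pod names in the usage data are hypothetical.

```python
def unhealthy_heat_pods(kubectl_output: str) -> list[tuple[str, str]]:
    """Return (pod name, status) for Heat pods that look unhealthy.

    Assumes the default `kubectl get pods` table (NAME READY STATUS ...).
    """
    problems = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        if "heat" not in name:
            continue
        if status == "Completed":
            # Finished job pods (e.g. database sync jobs) are expected to be 0/1.
            continue
        # READY is "<ready>/<total>"; a Running pod can still have unready containers.
        ready_now, total = ready.split("/")
        if status != "Running" or ready_now != total:
            problems.append((name, status))
    return problems
```

Any pod this flags, for example one in CrashLoopBackOff, is the one whose logs to read first.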

7. Validate Stack Template

command

Ensure the template syntax is correct before deployment.
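Some template mistakes can be caught locally before calling the validation service, for example a get_param that points to an undeclared parameter. This sketch is a rough pre-check, not a replacement for server-side validation. Its assumptions:

- The template has already been parsed from YAML into a Python dict, since this code uses the standard library only.
- It follows the HOT layout (heat_template_version, parameters, resources).
- It only inspects resource properties.

```python
from typing import Any

# Pseudo parameters that Heat provides without a declaration.
PSEUDO_PARAMETERS = frozenset({"OS::stack_name", "OS::stack_id", "OS::project_id"})


def precheck_template(tpl: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in an already-parsed HOT template."""
    problems: list[str] = []
    if "heat_template_version" not in tpl:
        problems.append("missing heat_template_version")
    resources = tpl.get("resources") or {}
    if not resources:
        problems.append("no resources defined")
    declared = set((tpl.get("parameters") or {}).keys()) | PSEUDO_PARAMETERS

    def walk(node: Any, where: str) -> None:
        # Recursively look for {"get_param": ...} calls in resource properties.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "get_param":
                    # get_param takes a name, or a list whose first item is the name.
                    param = value[0] if isinstance(value, list) and value else value
                    if isinstance(param, str) and param not in declared:
                        problems.append(f"{where}: undeclared parameter '{param}'")
                else:
                    walk(value, where)
        elif isinstance(node, list):
            for item in node:
                walk(item, where)

    for name, body in resources.items():
        if not isinstance(body, dict) or "type" not in body:
            problems.append(f"resource '{name}' has no type")
            continue
        walk(body.get("properties"), f"resource '{name}'")
    return problems
```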

8. Check Quotas

command

Verify whether quotas are causing resource creation failures.

Check both compute and network quotas for the project, and compare them against the requested values in the stack.

Key quotas to check:

  • Compute:

    • vCPUs: Requested vCPUs ≤ Available vCPUs
    • RAM: Requested RAM ≤ Available RAM
    • Instances: Total number of VMs ≤ quota
  • Network:

    • Ports: Requested number of ports ≤ Available quota
    • Security Groups: Total security groups ≤ quota
    • Floating IPs: Requested number ≤ quota
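The comparisons listed above reduce to one rule: a request fits only if it is no larger than the limit minus current usage. This is a minimal sketch of that arithmetic. It assumes you have already collected each limit and its in-use count, for example from the quota and absolute-limits output of the OpenStack CLI. It also uses OpenStack's convention that a limit of -1 means unlimited. The resource keys and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    limit: int   # -1 means unlimited, following the OpenStack quota convention
    in_use: int

    def available(self) -> float:
        return float("inf") if self.limit < 0 else self.limit - self.in_use


def quota_shortfalls(quotas: dict[str, Quota], requested: dict[str, int]) -> dict[str, float]:
    """Return {resource: amount over quota} for each request that does not fit."""
    shortfalls: dict[str, float] = {}
    for resource, amount in requested.items():
        available = quotas[resource].available()
        if amount > available:
            shortfalls[resource] = amount - available
    return shortfalls
```

For example, with a 20-vCPU limit and 16 vCPUs in use, a stack requesting 8 vCPUs is 4 over quota. The stack then fails even though the limit of 20 itself is larger than the request.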

9. Confirm Resource Availability

command

Ensure the referenced images and flavors exist.
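Once the image and flavor listings are collected, checking the template's references against them is a set-membership test. This is a sketch under these assumptions:

- The template's references and the existing names have already been gathered into plain Python collections.
- The "existing" sets include both names and IDs, because a template may refer to either.
- The image and flavor names below are hypothetical.

```python
def missing_references(
    refs: dict[str, list[str]], existing: dict[str, set[str]]
) -> dict[str, list[str]]:
    """For each kind ("image", "flavor", ...), list referenced names that do not exist."""
    missing: dict[str, list[str]] = {}
    for kind, names in refs.items():
        known = existing.get(kind, set())
        absent = [n for n in names if n not in known]
        if absent:
            missing[kind] = absent
    return missing
```

Note that image names are not guaranteed to be unique. A name that matches more than one image can also cause a failure, so referencing images by ID avoids that ambiguity.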

10. Check Network Connectivity

command

Ensure network components are operational and properly configured.
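Two networking problems from the common-causes list are easy to test for once network data is collected: a network with no subnets, and a floating IP pool that is not an external network. This is a sketch under these assumptions:

- You have already built a mapping from network name to its subnets.
- You have already built the set of external networks from the CLI listings.
- The network names below are hypothetical.

```python
from typing import Iterable, Optional


def network_problems(
    referenced: Iterable[str],
    subnets_by_network: dict[str, list[str]],
    external_networks: Iterable[str] = (),
    floating_pool: Optional[str] = None,
) -> list[str]:
    """Report referenced networks that are missing or have no subnets.

    If floating_pool is given, also report when it is not an external network.
    """
    problems = []
    for net in referenced:
        if net not in subnets_by_network:
            problems.append(f"network '{net}' does not exist")
        elif not subnets_by_network[net]:
            problems.append(f"network '{net}' has no subnets")
    if floating_pool is not None and floating_pool not in set(external_networks):
        problems.append(f"floating IP pool '{floating_pool}' is not an external network")
    return problems
```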

11. Mark Failed Resource as Unhealthy and Attempt Stack Update

✅ Update the stack only if the issue is minor, such as a typo or a missing value, and all other resources in the stack are healthy.

❌ Avoid updating if the stack has critical resource failures or dependencies that may cause cascading issues.

If applicable, update the stack with corrected parameters. If the issue persists, consider deleting and redeploying the stack.

command
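The mark-and-update sequence consists of two CLI calls. The sketch below only builds the argument lists so they can be reviewed before anything runs; it does not execute them. It assumes the OpenStack CLI forms "openstack stack resource mark unhealthy <stack> <resource> [reason]" and "openstack stack update --existing [--parameter key=value] <stack>". Confirm both against your client's --help output. The stack, resource and parameter values are hypothetical.

```python
import shlex
from typing import Optional


def recovery_commands(
    stack: str, resource: str, reason: str, parameters: Optional[dict[str, str]] = None
) -> list[list[str]]:
    """Build (but do not run) the mark-unhealthy and update commands."""
    update = ["openstack", "stack", "update", "--existing"]
    # --existing reuses the current template; only the corrected values are passed.
    for key, value in (parameters or {}).items():
        update += ["--parameter", f"{key}={value}"]
    update.append(stack)
    return [
        ["openstack", "stack", "resource", "mark", "unhealthy", stack, resource, reason],
        update,
    ]


if __name__ == "__main__":
    for argv in recovery_commands("my-stack", "server", "fixed flavor typo", {"flavor": "m1.small"}):
        print(shlex.join(argv))  # shell-quoted, ready to copy after review
```

With the sample values, this prints two lines:

- openstack stack resource mark unhealthy my-stack server 'fixed flavor typo'
- openstack stack update --existing --parameter flavor=m1.small my-stack

Marking the resource unhealthy first makes Heat replace it during the update rather than leave it as is.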

If these steps do not resolve the issue, please contact the Platform9 Support Team for further assistance.

Most common causes:

  • Template syntax errors (YAML/JSON issues, missing parameters)
  • Resource conflicts (duplicate names, unavailable images)
  • Quota limits exceeded (compute, network, or storage)
  • Networking issues (missing subnets, no floating IPs)
  • Heat engine service failures
  • API rate limits exceeded
  • Delays in dependent resource creation
  • Authentication failures (expired tokens, invalid credentials)