> For the complete documentation index, see [llms.txt](https://platform9.com/kb/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://platform9.com/kb/pcd/gpu/gpu-vm-cold-migration-resize-fails-insufficient-compute-resources.md).

# GPU VM Cold Migration / Resize Fails: Insufficient Compute Resources

## Problem

A GPU Passthrough VM transitions to the ERROR state during cold migration or resize. The operation fails even when the destination host has sufficient physical GPU resources available.

## Environment

* Self-Hosted Private Cloud Director Virtualization — v2025.10-180 and above
* Private Cloud Director Virtualization — v2025.10-180 and above
* Components: GPU and Compute Service

## Diagnostics

* Check the PCI configuration, and verify the GPU passthrough device (`type-PF`). <br>

  <pre class="language-ruby" data-title="/opt/pf9/etc/nova/nova.conf" data-overflow="wrap"><code class="lang-ruby">[pci]
  alias = {"vendor_id": "[VENDOR_ID]", "product_id": "[PRODUCT_ID]", "device_type": "type-PF","name": "[DEVICE_NAME]", "live_migratable": "yes"}  
  </code></pre>
* Check `ostackhost.log` on the destination compute host for the following log entries:<br>

  <pre class="language-ruby" data-title="/var/log/pf9/ostackhost.log" data-overflow="wrap"><code class="lang-ruby">INFO  nova.compute.claims   [instance: [INSTANCE_UUID]] Failed to claim: Claim pci failed
  <strong>ERROR nova.compute.manager  [instance: [INSTANCE_UUID]] Setting instance vm_state to ERROR: nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: Claim pci failed.
  </strong></code></pre>
* [Enable debug logging](https://platform9.com/kb/pcd/compute/how-to-enable-debug-logging#compute) and re-check `ostackhost.log`. With debug logging enabled, look for:<br>

  <pre class="language-ruby" data-title="/var/log/pf9/ostackhost.log" data-overflow="wrap"><code class="lang-ruby">DEBUG nova.pci.stats  PCI claim: Starting with 1 devices, request count: 1
  <strong>DEBUG nova.pci.stats  PCI claim: Request spec: [{'vendor_id': '[VENDOR_ID]', 'product_id': '[PRODUCT_ID]','dev_type': 'type-PF', 'live_migratable': 'true'}]
  </strong>DEBUG nova.pci.utils  PCI attribute check: key=live_migratable, spec_value=true, device_value=None
  DEBUG nova.pci.utils  PCI attribute mismatch: live_migratable spec="true" != device="None"
  DEBUG nova.pci.stats  Dropped 1 device(s) due to mismatched PCI attribute(s)
  DEBUG nova.pci.stats  Not enough PCI devices left to satisfy request after spec filtering
  <strong>DEBUG nova.pci.stats  PCI claim: Available: 0, Required: 1
  </strong></code></pre>

**Key Indicator:** The `live_migratable` attribute is present in the PCI request spec (`true`) but is absent (`None`) from the available device pool — causing Nova to drop all matching devices and fail the claim.

## Cause

This is a regression introduced in **v2025.10-180**, where the `live_migratable=yes` flag was introduced to the PCI alias configuration (`nova.conf`) for GPU passthrough devices (`type-PF`). This flag is only valid for `type-VF` (vGPU / SR-IOV Virtual Function) devices that support live migration via kernel variant drivers.

For `type-PF` (PCI Passthrough / Physical Function) devices, live migration is NOT supported. When `live_migratable=yes` is set in the alias, Nova's PCI filter includes this attribute in the request spec. Since the physical device pool does not carry the `live_migratable` attribute, the comparison fails, and all available GPU devices are dropped — resulting in `Available: 0`.

**Affected Configuration**

The problematic configuration is present in `/opt/pf9/etc/nova/nova.conf` on GPU compute hosts in **v2025.10-180**:

{% code title="/opt/pf9/etc/nova/nova.conf " %}

```ini
[pci]
alias = {"vendor_id": "[VENDOR_ID]", "product_id": "[PRODUCT_ID]", "device_type": "type-PF","name": "[DEVICE_NAME]", "live_migratable": "yes"}   ← INCORRECT
```

{% endcode %}

**Attribute Mismatch Flow**

During the issue, the below configuration is fed.

```ruby
{'vendor_id': '[VENDOR_ID]', 'product_id': '[PRODUCT_ID]', 'dev_type': 'type-PF', 'live_migratable': 'true'}
```

However, the below configurations are only supported in the device pool configurations:

```ruby
{'vendor_id': '[VENDOR_ID]', 'product_id': '[PRODUCT_ID]', 'dev_type': 'type-PF'}
# 'live_migratable' key is ABSENT
```

As a result:

```ruby
PCI attribute mismatch: live_migratable spec="true" != device="None" → Device dropped → Claim pci failed
```

## Resolution

* Live migration for GPU passthrough (`type-PF`) is not supported yet; `live_migratable` is reserved for the future `type-VF` (vGPU) support and is tracked in **PCD-6249**.&#x20;

## Workaround

{% stepper %}
{% step %}
Remove `device_spec` from `nova.conf` under \[pci] section

Only remove the `device_spec` line — do not delete the entire `[pci]` section.

{% code title="GPU Host" %}

```bash
# Backup first
cp /opt/pf9/etc/nova/nova.conf /opt/pf9/etc/nova/nova.conf.bak

# Edit nova.conf and remove only the device_spec line under [pci]:
# device_spec = {"vendor_id": "<VENDOR_ID>", "product_id": "<PRODUCT_ID>"}
vi /opt/pf9/etc/nova/nova.conf
```

{% endcode %}
{% endstep %}

{% step %}
Add the corrected PCI config to `nova-override.conf` (without live\_migratable)

{% code title="GPU Host" %}

```bash
# Backup first
cp /opt/pf9/etc/nova/nova-override.conf /opt/pf9/etc/nova/nova-override.conf.bak

# Append correct [pci] section
cat >> /opt/pf9/etc/nova/nova-override.conf << 'EOF'
[pci]
device_spec = {"vendor_id": "<VENDOR_ID>", "product_id": "<PRODUCT_ID>"}
alias = {"vendor_id": "<VENDOR_ID>", "product_id": "<PRODUCT_ID>", "device_type": "type-PF", "name": "<DEVICE_NAME>"}
report_in_placement = True
EOF
```

{% endcode %}

{% hint style="info" %}
Do NOT include `live_migratable` in the alias.
{% endhint %}
{% endstep %}

{% step %}
Remove any existing `pci.conf` from `conf.d` - if present.

{% code title="GPU Host" %}

```bash
if [ -f /opt/pf9/etc/nova/conf.d/pci.conf ]; then
    mv /opt/pf9/etc/nova/conf.d/pci.conf \
       /opt/pf9/etc/nova/conf.d/pci.conf.bak
fi
```

{% endcode %}
{% endstep %}

{% step %}
Restart the Compute Service

{% code title="GPU Host" %}

```bash
sudo systemctl restart pf9-ostackhost
sudo systemctl status pf9-ostackhost
```

{% endcode %}
{% endstep %}

{% step %}
Verify the Hostagent convergence state.

{% code title="GPU Host" %}

```bash
systemctl status pf9-hostagent
tail -100 /var/log/pf9/hostagent.log | grep -i "converge\|error\|fail"
```

{% endcode %}
{% endstep %}

{% step %}
Validate cold migration and resize

After the service restarts successfully, test the following:

* Cold migration of a GPU VM between hosts
* GPU flavor resize (e.g., 8 GPU → 4 GPU)
  {% endstep %}
  {% endstepper %}

## Validation

To confirm the fix is applied, [Enable debug logging](https://platform9.com/kb/pcd/compute/how-to-enable-debug-logging#compute) and re-check `ostackhost.log` to verify the PCI claim no longer includes `live_migratable` in the request spec. A successful claim should show

{% code title="/var/log/pf9/ostackhost.log" %}

```ruby
DEBUG nova.pci.stats  PCI claim: Request spec: [{'vendor_id': '[VENDOR_ID]', 'product_id': '[PRODUCT_ID]','dev_type': 'type-PF'}]
# No 'live_migratable' key present
DEBUG nova.pci.stats  PCI claim: Available: 1, Required: 1
```

{% endcode %}

Compare with the failure pattern in the [#diagnostics](#diagnostics "mention")section.

## Additional Information

* This issue affects all GPU PCI passthrough configurations using `type-PF` in v2025.10-180 — not just NVIDIA H200.

<br>

***


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://platform9.com/kb/pcd/gpu/gpu-vm-cold-migration-resize-fails-insufficient-compute-resources.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
