Hostagent Intermittently Fails to Converge with Virsh Domcapabilities Failures

Problem

After performing a host upgrade or re-onboarding (decommission + prep-node) of a compute host in Platform9 PCD, the hostagent intermittently fails to converge. The host may appear Online in the PCD UI, but the hostagent log continuously reports errors related to virsh domcapabilities and fetch_gpu_info.py.

Environment

  • Private Cloud Director Virtualization — v2025.10-180

  • Self-Hosted Private Cloud Director Virtualization — v2025.10-180

  • Component: Hostagent, GPU.

Diagnostics

  • Check hostagent.log for virsh domcapabilities failures and convergence failures

    /var/log/pf9/hostagent.log
    sysinfo.py ERROR - Failed to get supported CPU models:
      Command '['virsh', 'domcapabilities']' returned non-zero exit status 1.
    Traceback (most recent call last):
      File ".../bbslave/sysinfo.py", in get_supported_cpu_models
        result = subprocess.run(['virsh', 'domcapabilities'],
    subprocess.CalledProcessError: Command '['virsh', 'domcapabilities']'
      returned non-zero exit status 1.
  • Check for fetch_gpu_info.py extension failure

    /var/log/pf9/hostagent.log
    session.py ERROR - /opt/pf9/hostagent/extensions/fetch_gpu_info.py command failed:
      Command '['/opt/pf9/hostagent/extensions/fetch_gpu_info.py']'
      returned non-zero exit status 1.
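
Both signatures can be counted in one pass. The following is a minimal sketch, not Platform9 tooling: the function name scan_hostagent_log is hypothetical, and the search strings and default log path are the ones quoted in the excerpts above.

```shell
# Hypothetical helper: count the two error signatures in a hostagent log.
# The default path is the log location quoted in this article.
scan_hostagent_log() {
    log="${1:-/var/log/pf9/hostagent.log}"
    # grep -c prints the number of matching lines; it exits 1 when that
    # number is 0, so "|| true" keeps the helper usable under "set -e".
    virsh_errors=$(grep -cF "Failed to get supported CPU models" "$log" || true)
    gpu_errors=$(grep -cF "fetch_gpu_info.py command failed" "$log" || true)
    echo "virsh domcapabilities failures: ${virsh_errors}"
    echo "fetch_gpu_info.py failures: ${gpu_errors}"
}
```

Called with no arguments it reads /var/log/pf9/hostagent.log; non-zero counts that keep growing between runs suggest the host is affected.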

Affected Scenarios

This issue has been observed in the following scenarios:

  • Host upgrade (in-place package upgrade of pf9-ostackhost)

  • Host re-onboarding via pcdctl decommission-node -f followed by pcdctl prep-node

Cause

The /opt/pf9/home directory is incorrectly owned by root:root instead of pf9:pf9group after a host upgrade or re-onboarding.

The hostagent runs virsh domcapabilities under the pf9 user context. When /opt/pf9/home is owned by root, the pf9 user cannot access its home directory, causing virsh to fail with a non-zero exit code.

The root cause is in the Platform9 package installation script /opt/pf9/hostagent/bin/pf9-apt. The package manager runs as root and extracts package files with root ownership. When the pf9-ostackhost package is installed or upgraded, /opt/pf9/home is extracted/created with root:root ownership instead of pf9:pf9group.

This can be confirmed from a listing of the directory after the upgrade, which shows root:root as the owner and group of /opt/pf9/home.
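
One way to produce such a listing is sketched here. The function name show_owner is hypothetical; stat -c is the GNU coreutils form and is assumed to be available on the host.

```shell
# Print "owner:group path" for the pf9 home directory (path from this article).
# %U is the owner name, %G the group name, %n the file name.
show_owner() {
    stat -c '%U:%G %n' "${1:-/opt/pf9/home}"
}
```

On an affected host, the owner and group fields read root:root instead of pf9:pf9group.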

Resolution

This is a known issue identified in PCD v2025.10-180. The fix, which ensures that the package installation/upgrade process sets pf9:pf9group ownership on /opt/pf9/home during package extraction, is incorporated into PCD v2026.1-260 and later versions.

Workaround

If an immediate upgrade is not possible, apply the following workaround on each affected compute host.

1. Fix directory ownership
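
A minimal sketch of the fix, assuming a root shell on the affected host. The path and the pf9:pf9group owner are taken from the Cause section; the function name is hypothetical. The sketch changes only the directory itself, because nothing in this article states that its contents are affected.

```shell
# Sketch of the ownership fix. Run as root.
# Path and owner defaults come from this article; the function name is
# hypothetical.
fix_pf9_home_ownership() {
    dir="${1:-/opt/pf9/home}"
    owner="${2:-pf9:pf9group}"
    # Changes only the directory itself, not its contents.
    chown "$owner" "$dir"
}
```

Called with no arguments as root, this is equivalent to running chown pf9:pf9group /opt/pf9/home.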

2. Verify the ownership fix
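
The check can be made pass/fail rather than read by eye. This is a sketch with a hypothetical function name; the default path and expected owner are the values named in this article.

```shell
# Hypothetical check: succeed only if the directory has the expected
# owner:group. Returns 1 on a mismatch and 2 if the directory cannot be read.
check_owner() {
    dir="${1:-/opt/pf9/home}"
    expected="${2:-pf9:pf9group}"
    actual=$(stat -c '%U:%G' "$dir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $dir is owned by $actual"
    else
        echo "MISMATCH: $dir is owned by $actual, expected $expected"
        return 1
    fi
}
```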

3. Verify virsh works under the pf9 user
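
A sketch of this check follows. The sudo invocation is an assumption about how to run a command as the pf9 user (the -H flag asks sudo to set HOME to that user's home directory); the function name is hypothetical. It relies on virsh domcapabilities printing XML with a domainCapabilities root element.

```shell
# Hypothetical helper: run a command and confirm that it exits 0 and prints
# the <domainCapabilities> root element emitted by virsh domcapabilities.
check_domcaps() {
    # With no arguments, default to the check described in this article:
    # virsh domcapabilities as the pf9 user (sudo usage is an assumption).
    [ "$#" -gt 0 ] || set -- sudo -H -u pf9 virsh domcapabilities
    if out=$("$@" 2>&1) && printf '%s\n' "$out" | grep -q '<domainCapabilities'; then
        echo "OK: domain capabilities XML returned"
    else
        echo "FAILED: $*"
        return 1
    fi
}
```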

Expected: XML output returned without errors.

4. Check that hostagent convergence recovers automatically
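
One way to watch for recovery is to filter the most recent log lines for the convergence marker and the two error keywords. This is a sketch: the function name and the 200-line window are assumptions, and the log path is the one given under Diagnostics.

```shell
# Hypothetical helper: show recent convergence markers and related errors.
# $1 = log path (default from this article), $2 = lines to inspect
# (default 200, an arbitrary window).
recent_converge_activity() {
    log="${1:-/var/log/pf9/hostagent.log}"
    tail -n "${2:-200}" "$log" | grep -E "Converging|domcapabilities|fetch_gpu_info"
}
```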

Expected: --- Converging --- followed by no virsh or fetch_gpu_info errors.

In most cases, no service restart is required. The hostagent retries convergence automatically and succeeds once the ownership is corrected.

5. (If needed) Restart hostagent manually

If the hostagent does not recover automatically within 2–3 minutes:
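
A restart along the following lines is one option. This is a sketch under two assumptions: the systemd unit is named pf9-hostagent (the unit name is not stated in this article), and the commands run as root. The SYSTEMCTL variable is a hypothetical override that exists only so the sketch can be dry-run.

```shell
# Hypothetical helper: restart the hostagent unit and report its state.
# The unit name pf9-hostagent is an assumption; pass another name as $1.
restart_hostagent() {
    unit="${1:-pf9-hostagent}"
    # SYSTEMCTL=echo turns this into a dry run that only prints the commands.
    ${SYSTEMCTL:-systemctl} restart "$unit" && ${SYSTEMCTL:-systemctl} is-active "$unit"
}
```

Running it with SYSTEMCTL=echo first shows the exact commands without touching the service.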

Validation

After applying the workaround or upgrade, validate in the following order:

1. Confirm correct ownership

2. Confirm virsh works under the pf9 user

3. Confirm hostagent is converging cleanly

Expected: --- Converging --- with no virsh or fetch_gpu_info errors
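
Steps 1 to 3 can also be run as a single ordered check that stops at the first failure. This is a sketch, not Platform9 tooling: the function name, the PF9_OWNER and VIRSH_CMD overrides, the sudo invocation, and the 200-line log window are all assumptions. The paths, owner, and error strings come from this article. Errors logged before the fix may still fall inside the window, so a failure at step 3 right after the workaround is not conclusive.

```shell
# Hypothetical ordered validation. $1 = pf9 home directory, $2 = hostagent log.
validate_host() {
    dir="${1:-/opt/pf9/home}"
    log="${2:-/var/log/pf9/hostagent.log}"
    owner="${PF9_OWNER:-pf9:pf9group}"

    # 1. Ownership of the pf9 home directory.
    [ "$(stat -c '%U:%G' "$dir")" = "$owner" ] \
        || { echo "FAIL: ownership of $dir"; return 1; }

    # 2. virsh domcapabilities as the pf9 user (VIRSH_CMD overrides the command).
    ${VIRSH_CMD:-sudo -H -u pf9 virsh domcapabilities} >/dev/null 2>&1 \
        || { echo "FAIL: virsh domcapabilities"; return 1; }

    # 3. Recent log lines: a Converging marker and no known error signature.
    recent=$(tail -n 200 "$log")
    printf '%s\n' "$recent" | grep -q "Converging" \
        || { echo "FAIL: no Converging marker in recent log lines"; return 1; }
    if printf '%s\n' "$recent" | grep -qE "Failed to get supported CPU models|fetch_gpu_info.py command failed"; then
        echo "FAIL: errors still present in recent log lines"
        return 1
    fi
    echo "PASS"
}
```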

4. Confirm the host is Online in the PCD UI

Navigate to: Infrastructure → Hosts

Expected: Host status = Online / Applied

Additional Information

  • GPU Hosts: This issue has a higher impact on GPU compute hosts. In addition to the virsh domcapabilities failure, the fetch_gpu_info.py extension also fails, which can prevent GPU resource reporting to the hostagent and affect GPU VM scheduling. Apply the ownership fix on all GPU hosts after any upgrade or re-onboarding.

  • Re-onboarding: If a host was decommissioned (pcdctl decommission-node -f) and re-onboarded (pcdctl prep-node), the ownership issue may reoccur. Always verify /opt/pf9/home ownership after re-onboarding and apply the chown fix if needed.
