Windows GPU VM Stops Automatically After Deployment Due to Guest ACPI Standby Policy

Problem

A Windows virtual machine deployed with a vGPU profile stops on its own, without any user-initiated action, within minutes to hours of deployment or after a Windows Update reboot. The VM transitions to the SHUTOFF state and the Compute Service reports the instance as stopped. The issue can look intermittent: the VM runs normally immediately after deployment but stops silently once the guest has been idle for long enough.

Environment

  • Private Cloud Director Virtualization - All versions

  • Self-Hosted Private Cloud Director Virtualization - All versions

  • Component: Compute Service, vGPU (NVIDIA L40S or equivalent SR-IOV vGPU profile)

  • Guest OS: Windows 11 (any build) or Windows Server

Cause

The Windows guest has ACPI S3 sleep (standby) enabled in its power plan. When the guest sits idle beyond the configured standby timeout (the Windows default is 15 minutes on AC power), it silently enters the ACPI S3 sleep state.

The Compute Service periodically compares the power state recorded in its database with the state reported by the hypervisor. When the hypervisor reports vm_power_state=7 (libvirt PMSUSPENDED) for an instance the platform expects to be running, the platform treats this as an unexpected suspension and automatically calls the instance stop API. The hypervisor then forcibly shuts down the VM, which moves to the SHUTOFF state in the platform.

This is expected and documented platform behaviour — the platform treats an unsolicited guest suspend as an anomaly and stops the instance to return it to a consistent state. The root cause is the Windows guest power policy, not a platform defect.
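The reconciliation described above can be sketched in a few lines. This is a simplified illustration and not the platform's actual code: the function name `reconcile` and its return strings are hypothetical, and only the running-to-suspended case is taken from this article.

```python
# Simplified sketch of the power-state reconciliation described above.
# Only the codes that appear in the Hostagent log are modelled here:
#   1 = running, 7 = suspended (libvirt PMSUSPENDED).
RUNNING = 1
SUSPENDED = 7


def reconcile(db_power_state: int, hypervisor_power_state: int) -> str:
    """Return the action taken for an instance the platform believes is running.

    The function name and return values are illustrative assumptions.
    """
    if db_power_state == hypervisor_power_state:
        # Database and hypervisor agree, nothing to do.
        return "none"
    if db_power_state == RUNNING and hypervisor_power_state == SUSPENDED:
        # Unsolicited guest suspend: treated as an anomaly, instance is stopped.
        return "stop"
    # Other mismatches are outside the scope of this article.
    return "other"


print(reconcile(RUNNING, SUSPENDED))  # stop
```

The point of the sketch is that the stop is triggered purely by the state mismatch, so the only durable fix is to keep the guest from entering S3 in the first place.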

Diagnostics

Step 1 — Check the VM event log for unexpected stop events

Retrieve the event history for the stopped VM. Look for a stop action with no corresponding user-initiated request.

OpenStack CLI
$ openstack server event list <VM_UUID>
Sample Output
+------------------------------------------+--------------------------------------+--------+----------------------------+
| Request ID                               | Server ID                            | Action | Start Time                 |
+------------------------------------------+--------------------------------------+--------+----------------------------+
| [REQ_UUID]                               | [VM_UUID]                            | stop   | [TIMESTAMP]                |
| [REQ_UUID]                               | [VM_UUID]                            | start  | [TIMESTAMP]                |
+------------------------------------------+--------------------------------------+--------+----------------------------+

A stop action that appears without a preceding user-initiated action, shortly after deployment or after a Windows Update reboot, is consistent with this issue.

Step 2 — Confirm PMSUSPENDED state in the Hostagent log

On the compute host where the VM was running, search the Hostagent log for the PMSUSPENDED power state detection.

Compute Host
$ sudo grep -i "<VM_UUID>" /var/log/pf9/ostackhost.log | grep -iE "power_state|suspended|pmsuspend"
Sample Output
[TIMESTAMP] INFO nova.compute.manager [REQ_UUID] [VM_UUID] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (7). Updating power_state in the DB to match the hypervisor.
[TIMESTAMP] WARNING nova.compute.manager [REQ_UUID] [VM_UUID] Instance is suspended unexpectedly. Calling the stop API.

vm_power_state=7 is the libvirt PMSUSPENDED state. The warning "Instance is suspended unexpectedly. Calling the stop API." confirms that the platform stopped the VM in response to the guest entering ACPI sleep.
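When many hosts or VMs need checking, the same two log signatures can be matched programmatically. The following is a minimal sketch that assumes the log lines have the wording shown in the sample output above; the function name `find_suspend_stops` is hypothetical.

```python
import re

# Matches the state-mismatch message and captures both power-state codes.
MISMATCH = re.compile(
    r"DB power_state \((\d+)\) does not match the vm_power_state "
    r"from the hypervisor \((\d+)\)"
)
STOP_WARNING = "Instance is suspended unexpectedly. Calling the stop API."


def find_suspend_stops(lines):
    """Return (mismatch_seen, stop_warning_seen) for the given log lines."""
    mismatch_seen = False
    stop_seen = False
    for line in lines:
        match = MISMATCH.search(line)
        # Hypervisor-side code 7 is the suspended (PMSUSPENDED) state.
        if match and match.group(2) == "7":
            mismatch_seen = True
        if STOP_WARNING in line:
            stop_seen = True
    return mismatch_seen, stop_seen


# Sample lines copied from the output above (placeholders left as-is).
SAMPLE = [
    "[TIMESTAMP] INFO nova.compute.manager [REQ_UUID] [VM_UUID] During "
    "_sync_instance_power_state the DB power_state (1) does not match the "
    "vm_power_state from the hypervisor (7). Updating power_state in the DB "
    "to match the hypervisor.",
    "[TIMESTAMP] WARNING nova.compute.manager [REQ_UUID] [VM_UUID] Instance "
    "is suspended unexpectedly. Calling the stop API.",
]

print(find_suspend_stops(SAMPLE))  # (True, True)
```

In practice the input would be the lines of /var/log/pf9/ostackhost.log already filtered by VM UUID, as in the grep command above. Both values being true is the pattern that identifies this issue.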

Step 3 — Verify the Windows guest power policy (inside the VM while running)

Start the VM and connect to the Windows guest before the standby timeout elapses. Run the following command in an elevated Command Prompt or PowerShell to confirm the current standby timeout values.

Inside VM
powercfg /query SCHEME_CURRENT SUB_SLEEP STANDBYIDLE
Sample Output
Power Scheme GUID: [SCHEME_UUID]  (Balanced)
  Power Setting GUID: [SETTING_UUID]  (Sleep after)
      Current AC Power Setting Index: 0x00000384
      Current DC Power Setting Index: 0x00000258

0x00000384 = 900 seconds = 15 minutes (Windows default AC timeout). 0x00000258 = 600 seconds = 10 minutes (Windows default DC timeout).

Any non-zero value means the guest will eventually enter standby and trigger the platform stop. A value of 0x00000000 means standby is disabled.
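The hexadecimal index values are timeouts in seconds, so the conversion can be checked with a short script. This is a sketch that assumes the English-language powercfg output format shown above; the function name `standby_timeouts` is hypothetical.

```python
import re

# Captures the power source (AC or DC) and the hexadecimal timeout value.
INDEX = re.compile(r"Current (AC|DC) Power Setting Index:\s*(0x[0-9a-fA-F]+)")


def standby_timeouts(powercfg_output: str) -> dict:
    """Map 'AC' and 'DC' to the standby timeout in seconds (0 means disabled)."""
    return {source: int(value, 16) for source, value in INDEX.findall(powercfg_output)}


# Sample text copied from the output above (placeholders left as-is).
SAMPLE = """\
Power Scheme GUID: [SCHEME_UUID]  (Balanced)
  Power Setting GUID: [SETTING_UUID]  (Sleep after)
      Current AC Power Setting Index: 0x00000384
      Current DC Power Setting Index: 0x00000258
"""

for source, seconds in standby_timeouts(SAMPLE).items():
    if seconds == 0:
        print(f"{source}: standby disabled")
    else:
        print(f"{source}: {seconds} s ({seconds // 60} min)")
```

For the sample values this reports 900 seconds (15 minutes) on AC and 600 seconds (10 minutes) on DC, matching the figures quoted above.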

Also confirm whether S3 sleep state is available:

Inside VM
powercfg /a
Sample Output
The following sleep states are available on this system:
    Standby (S3)
    Hibernate

If Standby (S3) appears as available, the VM can enter ACPI sleep.

Workaround

Method 1 — Disable Standby on the Running VM (Immediate Fix)

Connect to the Windows guest and run the following commands in an elevated Command Prompt or PowerShell to disable all sleep-related timeouts immediately.

Step 1 — Disable standby and hibernate timeouts

Step 2 — Confirm the timeouts are now zero

Both AC and DC values must show 0x00000000. The VM will no longer enter standby and the platform will not stop the instance due to an unexpected suspension.
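A minimal sketch of the two steps above, assuming the standard `powercfg /change` timeout aliases (a value of 0 means "never") and assuming Python is available inside the guest. The same command lines can be typed directly into an elevated prompt instead, for example `powercfg /change standby-timeout-ac 0`. The helper names are hypothetical, and the exact commands used by the platform vendor may differ.

```python
import platform
import subprocess

# Standard powercfg /change aliases for the sleep-related timeouts.
SETTINGS = (
    "standby-timeout-ac",
    "standby-timeout-dc",
    "hibernate-timeout-ac",
    "hibernate-timeout-dc",
)


def disable_commands():
    """Build one powercfg invocation per timeout, each setting it to 0."""
    return [["powercfg", "/change", name, "0"] for name in SETTINGS]


def verify_command():
    """The same standby query used in the Diagnostics section."""
    return ["powercfg", "/query", "SCHEME_CURRENT", "SUB_SLEEP", "STANDBYIDLE"]


if __name__ == "__main__":
    on_windows = platform.system() == "Windows"
    for cmd in disable_commands():
        print(" ".join(cmd))
        if on_windows:
            # Requires an elevated prompt; raises if powercfg reports an error.
            subprocess.run(cmd, check=True)
    print(" ".join(verify_command()))
    if on_windows:
        result = subprocess.run(
            verify_command(), check=True, capture_output=True, text=True
        )
        print(result.stdout)
```

On a non-Windows machine the script only prints the command lines, which makes it safe to review before running it in the guest.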

Method 2 — Enforce via Group Policy (Fleet-Wide Fix)

For environments where multiple GPU VMs are deployed from the same image or are domain-joined, enforce the sleep policy through Group Policy to prevent recurrence across the fleet.

Apply the following Group Policy setting on the domain controller or local policy editor:

Setting both values to 0 disables standby system-wide and survives reboots, Windows Updates, and power plan resets.

Method 3 — Bake into the Windows Golden Image (Permanent Prevention)

To prevent the issue on all future GPU VM deployments, disable standby in the Windows base image before deploying new VMs.

Step 1 — Boot the golden image VM and disable standby

Connect to the Windows source image VM and run the same commands as in Method 1 to disable the standby and hibernate timeouts.

Step 2 — Verify the settings are applied

Step 3 — Snapshot or re-image from this golden image

Shut down the VM cleanly, create a new snapshot or volume image, and use this updated image as the base for all future GPU VM deployments. All VMs created from this image will have standby disabled from first boot.

Resolution

This is expected platform behaviour — no platform fix is required. The resolution is to disable ACPI standby in the Windows guest power policy using one or more of the methods above. Applying Method 3 (golden image update) is the permanent fix for environments deploying GPU VMs at scale.
