# DRR Not Triggering VM Migration Under Host Memory Pressure Leading to OOM-Induced VM Crashes

## Problem

* Multiple VMs unexpectedly transition to the **SHUTOFF** state. They crash at the hypervisor level because the kernel OOM killer terminates their QEMU processes when the host runs out of memory.

{% code title="Kernel Logs" %}

```log
Out of memory: Killed process (qemu-system-x86)
CPU x/KVM invoked oom-killer
```

{% endcode %}
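To check whether a particular hypervisor host has hit this condition, you can search its kernel log for the two messages above. The sketch below defines `find_qemu_oom_events`, a hypothetical helper that is not part of PCD. It reads log text on stdin, so it accepts output from either `journalctl -k` or `dmesg`. Kernel message wording can differ slightly between kernel versions, so treat the patterns as a starting point.

```bash
# Hypothetical helper (not a PCD tool): print kernel log lines that match the
# OOM messages shown above. Reads log text on stdin; exits non-zero if none match.
find_qemu_oom_events() {
  # Match the oom-killer invocation line, and "Killed process" lines for QEMU
  grep -E 'invoked oom-killer|Out of memory: Killed process .*qemu'
}
```

For example, run `journalctl -k --no-pager | find_qemu_oom_events` or `dmesg -T | find_qemu_oom_events` on the affected host.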

* Despite available memory capacity on other hosts in the cluster, **Dynamic Resource Rebalancing (DRR)** is not triggering live migrations. As a result, memory pressure continues to build on overloaded hosts, and the Linux kernel OOM killer terminates QEMU processes, leading to VM crashes.

{% code title="Watcher Logs" %}

```log
Node <uuid> overloaded, attempting to reduce load
No destination hosts suggested by nova scheduler
```

{% endcode %}
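On the control plane, the symptom appears in the watcher decision engine logs as the pair of messages above. The following is a minimal sketch, assuming a POSIX shell. `drr_stalled_in_log` is a hypothetical helper that reads log text on stdin. It succeeds only when the log contains both the overload message and the "no destination hosts" message.

```bash
# Hypothetical check (not a PCD tool): succeed only if the log text on stdin
# contains both watcher messages that together indicate this issue.
drr_stalled_in_log() {
  local log
  log=$(cat)
  printf '%s\n' "$log" | grep -q 'overloaded, attempting to reduce load' &&
    printf '%s\n' "$log" | grep -q 'No destination hosts suggested by nova scheduler'
}
```

For example: `kubectl logs -n "$NS" <decision-engine-pod> | drr_stalled_in_log && echo "DRR found no destination hosts"`. Here, `<decision-engine-pod>` is the name returned by `kubectl get pods -n "$NS" | grep -i decision`. If the pod runs more than one container, `kubectl logs` also needs `-c <container>`.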

## Environment

* Private Cloud Director Virtualization - v2025.10 and higher
* Self-Hosted Private Cloud Director Virtualization - v2025.10 and higher

## Cause

The DRR strategy detects overloaded hosts and attempts to rebalance their workloads. However, during migration planning, the **Nova scheduler does not return any valid destination hosts**, even when other hosts in the cluster have sufficient capacity.

This results in:

* No live migrations being triggered
* Continued memory pressure on affected hosts
* Kernel OOM killer terminating `qemu-system-x86` processes
* VMs crashing and transitioning to **SHUTOFF** state

This indicates a scheduler/placement decision issue rather than actual resource exhaustion.

## Workaround

* Delete the **watcher decision engine pod**. Kubernetes creates a replacement pod, which re-establishes the connection to RabbitMQ.

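```bash
# Find the watcher decision engine pod in the namespace where the watcher services run ($NS)
kubectl get pods -n "$NS" | grep -i decision

# Delete it by the name returned above (the name below is only an example);
# Kubernetes schedules a replacement pod automatically
kubectl delete pod watcher-decision-engine-6fbd96b9b4-5l76d -n "$NS"
```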

Once the new pod is running, DRR resumes and migrations away from overloaded hosts can proceed.
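You can wrap the workaround commands in a function so that you do not have to copy the generated pod name by hand. This is a sketch that makes the following assumptions:

* It runs in a POSIX shell.
* `$NS` is the same namespace used above.
* The decision engine pod is the only pod whose name contains `decision`.

`restart_decision_engine` is a hypothetical helper, not part of PCD.

```bash
# Hypothetical helper (not part of PCD): delete the watcher decision engine
# pod(s) in the given namespace so that Kubernetes recreates them.
restart_decision_engine() {
  local ns pods
  ns="$1"
  # "-o name" prints one "pod/<name>" entry per line; keep the decision engine pod(s)
  pods=$(kubectl get pods -n "$ns" -o name | grep -i decision)
  if [ -z "$pods" ]; then
    echo "no watcher decision engine pod found in namespace '$ns'" >&2
    return 1
  fi
  # $pods is deliberately unquoted so that each "pod/<name>" line becomes one argument
  kubectl delete -n "$ns" $pods
}
```

Run it as `restart_decision_engine "$NS"`. Then repeat `kubectl get pods -n "$NS" | grep -i decision` to confirm that a new pod reaches `Running`.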

{% hint style="warning" %}
This is a temporary workaround. The issue may recur, especially after a RabbitMQ restart.
{% endhint %}

## Additional Information

This issue is currently being tracked by the engineering team under PCD-1854, and a fix is planned to be included in the April 2026 release.

