VM Network Port Attachment Failure and High CPU Usage on OVNDB

Problem

The PCD environment is experiencing a critical failure in VM network port attachment operations, which prevents virtual machines from connecting to the network. The OVN database (OVNDB) service is consuming 88-95% CPU, leaving it unable to serve legitimate API requests.

OVN Logs

|poll_loop|INFO|Dropped 79 log messages in last 94 seconds (most recently, 91 seconds ago) due to excessive rate
|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/jsonrpc-server.c:616 (95% CPU usage)

ostackhost Logs

nova.exception.PortInUse: Port <PORT_UUID> is still in use.

Environment

  • Private Cloud Director Virtualization - v2025.4 and Higher

  • Private Cloud Director Kubernetes - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher

  • Self-Hosted Private Cloud Director Kubernetes - v2025.4 and Higher

Cause

The OVN Database (OVN-DB) service has entered a non-responsive state characterised by a high-CPU spin loop.

Diagnostics

Steps 2-3 are accessible and applicable only to Self-Hosted Private Cloud Director.

For SaaS environments, please reach out to Platform9 support.

  1. VM deployment fails with the nova.exception.PortInUse exception (shown under ostackhost Logs in the Problem section) in the pf9-ostackhost logs, located on the hypervisor under /var/log/pf9/.

  2. Review the ovn-ovsdb-sb-0 pod logs from the workload region namespace.

  3. In those pod logs, look for high CPU usage messages such as the poll_loop entries shown under OVN Logs in the Problem section.
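
The log check in step 3 can be automated once the pod logs have been saved as text. The following is a minimal sketch, not a Platform9 tool: it assumes the log lines follow the poll_loop format quoted in the Problem section, and the function name `high_cpu_wakeups` and the default threshold of 88 (the low end of the CPU range reported above) are illustrative choices.

```python
import re

# Matches the CPU figure in poll_loop wakeup lines such as:
# |poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/jsonrpc-server.c:616 (95% CPU usage)
CPU_RE = re.compile(r"\|poll_loop\|INFO\|wakeup due to .*\((\d+)% CPU usage\)")


def high_cpu_wakeups(lines, threshold=88):
    """Return the CPU percentages at or above `threshold` found in `lines`.

    `threshold` is an assumed cut-off, not a documented limit.
    """
    hits = []
    for line in lines:
        match = CPU_RE.search(line)
        if match and int(match.group(1)) >= threshold:
            hits.append(int(match.group(1)))
    return hits


# Sample lines copied from the Problem section of this article.
sample = [
    "|poll_loop|INFO|Dropped 79 log messages in last 94 seconds "
    "(most recently, 91 seconds ago) due to excessive rate",
    "|poll_loop|INFO|wakeup due to 0-ms timeout at "
    "../ovsdb/jsonrpc-server.c:616 (95% CPU usage)",
]

print(high_cpu_wakeups(sample))  # prints [95]
```

A non-empty result suggests that the database server is in the spin loop described under Cause. An empty result on recent logs is what the Validation section expects after the fix.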

Resolution

These steps are accessible and applicable only to Self-Hosted Private Cloud Director.

For SaaS environments, please reach out to Platform9 support.

  1. Take a backup of the OVNDB database.

  2. Once the backup is taken, run the compaction command.

  3. Enable memory compaction on the below pods.

  4. Restart the below pods.

Validation

Steps 1-2 apply to Self-Hosted Private Cloud Director.

For SaaS environments, please reach out to Platform9 support.

  1. Check the status of the pods; each should be Running.

  2. Review the ovn-ovsdb pod logs from the workload region namespace and confirm that no high CPU usage messages are logged in these pods.

  3. Deploy a new VM and review the pf9-ostackhost logs on the hypervisor (host); confirm that they no longer contain Port <PORT_UUID> is still in use messages.
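
The check in step 3 can be scripted against a saved copy of the pf9-ostackhost log. This is a minimal sketch that assumes the exception text matches the line quoted in the Problem section; the function name `ports_still_in_use` is hypothetical, and the exact log file name under /var/log/pf9/ is not given here, so the function takes already-read lines rather than a path.

```python
import re

# Matches lines such as:
# nova.exception.PortInUse: Port <PORT_UUID> is still in use.
PORT_IN_USE_RE = re.compile(
    r"nova\.exception\.PortInUse: Port (\S+) is still in use"
)


def ports_still_in_use(lines):
    """Return the distinct port identifiers reported as in use, in first-seen order."""
    seen = []
    for line in lines:
        match = PORT_IN_USE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample line copied from the Problem section; <PORT_UUID> is the
# article's own placeholder, not a real identifier.
sample = ["nova.exception.PortInUse: Port <PORT_UUID> is still in use."]

print(ports_still_in_use(sample))  # prints ['<PORT_UUID>']
```

After a successful fix, running this over log lines written since the new VM was deployed should return an empty list.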
