Troubleshoot VM-to-VM (Same Network) connectivity failures
Problem
This guide provides instructions for troubleshooting network connectivity failures between two Virtual Machines (VMs) residing on the same logical network (subnet). In an OVN-backed environment, troubleshooting differs significantly depending on whether the two VMs are running on the same physical compute node or different compute nodes.
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Component - Networking Service
Deep Dive: Architecture & Packet Flow
To troubleshoot OVN effectively, you must understand the distinction between the "Brain," the "Muscle," and the "Wire," as well as exactly how packets traverse them.
OVN (The Brain): Runs in the Management Plane. It translates your intent (Logical Switches, Security Groups) into logical flows, which ovn-controller on each compute node turns into OpenFlow rules for OVS.
Northbound DB (ovn-ovsdb-nb-0): Stores intent (Ports, ACLs).
Southbound DB (ovn-ovsdb-sb-0): Stores reality (Chassis bindings, MAC locations).
OVS (The Muscle): Runs on the compute node (ovs-vswitchd). It executes the actual forwarding of packets based on "Flow Rules" across the integration bridge (br-int).
Geneve (The Wire): The UDP tunnel (Port 6081) that encapsulates VM packets for cross-host transport.
How the Packet Flows (Same Network)
Depending on where the VMs live, the packet takes one of two distinct paths:
Path A: Same Host (Intra-Host)
Source VM→Source Tap→br-int (Source Node)→OVS Connection Tracking (Security Groups)→br-int (Source Node)→Destination Tap→Destination VM
Note: Traffic never touches the physical network.
Path B: Different Hosts (Inter-Host)
Source VM→Source Tap→br-int (Source Node)→Geneve Encapsulation→Physical NIC→Physical Network (UDP 6081)→Physical NIC→Geneve Decapsulation→br-int (Destination Node)→Destination Tap→Destination VM
Prerequisites: Executing OVN Commands
Refer to the existing guide for SaaS Alias setup or Self-Hosted kubectl exec instructions to run ovn-sbctl and ovn-nbctl commands.
Procedure
1. Variable Discovery & Pre-Flight Checks
You cannot troubleshoot flows without exact IDs. Gather these from the Management Plane before tracing.
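The discovery queries can be sketched as follows. This is a minimal sketch under a few assumptions: the openstack CLI is available on the management host, and ovn-sbctl is run through the alias or kubectl exec method from the Prerequisites. It also relies on the usual OpenStack convention that the OVN logical port name is the Neutron port UUID. The function names are hypothetical. Each function returns one ID the later steps need. The chassis lookup is the one that exposes an unbound port.

```shell
# Hypothetical helpers; arguments in angle brackets in the usage below are placeholders.

# 1. Neutron port ID for a VM IP (also the OVN logical port name).
port_id_for_ip() {
    openstack port list --fixed-ip ip-address="$1" -f value -c ID
}

# 2. "MAC IP" string recorded in the Southbound Port_Binding row.
binding_mac() {
    ovn-sbctl --bare --columns=mac find Port_Binding logical_port="$1"
}

# 3. Chassis UUID the port is bound to (prints nothing if unbound).
binding_chassis() {
    ovn-sbctl --bare --columns=chassis find Port_Binding logical_port="$1"
}

# 4. Hostname of that chassis, i.e. the hypervisor that claimed the port.
chassis_hostname() {
    ovn-sbctl get Chassis "$1" hostname
}

# Pure helper: interpret the result of the chassis lookup.
report_binding() {
    if [ -z "$2" ]; then
        echo "UNBOUND: port $1 has no chassis"
    else
        echo "BOUND: port $1 -> chassis $2"
    fi
}
```

Example usage: `P=$(port_id_for_ip <SOURCE_IP>); report_binding "$P" "$(binding_chassis "$P")"`. Repeat the same for <DEST_IP>.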
Analysis: If the chassis lookup returns an empty chassis for either VM, the port is unbound and cannot send or receive traffic.
Logs to Check: On the expected hypervisor, check /var/log/pf9/ovn/ovn-controller.log. Look for "claim failed" or "unrecognized port", which indicate that ovn-controller is refusing to bind the VM to the host.
2. Identify the Tap Interface (Data Plane)
Log into the Compute Node hosting the Source VM to find the exact interface name dynamically.
Command (Executed on Source Compute Node):
Command (Executed on Destination Compute Node):
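On either compute node, the lookup can be sketched like this. The sketch asks OVS which Interface carries the Neutron port ID from Step 1 in external_ids:iface-id. It then reads that interface's OpenFlow port number, which ofproto/trace needs later. The function names are hypothetical. The tap naming rule in the last helper is the standard OpenStack convention and is assumed rather than confirmed for this product.

```shell
# Hypothetical helpers; PORT_ID comes from Step 1.

# Interface whose external_ids:iface-id matches the Neutron port ID.
tap_for_port() {
    ovs-vsctl --bare --columns=name find Interface external_ids:iface-id="$1"
}

# OpenFlow port number (ofport) of that interface on br-int.
ofport_for_tap() {
    ovs-vsctl get Interface "$1" ofport
}

# Pure helper: conventional tap name, "tap" + first 11 chars of the port UUID.
expected_tap_name() {
    printf 'tap%s\n' "$(printf '%s' "$1" | cut -c1-11)"
}
```

Example usage: `TAP_A=$(tap_for_port <PORT_ID_A>); ofport_for_tap "$TAP_A"` on the source node. Run the same on the destination node to get <OFPORT_B>.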
3. Capture at the Source (The Tap)
Goal: Prove the packet is actually leaving the VM.
Command (Executed on Source Compute Node):
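A capture on the source tap might look like the sketch below. Start it while pinging the destination from inside the source VM. The tap name comes from Step 2. The function names are hypothetical, and the ten-packet limit is an arbitrary choice.

```shell
# Hypothetical helpers; arguments are the tap name and the two VM IPs.

# Pure helper: pcap filter matching ICMP between the two VM addresses.
icmp_filter() {
    printf 'icmp and host %s and host %s\n' "$1" "$2"
}

# -n: no name resolution, -e: show MAC headers, -c 10: stop after 10 packets.
capture_tap() {
    tcpdump -n -e -i "$1" -c 10 "$(icmp_filter "$2" "$3")"
}
```

Example usage: `capture_tap <TAP_A> <SOURCE_IP> <DEST_IP>`.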
Success: Packets seen (e.g., IP <SOURCE_IP> > <DEST_IP>: ICMP echo request). The source Guest OS is sending traffic correctly. Proceed to Step 4.
Failure: No packets seen. The issue is inside the Source VM's Guest OS (e.g., internal firewall or interface down).
4. Physical Datapath Trace (ofproto/trace)
Goal: Ask OVS "Why are you dropping this packet?" based on its programmed flow rules.
Command (Executed on Source Compute Node):
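The trace can be sketched as below. It injects a synthetic ICMP packet at the source tap's ofport (from Step 2), using the VM MACs and IPs gathered in Step 1. Only the last "Datapath actions:" line matters, because the trace resumes after connection tracking and prints a new actions line each time. The function names are hypothetical. The classifier assumes two formats: a drop prints as "drop", and tunnel output contains "tunnel" (the kernel datapath shows set(tunnel(...)), while the summary earlier in this guide writes set_tunnel).

```shell
# Hypothetical helpers.
# $1=OFPORT_A $2=SRC_MAC $3=DST_MAC $4=SRC_IP $5=DST_IP
trace_icmp() {
    ovs-appctl ofproto/trace br-int \
        "in_port=$1,icmp,dl_src=$2,dl_dst=$3,nw_src=$4,nw_dst=$5"
}

# Pure helper: classify the final "Datapath actions:" line read from stdin.
classify_trace() {
    actions=$(grep 'Datapath actions:' | tail -n 1)
    case "$actions" in
        "")                echo "unknown" ;;
        *"actions: drop"*) echo "drop" ;;
        *tunnel*)          echo "tunnel" ;;
        *)                 echo "local" ;;
    esac
}
```

Example usage: `trace_icmp <OFPORT_A> <SRC_MAC> <DST_MAC> <SOURCE_IP> <DEST_IP> | classify_trace`.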
Analysis: Look at the "Datapath actions:" line at the bottom of the output.
Output (local): Action: output:<OFPORT_B> (Success! Delivered locally to the destination tap).
Output (tunnel): Action: set_tunnel:0x<VNI>, output:<TUNNEL_PORT> (Success! Sent to the Geneve tunnel for a remote host).
Drop: Action: drop. Note the cookie=0x... value in the trace output and proceed to Step 5.
Logs to Check: If OVS is dropping traffic that OpenStack says should pass, check /var/log/ovn/ovn-controller.log for OpenFlow programming errors (ofctrl_put errors).
5. Cookie Decoding (If Trace Dropped)
Goal: Translate the physical OVS drop back into an OVN logical rule.
Command (Executed from Management Plane):
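Assuming upstream OVN behavior, where the OpenFlow cookie holds the first 32 bits of the Logical_Flow UUID, the lookup can be sketched as follows. The sketch also assumes your ovn-sbctl supports the --uuid option of lflow-list, which prints those 32-bit prefixes. The function names are hypothetical. The padding helper exists because the trace prints the cookie without leading zeros, while the UUID prefix keeps them.

```shell
# Hypothetical helpers; $1 is the cookie copied from the trace, e.g. 0x8e6e2d4f.

# Pure helper: re-pad the cookie to 8 hex digits to match a UUID prefix.
cookie_to_uuid_prefix() {
    printf '%08x\n' "$1"
}

# List logical flows with their UUID prefixes and keep the matching line(s).
lflow_for_cookie() {
    ovn-sbctl --uuid lflow-list | grep -i "$(cookie_to_uuid_prefix "$1")"
}
```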
Analysis Matrix (Layer 2 Drops):
ls_out_acl / ls_in_acl (Security Group Drop): No Neutron Security Group rule allows the traffic, so it hits the default drop.
ls_in_port_sec_l2 / ls_in_port_sec_ip (Anti-Spoofing Drop): The Source VM is trying to transmit using a MAC or IP address that does not legitimately belong to its port (e.g., nested virtualization or unapproved static IPs).
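The matrix above can be expressed as a small lookup that reads the pipeline stage from a matching lflow-list line. This is a sketch only, and the function name is hypothetical. It recognizes just the stage names listed in this guide. Other OVN releases may name stages differently, and those fall through to "other".

```shell
# Hypothetical helper: $1 is one line of lflow-list output.
classify_stage() {
    case "$1" in
        *ls_in_port_sec_l2*|*ls_in_port_sec_ip*) echo "anti-spoofing" ;;
        *ls_in_acl*|*ls_out_acl*)               echo "security-group" ;;
        *)                                      echo "other" ;;
    esac
}
```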
6. Sniff the Tunnels (For Inter-Host Only)
If the VMs are on different hosts and Step 4 showed output to the tunnel, verify that packets are physically crossing the wire.
Command (Executed on Source & Destination Compute Nodes):
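Run a capture like the sketch below on both nodes at the same time, each pointed at the other node's Geneve endpoint. It assumes the endpoint IP is stored in external_ids:ovn-encap-ip, as ovn-controller expects. ovs-vsctl prints that value quoted, so strip the quotes before using it. The physical NIC name is a placeholder, and the function names are hypothetical.

```shell
# Hypothetical helpers.

# This node's Geneve endpoint IP as configured for ovn-controller.
local_tep() {
    ovs-vsctl get Open_vSwitch . external_ids:ovn-encap-ip
}

# Pure helper: pcap filter for Geneve (UDP 6081) to or from the peer node.
geneve_filter() {
    printf 'udp port 6081 and host %s\n' "$1"
}

# $1 = physical NIC, $2 = peer node's endpoint IP.
capture_geneve() {
    tcpdump -n -i "$1" -c 20 "$(geneve_filter "$2")"
}
```

Example usage: `capture_geneve <PHYS_NIC> <PEER_TEP_IP>`.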
Success: Packets leave Source Node and arrive at Destination Node.
Failure: Packets leave Source Node but do not arrive. A physical firewall or switch ACL between the nodes is most likely blocking UDP 6081.
7. Capture at the Destination (The Tap)
Goal: Prove the OpenStack network successfully delivered the packet to the Destination VM's doorstep.
Command (Executed on Destination Compute Node):
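A destination-side capture can be sketched as follows. It counts echo requests against echo replies on the destination tap. If requests arrive but no replies leave, the network delivered the traffic and the destination VM did not answer. The tap name is a placeholder from Step 2. The function names and the 20-packet limit are hypothetical choices.

```shell
# Hypothetical helpers.

# Pure helper: count requests and replies in tcpdump text read from stdin.
summarize_icmp() {
    awk '/ICMP echo request/ {req++} /ICMP echo reply/ {rep++}
         END {printf "requests=%d replies=%d\n", req, rep}'
}

# $1 = destination tap; stops after 20 ICMP packets, then prints the counts.
capture_dest() {
    tcpdump -n -i "$1" -c 20 icmp | summarize_icmp
}
```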
Success: Packets are seen arriving at the tap. If pings still fail, OpenStack networking has delivered the traffic; the Destination VM's Guest OS firewall (iptables/Windows Defender) is dropping it.
Failure: Traces passed and tunnels look good, but packets don't hit the destination tap. Proceed to Step 8.
8. Clear Stale FDB Entries (Ghost Traffic)
Goal: If either the Source or Destination VM was recently migrated, OVN may be tunneling traffic to the wrong compute node based on a stale Forwarding Database (FDB) entry. Because traffic is bidirectional, a stale entry for either MAC will break communication (forward or return path).
Command (Executed from Management Plane):
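Inspecting and clearing an entry can be sketched as below. The sketch assumes an OVN release whose Southbound database has the FDB table, with columns mac, dp_key and port_key. It also assumes ovn-sbctl's generic find and destroy commands work on that table. The function names are hypothetical. Deleting a row only removes the learned location, so OVN has to re-learn the MAC. Clear an entry only after show_fdb confirms it points at the wrong port.

```shell
# Hypothetical helpers; $1 is the VM MAC address.

# Pure helper: OVN stores MAC addresses in lowercase.
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f'
}

# Show the learned entry; the embedded quotes make ovn-sbctl parse the MAC as a string.
show_fdb() {
    ovn-sbctl find FDB "mac=\"$(normalize_mac "$1")\""
}

# Delete every FDB row for this MAC so it is learned again.
clear_fdb() {
    for uuid in $(ovn-sbctl --bare --columns=_uuid find FDB "mac=\"$(normalize_mac "$1")\""); do
        ovn-sbctl destroy FDB "$uuid"
    done
}
```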
Logs to Check: On both compute nodes, run tail -f /var/log/ovn/ovn-controller.log | grep pinctrl. This tracks the MAC learning process. If pinctrl is not seeing the Gratuitous ARP (GARP) from the VM, the FDB will not update.
9. The Physical Killers (MTU & Offloads)
If traces pass and the tunnel shows traffic, but pings/SSH still fail, the encapsulated Geneve packets are most likely being lost on the physical path (MTU or NIC offload problems).
Command (Executed on Compute Nodes):
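The MTU and offload checks can be sketched as follows. The sketch uses the ~58-byte overhead cited in this guide, which assumes an IPv4 underlay. It reads the NIC MTU from sysfs and lists tunnel-related offload features. The NIC name is a placeholder, and the function names are hypothetical.

```shell
# Hypothetical helpers.
GENEVE_OVERHEAD=58

# Pure helper: $1 = physical NIC MTU, $2 = VM (inner) MTU.
check_mtu() {
    need=$(( $2 + GENEVE_OVERHEAD ))
    if [ "$1" -ge "$need" ]; then
        echo "OK"
    else
        echo "TOO_SMALL (need $need)"
    fi
}

# Current MTU of the physical NIC.
nic_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Tunnel offload features, e.g. tx-udp_tnl-segmentation.
show_tnl_offloads() {
    ethtool -k "$1" | grep -i tnl
}
```

Example usage: `check_mtu "$(nic_mtu <PHYS_NIC>)" 1500`. To test the physical path end to end, run `ping -M do -s 1530 <PEER_TEP_IP>` between the nodes. That sends an unfragmentable 1558-byte IPv4 packet: a 1530-byte payload plus 28 bytes of IPv4 and ICMP headers.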
Analysis: Geneve adds roughly 58 bytes to every packet. If the physical NIC MTU is exactly 1500 while the VMs also use a 1500-byte MTU, full-size Geneve packets exceed the physical MTU and drop silently during inter-host transit.
Logs to Check: Run dmesg -T | grep -i eth or check /var/log/syslog for hardware-level drops or NIC driver errors. Check /var/log/openvswitch/ovs-vswitchd.log for "unreasonably large packet" fragmentation warnings.
Most common causes
Missing Security Group Rules: The destination VM's Security Group does not explicitly allow the inbound protocol (Ingress).
Guest OS Firewall: iptables, ufw, or Windows Defender inside the destination VM is dropping the traffic even though the network delivered it.
Stale FDB (Ghost Traffic): OVN is sending traffic to the wrong node after a VM migration. This can cause the initial request to drop (Dest migration) or the return reply to drop (Source migration).
Physical MTU Mismatch: The physical network does not account for Geneve encapsulation overhead (~58 bytes), causing packets to drop silently during inter-host transit.
Hardware Offload Corruption: Physical NIC hardware offloading features are corrupting the Geneve tunnel headers.