Troubleshoot VM-to-VM (Different Networks) connectivity failures
Problem
This guide provides instructions for troubleshooting network connectivity failures between two Virtual Machines (VMs) residing on different logical networks or subnets. Traffic must cross a Neutron Logical Router.
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Component - Networking Service
Deep Dive: Architecture & Packet Flow
To troubleshoot OVN effectively, you must understand the distinction between the "Brain," the "Muscle," and the "Wire," as well as exactly how packets traverse them.
OVN (The Brain): Runs in the Management Plane. It translates your intent (Logical Routers, Switches, Security Groups) into raw instructions.
Northbound DB (ovn-ovsdb-nb-0): Stores intent (Routers, Ports, ACLs).
Southbound DB (ovn-ovsdb-sb-0): Stores reality (Chassis bindings, MAC/ARP learning).
OVS (The Muscle): Runs on the compute node (ovs-vswitchd). It executes the actual forwarding of packets based on flow rules.
Geneve (The Wire): The UDP tunnel (port 6081) that encapsulates VM packets for cross-host transport.
How the Packet Flows (Routed / DVR)
Unlike Layer 2 traffic, routed traffic relies on Distributed Virtual Routing (DVR) in OVN. The routing decision and MAC address rewrite happen locally on the source compute node's virtual switch.
The Packet Flow:
Source VM → Source Tap → br-int (Source Node) → OVN Logical Router (Distributed) → Geneve Encapsulation → Physical NIC → Physical Network (UDP 6081) → Physical NIC → Geneve Decapsulation → br-int (Dest Node) → Dest Tap → Dest VM
Prerequisites: Executing OVN Commands
Depending on your deployment model, your access to the OVN databases differs. Please refer to the correct execution method for your environment.
For SaaS Environment
To run ovn-* commands, you must execute them from the onboarded Compute Nodes. Create an environment file ovs-alias.rc to route commands directly to the central databases:
Export the rc file and start using the OVN commands natively on the compute node:
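The exact contents of ovs-alias.rc depend on your deployment. A minimal sketch, assuming the central Northbound and Southbound databases listen on the standard ports (the IP placeholders must be replaced with your management-plane addresses):

```shell
# ovs-alias.rc -- point the ovn-* CLI tools at the central OVN databases.
# <NB_DB_IP> and <SB_DB_IP> are placeholders for your management-plane addresses.
export OVN_NB_DB="tcp:<NB_DB_IP>:6641"
export OVN_SB_DB="tcp:<SB_DB_IP>:6642"

# ovn-nbctl/ovn-sbctl honor these environment variables, but explicit aliases
# make the remote target obvious:
alias ovn-nbctl="ovn-nbctl --db=\$OVN_NB_DB"
alias ovn-sbctl="ovn-sbctl --db=\$OVN_SB_DB"
```

Then source the file and verify connectivity, e.g. `source ovs-alias.rc && ovn-nbctl show`.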
For Self-Hosted Environment
Self-Hosted users have direct access to the Management Cluster and can execute commands from inside the OVN Southbound Pod.
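A typical way to open a shell inside the Southbound pod (the namespace shown is an assumption; adjust it to your cluster):

```shell
# Exec into the OVN Southbound DB pod on the Management Cluster.
# <ovn-namespace> is a placeholder for the namespace running the OVN pods.
kubectl -n <ovn-namespace> exec -it ovn-ovsdb-sb-0 -- bash

# From inside the pod, the ovn-* tools talk to the local databases directly:
ovn-sbctl show
```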
Procedure
1. Variable Discovery & Pre-Flight Checks
Gather the required IDs and verify logical health from the Management Plane before logging into the hypervisors.
Command (Executed from Management Plane):
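A representative discovery sequence, assuming the standard openstack CLI plus the OVN tool access described in the prerequisites (all angle-bracket values are placeholders):

```shell
# 1-2. Resolve both VMs: ID, hosting hypervisor, and addresses.
openstack server show <SOURCE_VM> -c id -c OS-EXT-SRV-ATTR:host -c addresses
openstack server show <DEST_VM>   -c id -c OS-EXT-SRV-ATTR:host -c addresses

# 3. Find each VM's Neutron port ID, MAC, and fixed IP.
openstack port list --server <SOURCE_VM> -c ID -c "MAC Address" -c "Fixed IP Addresses"
openstack port list --server <DEST_VM>   -c ID -c "MAC Address" -c "Fixed IP Addresses"

# 4. Confirm the logical router exists and both subnets are attached.
openstack router show <ROUTER_NAME>
ovn-nbctl lr-list

# 5. Verify the logical topology (switches, router ports) in the Northbound DB.
ovn-nbctl show

# 6. Check which chassis (hypervisor) has claimed each VM port in the Southbound DB.
ovn-sbctl --columns=chassis find Port_Binding logical_port=<PORT_ID>
```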
Analysis: If the chassis lookup in step 6 returns an empty chassis, the VM port is unbound: no hypervisor has claimed it.
Logs to Check: Check /var/log/pf9/ovn/ovn-controller.log on the expected hypervisor. Look for claim failed or unrecognized port, indicating the node refuses to bind the VM.
2. Identify the Tap Interfaces (Data Plane)
Log into the specific Compute Nodes hosting the VMs to find their exact interface names.
Command (Executed on Source Compute Node):
Command (Executed on Destination Compute Node):
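On each node, the tap can be located from the Neutron port ID. A sketch assuming the usual naming convention (tap followed by the first 11 characters of the port UUID); the OVS lookup works regardless of naming:

```shell
PORT_ID=<PORT_ID>    # placeholder: the VM's Neutron port UUID

# Robust lookup: OVN stores the port ID in the interface's external_ids.
ovs-vsctl --columns=name find Interface external_ids:iface-id="$PORT_ID"

# Conventional name check: tap + first 11 characters of the port UUID.
ip link show "tap${PORT_ID:0:11}"
```

Run the same lookup on the source node with the source port ID and on the destination node with the destination port ID.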
3. Logical Simulation (Intent Trace)
Goal: Verify if the OVN Brain allows the traffic and successfully routes it to the destination subnet.
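The trace can be driven with ovn-trace against the source logical switch. A sketch simulating an ICMP echo; note that eth.dst must be the router's gateway MAC, not the destination VM's MAC (all values are placeholders):

```shell
# Simulate an ICMP echo request from the source VM through the logical topology.
ovn-trace --summary <SOURCE_LOGICAL_SWITCH> '
    inport == "<SOURCE_PORT_ID>" &&
    eth.src == <SOURCE_VM_MAC> &&
    eth.dst == <GATEWAY_MAC> &&
    ip4.src == <SOURCE_VM_IP> &&
    ip4.dst == <DEST_VM_IP> &&
    ip.ttl == 64 &&
    icmp4.type == 8'
```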
Success: Output shows lr_in_ip_routing (routing succeeds), lr_in_arp_resolve (MAC rewrite logic), and a final output action.
Failure (Drop / Flooded): If dropped, check Static Routes on the Logical Router, or verify both subnets are actually attached to the router. If flooded to _MC_unknown, ensure you used the Gateway MAC for eth.dst, not the Destination VM MAC.
4. Verify Router ARP (MAC Binding)
Goal: If the logical trace fails during routing, ensure the router successfully resolved the MAC address of the destination VM.
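The router's learned neighbor entries live in the Southbound MAC_Binding table; one way to query them (placeholder values):

```shell
# Look up the router's learned neighbor entry for the destination IP.
ovn-sbctl --columns=ip,mac,logical_port find MAC_Binding ip=<DEST_VM_IP>

# Or dump all bindings and filter by IP or MAC:
ovn-sbctl list MAC_Binding | grep -B2 -A2 <DEST_VM_IP>
```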
Success: Shows the MAC address of the destination VM.
Failure: Empty. The destination VM is down, its Guest OS is filtering ARP requests, or DHCP/Metadata agents are failing.
Logs to Check: Check /var/log/ovn/ovn-controller.log on the destination node and grep for pinctrl. This thread handles the generation and processing of ARP requests.
5. Capture at the Source (The Tap)
Goal: Prove the packet is actually leaving the Source VM and destined for its Default Gateway. Command (Executed on Source Compute Node):
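A capture along these lines, using the source tap name found in Step 2 (placeholders):

```shell
# Watch ICMP leaving the source VM toward the remote subnet.
# -e prints link-level headers so the destination MAC is visible.
tcpdump -eni <SOURCE_TAP> icmp and host <DEST_VM_IP>
```

On a routed path, the Ethernet destination of egress packets should be the gateway MAC, not the destination VM's MAC.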
Success: Packets seen. Proceed to Step 6.
Failure: No packets seen. Check the Guest OS routing table (e.g., default gateway is missing or incorrect inside the VM).
6. Physical Datapath Trace (ofproto/trace)
Goal: Ask OVS why it is dropping the routed packet based on actual flow rules. Command (Executed on Source Compute Node):
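A sketch of the trace, replaying the same routed packet through the real OpenFlow tables on br-int (placeholders throughout):

```shell
# Find the OpenFlow port number of the source tap on br-int.
ovs-vsctl get Interface <SOURCE_TAP> ofport

# Replay the routed ICMP packet through the actual flow rules.
ovs-appctl ofproto/trace br-int \
  "in_port=<TAP_OFPORT>,icmp,dl_src=<SOURCE_VM_MAC>,dl_dst=<GATEWAY_MAC>,nw_src=<SOURCE_VM_IP>,nw_dst=<DEST_VM_IP>,nw_ttl=64"
```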
Success: Datapath actions show
Action: set_tunnel:0x<VNI>, output:<TUNNEL_PORT>(Routed and sent to the Geneve tunnel).Failure (Drop): Action shows
drop. Note thecookie=0x...value in the trace output and proceed to Step 7.Logs to Check: If OVS actions don't match OVN intent, check
/var/log/pf9/ovn/ovn-controller.logfor flow programming errors (ofctrl_puterrors).
7. Cookie Decoding (If Trace Dropped)
Goal: Translate the physical OVS drop back into an OVN logical rule to pinpoint the exact OpenStack misconfiguration.
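ovn-controller sets the OVS flow cookie to the leading 32 bits of the corresponding Logical_Flow UUID in the Southbound DB, so the cookie can be grepped back to its logical rule. A sketch (drop the 0x prefix from the cookie):

```shell
# Map the OVS cookie from Step 6 back to its Southbound logical flow.
ovn-sbctl list Logical_Flow | grep -A6 '<COOKIE_HEX>'

# The matching record's "table"/"pipeline" and "match" fields identify the
# logical stage and the rule that dropped the packet.
```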
The output will reveal the exact logical table where the packet was killed. Use this matrix to identify the root cause:
ls_out_acl / ls_in_acl (Security Group Drop): A Neutron Security Group is explicitly denying the traffic. Check Egress rules on the source and Ingress rules on the destination.
lr_in_ip_routing (Routing Drop): The Logical Router has no route to the destination IP. Verify that both subnets are properly attached to the router.
lr_in_arp_resolve (ARP Drop): The router knows the route, but cannot resolve the MAC address of the destination VM. Check if the destination VM is powered off, or if its Guest OS is dropping ARP requests.
ls_in_port_sec_l2 / ls_in_port_sec_ip (Anti-Spoofing Drop): The Source VM is trying to transmit traffic using a MAC or IP address that does not legitimately belong to its port.
8. Sniff the Tunnels (Inter-Host)
Goal: Verify routed packets are physically crossing the wire. Command (Executed on Source & Destination Compute Nodes):
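A capture of the encapsulated traffic on the physical interface of each hypervisor (placeholders):

```shell
# Run on both compute nodes: watch Geneve traffic between the hypervisors.
tcpdump -ni <PHYSICAL_NIC> udp port 6081 and host <REMOTE_NODE_IP>
```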
Success: Packets leave Source Node and arrive at Destination Node.
Failure: Packets leave Source Node but do not arrive. A physical firewall or switch ACL is blocking UDP 6081.
9. Capture at the Destination (The Tap)
Goal: Prove the OpenStack network successfully routed and delivered the packet to the Destination VM.
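A capture on the destination tap found in Step 2 (placeholders):

```shell
# Watch for the routed ICMP arriving at the destination VM's tap.
tcpdump -eni <DEST_TAP> icmp and host <SOURCE_VM_IP>
```

On a routed path, arriving packets should carry the router's MAC on the destination subnet as their source MAC, not the source VM's MAC.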
Success: Packets arriving. If pings still fail, the Destination VM's Guest OS firewall is dropping traffic from the remote subnet.
Failure: Traces passed and tunnels look good, but packets don't hit the dest tap. Proceed to Step 10.
10. Clear Stale FDB Entries (Ghost Traffic)
Goal: If either the Source or Destination VM was recently migrated, OVN may be tunneling traffic to the wrong compute node based on a stale Forwarding Database (FDB) entry. Because traffic is bidirectional, a stale entry for either MAC will break communication.
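One way to inspect and clear stale entries via the Southbound DB; exact table availability varies by OVN version (MAC_Binding holds router neighbor entries, and newer releases also have an FDB table for learned switch entries), so treat this as a sketch:

```shell
# Inspect where OVN currently believes each VM MAC lives.
ovn-sbctl list MAC_Binding | grep -B2 -A2 <VM_MAC>
ovn-sbctl list FDB 2>/dev/null | grep -B2 -A2 <VM_MAC>

# Remove the stale record by UUID so OVN re-learns the correct location.
ovn-sbctl destroy MAC_Binding <STALE_RECORD_UUID>
```

Repeat for both the source and destination VM MACs, since a stale entry for either direction breaks communication.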
Logs to Check: On the compute nodes, run tail -f /var/log/ovn/ovn-controller.log | grep pinctrl to monitor MAC learning updates.
11. The Physical Killers (MTU & Offloads)
If traces pass, tunnels show traffic, and the FDB is correct, but packets still drop, the Geneve encapsulation is failing physically. Command (Executed on Compute Nodes):
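A sketch of the usual MTU and offload checks; Geneve adds roughly 58 bytes of overhead over IPv4, so the VM MTU plus encapsulation must fit the physical path (placeholders throughout):

```shell
# Compare the physical NIC MTU against VM MTU + Geneve overhead (~58 bytes).
ip link show <PHYSICAL_NIC> | grep mtu

# Test the physical path with Don't-Fragment pings between hypervisors.
# On a clean 1500-byte path, a 1472-byte payload is the largest that passes.
ping -M do -s 1472 <REMOTE_NODE_IP>

# Inspect offloads that commonly break encapsulated traffic, and disable them
# temporarily as a test:
ethtool -k <PHYSICAL_NIC> | grep -E 'tx-udp_tnl|gro|tso'
ethtool -K <PHYSICAL_NIC> tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
```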
Logs to Check:
Run dmesg -T | grep -i eth or check /var/log/syslog for hardware-level drops or NIC driver errors.
Check /var/log/openvswitch/ovs-vswitchd.log for unreasonably large packet fragmentation warnings.
Most common causes
Missing Router Interface: The destination subnet was never attached to the logical router.
Asymmetric Security Groups: The source VM can egress, but the destination VM's Security Group drops the ingress traffic from the remote subnet.
Unresolved ARP: The Logical Router does not have a mac_binding for the destination VM, dropping the packet during the routing phase.
Stale FDB (Ghost Traffic): OVN is sending traffic to the wrong node after a VM migration. This can cause the initial request to drop (Dest migration) or the return reply to drop (Source migration).
Physical MTU Mismatch: Routing is successful, but the encapsulated packet is too large for the physical switch ports connecting the hypervisors.
Guest OS Routing/Firewall: The internal OS firewall is dropping traffic because it originates from an "untrusted" or remote subnet.