Troubleshoot Outside to VM (North-South) Connectivity
Problem
This guide provides instructions for troubleshooting network connectivity failures between a Virtual Machine (VM) and the External Network (Internet or Datacenter). This includes outbound traffic (SNAT) and inbound access via Floating IPs (DNAT). Traffic must traverse a Neutron Logical Router and a designated Gateway Chassis.
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Component - Networking Service
Deep Dive: Architecture & Packet Flow
North-South traffic differs from East-West because it is not fully distributed. While internal routing happens on the compute node, the transition to the physical external network is pinned to a specific Gateway Node.
Chassisredirect (The Border): A special OVN port type (type=chassisredirect) that centralizes the external gateway logic on a specific physical host to manage NAT and external ARP.
The Gateway Node: The physical server (e.g., punhv0059) where the Logical Router's external leg is physically bound.
NAT (SNAT/DNAT): The process of translating the VM's internal IP (192.xx.xx.xx) to the Floating IP (10.xx.xx.xx).
How the Packet Flows
Source VM → Source Tap → br-int (Compute Node) → Logical Router (Distributed Leg) → Geneve Tunnel (UDP 6081) → br-int (Gateway Node) → Chassisredirect Port → NAT Engine (SNAT Applied) → br-phy1 (External Bridge) → Physical NIC → Datacenter Switch
Prerequisites: Executing OVN Commands
Refer to the existing guide for SaaS Alias setup or Self-Hosted kubectl exec instructions to run ovn-sbctl and ovn-nbctl commands.
Procedure
1. Variable Discovery & Gateway Identification
Gather the required IDs and locate the physical "Exit Door" for the traffic (the Gateway Node).
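The exact commands depend on your deployment; the following is a minimal sketch of the discovery sequence. Router and port identifiers are placeholders, and the neutron- naming prefix is an assumption based on standard OVN/Neutron integration.

```shell
# 1.1 List Logical Routers and find the one serving the VM's network
ovn-nbctl lr-list

# 1.3 List the router's ports; empty output means the network is not attached
ovn-nbctl lrp-list neutron-<ROUTER_UUID>

# 1.5 List the router's NAT rules; empty output means no external gateway
ovn-nbctl lr-nat-list neutron-<ROUTER_UUID>

# 1.7 Find the chassis hosting the chassisredirect port (the Gateway Node)
ovn-sbctl find Port_Binding logical_port=cr-lrp-<GW_PORT_ID>
```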
Analysis:
If 1.3 is empty: The network is not attached to a router.
If 1.5 is empty: The router has no external gateway; NAT is impossible.
If 1.7 is empty: The OVN database hasn't bound the router to a host.
Logs to Check (if 1.7 fails): On the expected Gateway Node, check /var/log/ovn/ovn-controller.log and grep for cr-lrp-<GW_PORT_ID>. Look for claim failed or unrecognized port errors indicating the node is refusing to host the gateway.
2. Identify the Data Plane Interfaces
Log into the Gateway Node to verify the physical bridge mapping.
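A minimal sketch of what to check; the bridge and NIC names follow this guide's examples.

```shell
# Provider network -> bridge mapping (expect something like pun-lab:br-phy1)
ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings

# The external bridge must contain a physical uplink (e.g., bond0)
ovs-vsctl list-ports br-phy1
```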
Success: Output shows your provider network mapped to a bridge (e.g., pun-lab:br-phy1), and that bridge contains a physical port (e.g., bond0).
Failure: If mappings are empty, or br-phy1 lacks a physical interface, traffic hits a dead end inside the server and cannot reach the datacenter switch.
Logs to Check: tail -f /var/log/openvswitch/ovs-vswitchd.log on the Gateway Node. Look for errors related to adding physical ports or bridge initialization failures.
3. Logical Simulation (North-South Trace)
Goal: Verify if the OVN "Brain" processes the packet. Because a router has multiple ports, we must pick the Internal one for an outbound trace.
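Assuming the identifiers gathered in Step 1, an outbound trace can be sketched with ovn-trace; all names, MACs, and IPs below are placeholders.

```shell
# Simulate a packet entering from the VM's logical switch port
ovn-trace --summary <LS_NAME> '
    inport == "<VM_PORT_UUID>" &&
    eth.src == <VM_MAC> && eth.dst == <ROUTER_INTERNAL_MAC> &&
    ip4.src == <VM_IP> && ip4.dst == 8.8.8.8 &&
    ip.ttl == 64'
```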
The output reveals the exact logical pipeline stage where the packet was dropped. Use this matrix:
ls_out_acl / ls_in_acl (Security Group Drop): A Neutron Security Group is explicitly denying the traffic.
lr_in_ip_routing (Routing Drop): The Logical Router has no route to the destination. Verify the External Gateway is set.
lr_in_arp_resolve (ARP Drop): The router cannot resolve the MAC of the next-hop (physical switch). Check upstream ARP.
ls_in_port_sec (Anti-Spoofing Drop): The VM is trying to use a MAC/IP that does not belong to its port.
Success: Trace shows ct_snat(FLOATING_IP) and ends with output to localnet.
Failure (Drop): Trace ends with drop. The logical configuration is blocking the packet (e.g., missing route, Security Group). Proceed to Step 6.
4. Physical Validation (Gateway Node)
Log into the Gateway Node to prove the packet hits the wire.
The Gateway Node can be identified using Step 1.
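The two capture points can be sketched as follows; TUNNEL_NIC is a placeholder for the interface carrying Geneve traffic, and bond0 is this guide's example uplink.

```shell
# 4.1 Arrival: Geneve-encapsulated traffic reaching the Gateway Node
tcpdump -nni <TUNNEL_NIC> udp port 6081

# 4.2 Exit: post-NAT traffic leaving via the physical uplink
tcpdump -nni bond0 host <FLOATING_IP>
```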
Success: Packets arrive via tunnel (4.1) and exit br-phy1 with the translated Floating IP (4.2). NAT is working perfectly.
Failure (No Arrival): 4.1 is empty. Packets are dropping on the Compute Node (check SG egress or MTU).
Failure (No Exit): 4.1 shows traffic, but 4.2 is empty. OVS is dropping the packet internally on the Gateway. Proceed to Step 5.
5. Advanced Physical Trace (ofproto/trace)
If Phase 4.2 is empty, ask OVS why it is dropping the packet internally on the Gateway Node. To run a physical ofproto/trace, we need the Geneve tunnel metadata and the correct input port.
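A sketch of the trace, with all port numbers and addresses as placeholders taken from earlier steps. Depending on the OVN version, additional Geneve option metadata (the logical ingress/egress port keys) may also be needed for the trace to progress through the pipeline.

```shell
# Find the OpenFlow port number of the Geneve tunnel on br-int
ovs-ofctl show br-int | grep -i genev

# Replay the tunneled packet as OVS sees it
ovs-appctl ofproto/trace br-int \
  "in_port=<GENEVE_OFPORT>,tun_id=<DATAPATH_TUNNEL_KEY>,tun_src=<COMPUTE_IP>,tun_dst=<GATEWAY_IP>,ip,dl_src=<VM_MAC>,dl_dst=<ROUTER_MAC>,nw_src=<VM_IP>,nw_dst=8.8.8.8"
```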
Success: Datapath actions show set_tunnel or output:<ID> to a patch port leading to br-phy1.
Failure (Drop): Action is drop. Note the cookie=0x... value in the trace output and proceed to Step 6.
Failure (Flooded): Action outputs to local tap interfaces or other tunnels. OVN is confused about where the physical exit is (usually a bridge mapping mismatch).
Logs to Check: If OVS actions don't match OVN intent, check /var/log/pf9/ovn/ovn-controller.log on the Gateway Node for flow programming errors (ofctrl_put errors).
6. Cookie Decoding (Root Cause Analysis)
If the trace in Step 5 ended in a drop, map the relevant cookie back to the logical rule.
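One way to do the mapping, assuming standard OVN behavior where the OpenFlow cookie is the leading 32 bits of the Logical_Flow UUID:

```shell
# <COOKIE_HEX> is the cookie=0x... value from Step 5, without the 0x prefix
ovn-sbctl list Logical_Flow | grep -A8 '_uuid.*<COOKIE_HEX>'
# The matching record's pipeline/table fields name the logical stage
# (e.g., ls_out_acl), which maps to the matrix below.
```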
ls_out_acl / ls_in_acl: A Security Group is explicitly denying the traffic.
lr_in_ip_routing: The Router has no route to the destination. Verify the External Gateway is set.
lr_in_arp_resolve: The Gateway cannot resolve the MAC of the physical switch. Check upstream ARP.
ls_in_port_sec: The VM is spoofing its MAC/IP.
7. Clear Stale FDB Entries (Ghost Traffic)
If the Gateway Node was recently migrated or the Chassis binding changed, traffic may be sent to a "ghost" location.
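On the Gateway Node, the learned-MAC table of the external bridge can be inspected and flushed; br-phy1 is this guide's example bridge.

```shell
# Inspect the MAC learning table; look for the Floating IP's MAC on a stale port
ovs-appctl fdb/show br-phy1

# Flush the table so the next frame re-learns the correct location
ovs-appctl fdb/flush br-phy1
```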
Success: Traffic immediately resumes once the stale entry is cleared.
Logs to Check: On the Gateway Node, tail -f /var/log/pf9/ovn/ovn-controller.log | grep pinctrl. This tracks the ARP learning and FDB updates natively from the physical wire.
8. Physical Killers (MTU & Offloads)
If traces pass but traffic fails, the physical NIC may be corrupting large Geneve-encapsulated packets.
Analysis: If MTU is 1500 on the physical NIC, Geneve packets (which add 58 bytes of overhead) will be fragmented or dropped by the physical switch.
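The arithmetic can be verified directly; bond0 is this guide's example uplink.

```shell
# Report the uplink MTU (prints nothing if bond0 does not exist on this host)
ip -o link show bond0 2>/dev/null | grep -o 'mtu [0-9]*' || true

# With a 1500-byte underlay MTU, the largest inner packet that survives
# Geneve encapsulation (58 bytes of overhead) intact is:
echo $((1500 - 58))   # prints 1442
```

Raising the physical MTU to at least 1558, or lowering the VM MTU to 1442 or below, removes the fragmentation risk.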
Logs to Check:
Run dmesg -T | grep -i eth or check /var/log/syslog for hardware-level drops, CRC errors, or driver crashes related to the physical NIC (bond0).
Check /var/log/openvswitch/ovs-vswitchd.log for unreasonably large packet or fragmentation warnings.
Most Common Causes
Missing External Gateway: The Logical Router was created but never assigned an external gateway port. NAT cannot occur without an "Exit Door."
Asymmetric Routing (Gateway Migration): The Gateway Chassis migrated to a new node, but the physical datacenter switch is still sending return traffic to the old node's MAC address because it missed the Gratuitous ARP (GARP).
SNAT/DNAT Rule Mismatch: The VM has a Floating IP, but the Logical Router’s NAT table is missing the corresponding entry. OVN will route the packet but will not translate the source IP, causing the physical firewall to drop it as "spoofed."
Stale FDB (Ghost Traffic): OVN is still tunneling traffic to the previous Gateway Node after a failover event. This "blackholes" all external traffic until the FDB entry is cleared or times out.
Provider Network Bridge Mapping: The physical bridge (e.g., br-phy1) on the Gateway Node is not mapped to the correct physical_network name in OVS external_ids.
Physical MTU Mismatch: North-South traffic often fails for large packets (HTTP/downloads) because the Geneve overhead (58 bytes) makes the packet exceed the 1500 MTU of the physical datacenter switches.
Upstream MAC Filtering: The physical switch port connected to the Gateway Node is configured with "Port Security" or a "MAC Limit" that prevents it from learning the virtual MAC addresses of the Floating IPs.