Search Domain Resolution Failure by Public Nameservers in Onprem Setup
Problem
Setting public nameservers like [1.1.1.1] and [8.8.8.8] in the /etc/resolv.conffile of the Management Plane master nodes of the airgapped environment, affecting pods to be in timeout errors:
[ERROR] plugin/errors: 2 keystone. [namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269→1.1.1.1:53: i/o timeout[ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy. [namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout[ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp• [Pod_IP]:60181→1.1.1.1:53: 1/o timeout[ERROR] plugin/errors: 2 keystone. [namespace].svc.cluster.local.abc.com. A: read udp[Pod_IP]:50706→8.8.8.8:53: 1/o timeoutObserving Liveness/Readiness probe failures on pods like Glance-Api, Nova-Api-Osapi, Neutron-Server, etc,
Warning Unhealthy 30m (x830 over 26h) kubelet Liveness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refused Warning Unhealthy 10m (x2745 over 26h) kubelet Readiness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refusedEnvironment
- Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
- Self-Hosted Private Cloud Director Kubernetes - v2025.4 and Higher
- Component: DNS
Cause
CoreDNS demonstrates a behavior in which any error encountered by its plugins can prevent DNS name resolution. In airgapped environments, this results in public nameservers being unreachable, even if they remain listed in /etc/resolv.conf (via /etc/netplan/50-cloud-init.yaml file in this case)
sudo cat /etc/netplan/50-cloud-init.yaml •... vlans production.91 addressesAdrdress id91 gateway4Gateway address linkLInk Name mtu9000 nameservers addresses8.8.8.81.1.1.1 searchabc.comHere, when the PCD pods attempt to communicate with other pods via DNS, the presence of an additional search domain (e.g., abc.com) and the absence of an internal nameserver cause DNS queries to be forwarded to upstream nameservers (such as 8.8.8.8 and 1.1.1.1) configured on the nodes. Since internet connectivity is disabled, these queries time out, leading to I/O timeout errors.
This situation generates a high volume of failed DNS requests—approximately 2,800 per 166 seconds -- as pods continuously attempt to resolve names unsuccessfully. The resulting delays cause further instability within the environment, including failures in PCD pod DNS resolution, readiness and liveness probe failures, and overall management plane instability.
Diagnostics
- Timeout errors mentioning the failure at the DNS servers [1.1.1.1:53, and 8.8.8.8:53]
[ERROR] plugin/errors: 2 keystone. namespace.svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269→1.1.1.1:53: i/o timeout[ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy. namespace.svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout[ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp[Pod_IP]:60181→1.1.1.1:53: 1/o timeout[ERROR] plugin/errors: 2 keystone. namespace.svc.cluster.local.abc.com. A: read udp[Pod_IP]:50706→8.8.8.8:53: 1/o timeoutCheck nameserver values in the below configuration files:
$ cat /etc/netplan/50-cloud-init.yaml... nameserver 8.8.8.8 nameserver 1.1.1.1 search abc.com$ cat /etc/resolv.conf... nameserver 8.8.8.8 nameserver 1.1.1.1 search abc.comResolution
- Remove any unnecessary search domains from your DNS configuration.
- Configure custom or internal nameservers that are accessible within your environment and capable of resolving the required DNS queries.
Validation
- All the pods in the Management Plane cluster are in the running state.
- No Readiness/Liveness probe failures in the OpenStack component pods like Nova, Neutron, Glance etc.