Search Domain Resolution Failures Caused by Public Nameservers in an On-Premises Setup

Problem

Public nameservers such as 1.1.1.1 and 8.8.8.8 are configured in the /etc/resolv.conf file of the Management Plane master nodes in an airgapped environment. Because these nameservers are unreachable, DNS lookups from pods fail with timeout errors:

CoreDNS pod logs

    [ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp [Pod_IP]:60181->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. A: read udp [Pod_IP]:50706->8.8.8.8:53: i/o timeout

Liveness and readiness probe failures are also observed on pods such as Glance-Api, Nova-Api-Osapi, and Neutron-Server:

Pod describe output (Glance-Api pod)

    Warning  Unhealthy  30m (x830 over 26h)   kubelet  Liveness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refused
    Warning  Unhealthy  10m (x2745 over 26h)  kubelet  Readiness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refused

Environment

Self-Hosted Private Cloud Director Virtualization - v2025.4 and higher
Self-Hosted Private Cloud Director Kubernetes - v2025.4 and higher
Component: DNS

Cause

An error in any CoreDNS plugin, such as an upstream lookup that times out, can cause a query to fail instead of being resolved. In airgapped environments, the public nameservers are unreachable even though they remain listed in /etc/resolv.conf (populated from /etc/netplan/50-cloud-init.yaml in this case), so every query that CoreDNS forwards to them fails.

YAML

    $ sudo cat /etc/netplan/50-cloud-init.yaml
    ...
    vlans:
        production.91:
            addresses:
            - [Address]
            id: 91
            gateway4: [Gateway address]
            link: [Link Name]
            mtu: 9000
            nameservers:
                addresses:
                - 8.8.8.8
                - 1.1.1.1
                search:
                - abc.com

When PCD pods look up other pods and services by DNS name, the node's additional search domain (abc.com) is appended to the name, producing queries such as keystone.[namespace].svc.cluster.local.abc.com. Because these names fall outside the cluster domain and no internal nameserver is configured, CoreDNS forwards them to the upstream nameservers configured on the nodes (8.8.8.8 and 1.1.1.1). With internet connectivity disabled, the forwarded queries time out, producing the i/o timeout errors.
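The query names in the logs follow from how a pod's stub resolver applies its search list. Assuming the Kubernetes defaults for pods with the ClusterFirst DNS policy (options ndots:5, with the cluster search domains listed first and the node's own search domains appended after them), a name such as keystone.[namespace].svc.cluster.local has only four dots, so every search domain is tried before the name is tried as-is. The sketch below reproduces that ordering; expand_query is a hypothetical helper and pcd is a placeholder namespace.

```shell
# expand_query NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: prints, in order, the fully qualified names a
# glibc-style stub resolver would try for NAME. Absolute names that
# already end in "." are not handled in this sketch.
expand_query() {
  name=$1
  ndots=$2
  shift 2
  # Number of dots in NAME (dot-separated fields minus one).
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  # Names with at least NDOTS dots are tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  # Each search domain is then appended in turn.
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # Names with fewer than NDOTS dots are tried as-is only at the end.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Assumed pod search list: cluster domains first, then the node's abc.com.
expand_query keystone.pcd.svc.cluster.local 5 \
  pcd.svc.cluster.local svc.cluster.local cluster.local abc.com
```

The fourth candidate, keystone.pcd.svc.cluster.local.abc.com., is the only one outside cluster.local. With a default-style Corefile, CoreDNS answers the cluster.local candidates itself and forwards this one upstream, which matches the error lines above. The resolver typically issues both A and AAAA queries for each candidate, which is why both record types appear in the logs.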

The result is a high volume of failed DNS requests, approximately 2,800 every 166 seconds, as pods repeatedly retry lookups that cannot succeed. The resulting delays further destabilize the environment: PCD pods fail DNS resolution, readiness and liveness probes fail, and the management plane as a whole becomes unstable.
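To measure the failure rate in your own environment, one option is to save a window of CoreDNS logs and count the i/o timeout lines. The helper below is a hypothetical sketch; the log file name and window length are whatever you captured. For reference, 2,800 failures in 166 seconds works out to roughly 16.9 failures per second.

```shell
# timeout_rate LOG_FILE WINDOW_SECONDS
# Hypothetical helper: counts CoreDNS "i/o timeout" error lines in a saved
# log and prints the count together with the per-second rate.
timeout_rate() {
  log_file=$1
  window_seconds=$2
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
  # the function usable under "set -e".
  count=$(grep -c 'i/o timeout' "$log_file" || true)
  awk -v c="$count" -v s="$window_seconds" \
    'BEGIN { printf "%d timeouts in %ds = %.2f per second\n", c, s, c / s }'
}
```

For example, assuming CoreDNS runs as the coredns deployment in the kube-system namespace (verify this in your cluster), `kubectl -n kube-system logs deployment/coredns --since=166s > coredns.log` captures the logs of one CoreDNS pod, and `timeout_rate coredns.log 166` then reports the rate over that window.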

Diagnostics

Look for timeout errors against the public DNS servers 1.1.1.1:53 and 8.8.8.8:53.

CoreDNS pod logs

    [ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp [Pod_IP]:60181->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. A: read udp [Pod_IP]:50706->8.8.8.8:53: i/o timeout
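When the logs are long, a per-upstream tally helps confirm that the failures are concentrated on the public resolvers. This is a hypothetical awk-based sketch that relies on the `read udp <source>-><upstream>:53: i/o timeout` form seen in the lines above, and it only recognizes IPv4 upstreams.

```shell
# timeouts_by_upstream
# Hypothetical helper: reads CoreDNS log lines on stdin and prints
# "<count> <upstream>:53" for each IPv4 upstream that timed out,
# highest count first.
timeouts_by_upstream() {
  awk '/i\/o timeout/ {
         # The upstream address follows "->" in the Go network error text.
         if (match($0, /->[0-9.]+:53/)) {
           count[substr($0, RSTART + 2, RLENGTH - 2)]++
         }
       }
       END { for (u in count) print count[u], u }' | sort -rn
}

# Example with two of the log lines above (placeholders kept as-is).
timeouts_by_upstream <<'EOF'
[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. A: read udp [Pod_IP]:50706->8.8.8.8:53: i/o timeout
EOF
```

Piping a saved CoreDNS log into `timeouts_by_upstream` should list only 1.1.1.1:53 and 8.8.8.8:53 when this issue is the cause.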

Check the nameserver and search values in the following configuration files:

Conf files

    $ cat /etc/netplan/50-cloud-init.yaml
    ...
            nameservers:
                addresses:
                - 8.8.8.8
                - 1.1.1.1
                search:
                - abc.com

    $ cat /etc/resolv.conf
    ...
    nameserver 8.8.8.8
    nameserver 1.1.1.1
    search abc.com
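A quick way to review these values is to parse the resolv.conf-format file and flag well-known public resolvers. In this sketch, check_resolv_conf is a hypothetical helper and the PUBLIC_RESOLVERS list is an assumption to adjust for your environment.

```shell
# Assumed list of well-known public resolvers; adjust as needed.
PUBLIC_RESOLVERS="1.1.1.1 1.0.0.1 8.8.8.8 8.8.4.4 9.9.9.9"

# check_resolv_conf FILE
# Hypothetical helper: prints each nameserver in a resolv.conf-format FILE,
# flagging public resolvers that an airgapped node cannot reach, and lists
# every search domain so unnecessary ones stand out.
check_resolv_conf() {
  awk -v public="$PUBLIC_RESOLVERS" '
    BEGIN {
      n = split(public, p, " ")
      for (i = 1; i <= n; i++) pub[p[i]] = 1
    }
    $1 == "nameserver" {
      status = ($2 in pub) ? "PUBLIC (unreachable when airgapped)" : "ok"
      print "nameserver " $2 ": " status
    }
    $1 == "search" || $1 == "domain" {
      for (i = 2; i <= NF; i++) print "search domain: " $i
    }
  ' "$1"
}
```

Run it as `check_resolv_conf /etc/resolv.conf`. On hosts that use systemd-resolved, /etc/resolv.conf may list only the local 127.0.0.53 stub; in that case the upstream servers appear in /run/systemd/resolve/resolv.conf, which is also the file kubelet is commonly configured to hand to pods.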

Resolution

Remove any unnecessary search domains (abc.com in this example) from the node DNS configuration, here /etc/netplan/50-cloud-init.yaml.
Replace the public nameservers with custom or internal nameservers that are reachable from within the airgapped environment and can resolve the required DNS queries.
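After editing the netplan file and applying it (for example with `sudo netplan apply`), a simple gate can confirm that the resulting resolver file no longer lists public resolvers or unexpected search domains. dns_config_ok is a hypothetical helper; its public-resolver list is an assumption, and any search domains you still need are passed as extra arguments.

```shell
# dns_config_ok FILE [ALLOWED_SEARCH_DOMAIN...]
# Hypothetical post-change gate: exits 0 only if the resolv.conf-format FILE
# lists no well-known public resolvers and no search domains other than the
# allowed ones. Offending entries are printed.
dns_config_ok() {
  file=$1
  shift
  awk -v allowed="$*" '
    BEGIN {
      bad = 0
      # Assumed list of well-known public resolvers.
      n = split("1.1.1.1 1.0.0.1 8.8.8.8 8.8.4.4 9.9.9.9", p, " ")
      for (i = 1; i <= n; i++) public[p[i]] = 1
      m = split(allowed, a, " ")
      for (i = 1; i <= m; i++) keep[a[i]] = 1
    }
    $1 == "nameserver" && ($2 in public) {
      print "public nameserver: " $2
      bad = 1
    }
    $1 == "search" {
      for (i = 2; i <= NF; i++) {
        if (!($i in keep)) {
          print "unexpected search domain: " $i
          bad = 1
        }
      }
    }
    END { exit bad }
  ' "$file"
}
```

For example, `dns_config_ok /etc/resolv.conf && echo clean`. Because kubelet generates a pod's resolv.conf when the pod is created, existing pods, including the CoreDNS pods, may keep the old search list and upstreams until they are restarted.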

Validation

All pods in the Management Plane cluster are in the Running state.
OpenStack component pods such as Nova, Neutron, and Glance show no readiness or liveness probe failures.
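The first check can be scripted by filtering the output of `kubectl get pods -A`. unhealthy_pods is a hypothetical helper that assumes the default column layout (NAMESPACE, NAME, READY, STATUS, ...); pods in the Completed state are skipped on the assumption that they belong to finished jobs.

```shell
# unhealthy_pods
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and prints
# "namespace/name READY STATUS" for every pod that is not Running with all
# containers ready. Completed pods (finished jobs) are skipped.
unhealthy_pods() {
  awk 'NR > 1 {
         if ($4 == "Completed") next
         # READY is "<ready>/<total>"; both parts must match.
         split($3, ready, "/")
         if ($4 != "Running" || ready[1] != ready[2]) {
           print $1 "/" $2, $3, $4
         }
       }'
}
```

For example, `kubectl get pods -A | unhealthy_pods` should print nothing once the environment is healthy, and `kubectl get events -A --field-selector reason=Unhealthy` can be used to look for recent probe failures.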
