# Search Domain Resolution Failure by Public Nameservers in Onprem Setup

## Problem

Public nameservers such as 1.1.1.1 and 8.8.8.8 are set in the `/etc/resolv.conf` file of the Management Plane master nodes in an airgapped environment, which causes DNS lookups from pods to fail with timeout errors like the following:

```ruby
[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp [Pod_IP]:60181->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. A: read udp [Pod_IP]:50706->8.8.8.8:53: i/o timeout
```

Liveness/Readiness probe failures are also observed on pods such as Glance-Api, Nova-Api-Osapi, and Neutron-Server:

```ruby
Warning  Unhealthy  30m (x830 over 26h)    kubelet  Liveness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refused

Warning  Unhealthy  10m (x2745 over 26h)   kubelet  Readiness probe failed: Get "http://[Pod_IP]:9292/": dial tcp [Pod_IP]:9292: connect: connection refused
```

## Environment

* Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
* Self-Hosted Private Cloud Director Kubernetes - v2025.4 and Higher
* Component: DNS

## Cause

In CoreDNS, an error encountered by any of its plugins can prevent DNS name resolution. In an airgapped environment, public nameservers are unreachable, yet they can remain listed in `/etc/resolv.conf` (populated from the `/etc/netplan/50-cloud-init.yaml` file in this case):

```yaml
# sudo cat /etc/netplan/50-cloud-init.yaml
# ...
    vlans:
        production.91:
            addresses:
            - [Address]
            id: 91
            gateway4: [Gateway address]
            link: [Link Name]
            mtu: 9000
            nameservers:
                addresses:
                - 8.8.8.8
                - 1.1.1.1
                search:
                - abc.com
```

Here, when the PCD pods attempt to communicate with other pods via DNS, the presence of an additional search domain (e.g., `abc.com`) and the absence of an internal nameserver cause DNS queries to be forwarded to upstream nameservers (such as 8.8.8.8 and 1.1.1.1) configured on the nodes. Since internet connectivity is disabled, these queries time out, leading to I/O timeout errors.

This situation generates a high volume of failed DNS requests, approximately 2,800 in 166 seconds (roughly 17 per second), as pods repeatedly attempt to resolve names without success. The resulting delays cause further instability within the environment, including failures in PCD pod DNS resolution, readiness and liveness probe failures, and overall management plane instability.
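To see why a lookup for an in-cluster name ends up at 1.1.1.1 or 8.8.8.8, it can help to list the names a pod's resolver tries. The sketch below approximates the usual resolver search-list behavior, assuming the Kubernetes default of `ndots:5` and a pod search list made of the cluster domains followed by the node's `abc.com`. The namespace `pcd` and the function name `expand_search` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Approximate the candidate names a stub resolver tries for NAME,
# given an ndots threshold and a search list (illustrative sketch).
expand_search() {
  name=$1; ndots=$2; shift 2
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;   # already absolute: no expansion
  esac
  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # Enough dots: the name is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
  for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
  # Too few dots: the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# "pcd" is a hypothetical namespace; abc.com comes from the node's netplan file.
expand_search keystone.pcd.svc.cluster.local 5 \
  pcd.svc.cluster.local svc.cluster.local cluster.local abc.com
```

Under these assumptions the fourth candidate is `keystone.pcd.svc.cluster.local.abc.com.`, which lies outside the cluster zone and is therefore forwarded to the upstream nameservers, matching the names seen in the timeout errors.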

## Diagnostics

* Look for timeout errors in the CoreDNS logs that reference the public DNS servers 1.1.1.1:53 and 8.8.8.8:53:

```ruby
[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:55269->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 percona-db-pxc-db-haproxy.[namespace].svc.cluster.local.abc.com. AAAA: read udp [Pod_IP]:57739->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 decco-consul-consul-ui.default.svc.cluster.local.abc.com. A: read udp [Pod_IP]:60181->1.1.1.1:53: i/o timeout

[ERROR] plugin/errors: 2 keystone.[namespace].svc.cluster.local.abc.com. A: read udp [Pod_IP]:50706->8.8.8.8:53: i/o timeout
```

* Check the nameserver and search domain values in the following configuration files:

```ruby
$ cat /etc/netplan/50-cloud-init.yaml
...
            nameservers:
                addresses:
                - 8.8.8.8
                - 1.1.1.1
                search:
                - abc.com

$ cat /etc/resolv.conf
...
nameserver 8.8.8.8
nameserver 1.1.1.1
search abc.com
```
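To gauge how many lookups are timing out against each upstream, the error lines can be tallied per nameserver. This is a minimal sketch: the function name `count_upstream_timeouts` is hypothetical, and the `kubectl` invocation in the comment assumes CoreDNS runs in the `kube-system` namespace with the common `k8s-app=kube-dns` label, which may differ in your deployment.

```shell
#!/bin/sh
# Read CoreDNS log lines on stdin and print "<count> <upstream IP>"
# for every upstream that produced "i/o timeout" errors on port 53.
count_upstream_timeouts() {
  grep -E 'read udp .*->[0-9.]+:53: i/o timeout' \
    | sed -E 's/.*->([0-9.]+):53: i\/o timeout.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Example usage (namespace and label are assumptions):
#   kubectl -n kube-system logs -l k8s-app=kube-dns --tail=-1 --since=10m \
#     | count_upstream_timeouts
```

A non-empty result that lists 1.1.1.1 or 8.8.8.8 suggests that queries are still being forwarded to unreachable public nameservers.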

## Resolution

* Remove any unnecessary search domains from your DNS configuration.
* Configure custom or internal nameservers that are accessible within your environment and capable of resolving the required DNS queries.
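One possible way to roll out the change on a node is sketched below. It assumes the netplan file has already been edited by hand so that `nameservers.addresses` lists internal servers and the unneeded `search` entry is removed. The helper names `run_or_print` and `apply_dns_change` are hypothetical, and the CoreDNS deployment name and namespace are assumptions that may not match your cluster. The script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed. Set RUN=1 to execute.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Assumes /etc/netplan/50-cloud-init.yaml was already edited by hand.
apply_dns_change() {
  run_or_print sudo netplan generate   # parse the YAML; fails on syntax errors
  run_or_print sudo netplan apply      # apply the new network configuration
  run_or_print cat /etc/resolv.conf    # confirm the nameserver/search lines
  # Assumption: CoreDNS runs as deployment "coredns" in namespace "kube-system".
  run_or_print kubectl -n kube-system rollout restart deployment/coredns
}

apply_dns_change
```

Because a pod's resolver configuration is written when the pod is created, pods started before the change will likely keep the old search list until they are recreated.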

## Validation

* All pods in the Management Plane cluster are in the `Running` state.
* No Readiness/Liveness probe failures are reported for the OpenStack component pods such as Nova, Neutron, and Glance.
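The checks above can be approximated by filtering the pod list for anything that is not fully ready. This is a sketch only: the function name `not_ready` is hypothetical, and it assumes the column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Read "kubectl get pods -A --no-headers" output on stdin and print pods
# that are not Running, or are Running with fewer ready containers than
# expected. Completed pods are skipped because 0/1 is normal for them.
not_ready() {
  awk '{
    if ($4 == "Completed") next
    split($3, r, "/")
    if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2, $3, $4
  }'
}

# Example usage:
#   kubectl get pods -A --no-headers | not_ready
```

Empty output indicates that every listed pod is either `Running` with all containers ready or `Completed`.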

