Uneven DNS Traffic Across CoreDNS Pods

Problem

Some CoreDNS pods receive significantly more DNS traffic than their counterparts, which increases their memory usage and can ultimately cause them to be terminated with out-of-memory (OOM) errors.

Environment

Platform9 Managed Kubernetes - v-5.6.8 and higher.

Cause

Kubernetes ClusterIP services are implemented by kube-proxy (iptables or IPVS mode), which relies on connection tracking.
The first DNS connection from a client (pod, NodeLocalDNS, etc.) is assigned to one CoreDNS pod: at random in iptables mode, round-robin by default in IPVS mode.
All subsequent queries from that client on the same connection go to the same CoreDNS pod (sticky traffic) for as long as that connection stays open.
The result is a persistent, uneven query distribution among CoreDNS pods. This behavior is expected and is not a misconfiguration.
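
The effect can be illustrated with a small simulation. This is a minimal sketch, not a model of kube-proxy itself: it assumes each client's connection is pinned to a randomly chosen pod, and it uses a hypothetical heavy-tailed query volume per client. The function name, the Pareto shape parameter, and the pod and client counts are all illustrative assumptions.

```python
import random
from collections import Counter


def simulate_sticky_dns(num_pods=3, num_clients=12, seed=0):
    """Pin each client to one pod on first connect, then count queries per pod."""
    rng = random.Random(seed)
    queries_per_pod = Counter({pod: 0 for pod in range(num_pods)})
    for _ in range(num_clients):
        # The backend is chosen once, when the connection is first tracked.
        pod = rng.randrange(num_pods)
        # Hypothetical workload: most clients are quiet, a few are very chatty.
        queries = int(rng.paretovariate(1.2) * 100)
        # Every later query on that connection lands on the same pod.
        queries_per_pod[pod] += queries
    return dict(queries_per_pod)


if __name__ == "__main__":
    for pod, count in sorted(simulate_sticky_dns().items()):
        print(f"coredns-{pod}: {count} queries")
```

Because the pod is picked per connection rather than per query, a single chatty client can dominate one pod's total, which is the imbalance described above.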

Resolution

1. CoreDNS Configuration Best Practices

Make sure your CoreDNS ConfigMap looks like this:

CoreDNS Corefile

    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

loadbalance: Should be present. It randomizes the order of A, AAAA, and MX records in each response; it does not change which CoreDNS pod receives a query.
cache 30: Caches responses for up to 30 seconds, which reduces repeated lookups and improves DNS response time.
max_concurrent 1000: Limits the number of concurrent queries forwarded to upstream servers, which bounds memory use and prevents overload within the CoreDNS pod.
errors: Logs DNS errors for visibility and debugging.
health: Exposes a health endpoint that the liveness probe uses, so the cluster can automatically detect and restart unhealthy CoreDNS pods.
reload: Allows CoreDNS to automatically reload the Corefile when it changes, without a pod restart.

These settings help ensure everything is in order from the CoreDNS configuration side.

Note: Even with correct settings, sticky traffic is expected because of how Kubernetes tracks connections. This means some CoreDNS pods will always get more queries from certain clients.

2. Quick Fixes

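Perform a rollout restart of the CoreDNS deployment (for example, kubectl rollout restart deployment coredns -n kube-system, assuming the default deployment name and namespace) to temporarily redistribute client connections and load.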
Monitor CoreDNS resource usage and watch for recurring OOM or restart events.
Increase memory/CPU resources for CoreDNS pods if needed, especially in large or high-traffic clusters.
Review and optimize application DNS usage if some workloads cause excessive DNS queries.
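
To check how unevenly queries are spread, one option is to compare the request counters that each CoreDNS pod exposes through the prometheus plugin on port 9153. The following is a minimal sketch: the helper names and the sample values are hypothetical, it assumes the counter is named coredns_dns_requests_total (as in recent CoreDNS releases), and it assumes the metrics text carries no timestamps. Fetching the text from each pod (for example through a port-forward to :9153/metrics) is left out.

```python
def total_requests(metrics_text, metric="coredns_dns_requests_total"):
    """Sum every sample of `metric` in one pod's Prometheus text output."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric:
            continue
        # Assumes no trailing timestamp, so the value is the last field.
        total += float(line.rsplit(" ", 1)[1])
    return total


def traffic_share(totals_by_pod):
    """Convert per-pod request totals into each pod's fraction of all requests."""
    grand_total = sum(totals_by_pod.values())
    if grand_total == 0:
        return {pod: 0.0 for pod in totals_by_pod}
    return {pod: count / grand_total for pod, count in totals_by_pod.items()}


# Hypothetical sample of one pod's /metrics output, for illustration only.
SAMPLE_METRICS = """
# TYPE coredns_dns_requests_total counter
coredns_dns_requests_total{family="1",proto="udp",server="dns://:53",type="A",zone="."} 900
coredns_dns_requests_total{family="1",proto="udp",server="dns://:53",type="AAAA",zone="."} 100
coredns_cache_hits_total{server="dns://:53",type="success",zone="."} 50
"""

if __name__ == "__main__":
    print(total_requests(SAMPLE_METRICS))
```

A pod whose share stays well above 1/N (for N CoreDNS replicas) across several checks is a likely candidate for the OOM behavior described in this article. Since these are cumulative counters, comparing the change between two scrapes gives a fairer picture than the raw totals.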

Additional Information

This behavior is a result of how Kubernetes load balances traffic using connection tracking for ClusterIP services, leading to persistent connections and uneven query distribution across CoreDNS pods. Even with best practice configuration, this is a common pattern and not a misconfiguration. Over time, persistent uneven load can exhaust CoreDNS resources and may eventually lead to OOM (Out Of Memory) events as well.
