Uneven DNS Traffic Across CoreDNS Pods

Problem

Some CoreDNS pods receive significantly more DNS traffic than others. The extra load increases their memory usage and can eventually cause those pods to be terminated due to out-of-memory (OOM) errors.

Environment

  • Platform9 Managed Kubernetes v5.6.8 and higher.

Cause

  • Kubernetes ClusterIP services are implemented by kube-proxy using iptables or IPVS, both of which rely on kernel connection tracking (conntrack).
  • When a client (an application pod, NodeLocalDNS, etc.) opens a new DNS connection, the service rules pick one CoreDNS pod for it (at random in iptables mode, round-robin by default in IPVS mode).
  • All later queries on that connection go to the same CoreDNS pod (sticky traffic) for as long as the connection, or its conntrack entry, stays open.
  • Clients that send many queries over long-lived connections therefore keep loading the same CoreDNS pods, which produces persistent, uneven query distribution. This behavior is expected and is not a misconfiguration.
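The toy simulation below illustrates the sticky-traffic mechanism described above. It is a sketch under simplifying assumptions, not a model of how kube-proxy or conntrack are actually implemented: the pod names, client names, and query rates are hypothetical, a backend pod is picked uniformly at random (as in iptables mode), and connections never close. It compares per-pod totals when each client's backend is fixed at its first query against totals when every query picks a backend independently.

```python
import random
from collections import Counter


def distribute(client_qps, pods, sticky, seconds=60, seed=0):
    """Return a Counter of total queries received by each pod.

    client_qps: mapping of client name -> queries per second (hypothetical values).
    sticky=True: a client's backend pod is chosen once, on its first query, and
    reused for every later query, like a long-lived connection/conntrack entry.
    sticky=False: every query picks a backend pod independently.
    """
    rng = random.Random(seed)
    conntrack = {}  # client -> pod chosen when the client's connection was opened
    totals = Counter({pod: 0 for pod in pods})
    for _second in range(seconds):
        for client, qps in client_qps.items():
            for _query in range(qps):
                if sticky:
                    if client not in conntrack:
                        # Backend chosen uniformly at random, similar to iptables mode.
                        conntrack[client] = rng.choice(pods)
                    pod = conntrack[client]
                else:
                    pod = rng.choice(pods)
                totals[pod] += 1
    return totals
```

For example, with three hypothetical clients sending 500, 50, and 10 queries per second to three CoreDNS pods, `sticky=True` places at least 30,000 of the 33,600 queries sent in one minute on a single pod, because the heaviest client never leaves its first backend. With `sticky=False` each pod receives roughly 11,200 queries.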

Resolution

1. CoreDNS Configuration Best Practices

Make sure the Corefile in your CoreDNS ConfigMap includes the following settings:

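  • loadbalance: Must be present. It randomizes the order of A, AAAA, and MX records in responses (round-robin DNS); it does not change which CoreDNS pod receives a query.
  • cache 30: Caches responses for up to 30 seconds, reducing repeated upstream lookups and improving DNS response time.
  • max_concurrent 1000 (an option of the forward plugin): Limits the number of concurrent queries forwarded to upstream servers. Queries beyond the limit are refused, which prevents overload and caps memory use within the CoreDNS pod.
  • errors: Logs DNS errors for visibility and debugging.
  • health: Exposes an HTTP health endpoint used by the liveness probe, so Kubernetes can detect and restart unhealthy CoreDNS containers.
  • reload: Automatically reloads the Corefile when it changes, without restarting the pod.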
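To check an existing Corefile against the settings listed above, a short script like the following can help. It is a minimal sketch: `SAMPLE_COREFILE` is an assumption modeled on the default Corefile that kubeadm generates, not a Platform9-specific file, and `missing_settings` is a hypothetical helper that only matches directives line by line (it does not verify, for example, that `max_concurrent` sits inside the `forward` block, and it does not distinguish between server blocks). On clusters where CoreDNS reads a ConfigMap named `coredns` in `kube-system` (an assumption; names can differ), the live Corefile can be printed with `kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'` and passed to the function.

```python
# Illustrative Corefile modeled on the kubeadm default; zones, upstreams,
# and extra plugins in your cluster may differ.
SAMPLE_COREFILE = """\
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
"""


def missing_settings(corefile):
    """Return the recommended settings that do not appear in `corefile`."""
    # Tokenize each line: drop comments (#) and braces, split on whitespace.
    token_lines = []
    for raw in corefile.splitlines():
        tokens = raw.split("#", 1)[0].replace("{", " ").replace("}", " ").split()
        if tokens:
            token_lines.append(tokens)

    # Each check looks only at a line's leading tokens.
    checks = {
        "errors": lambda t: t[0] == "errors",
        "health": lambda t: t[0] == "health",
        "reload": lambda t: t[0] == "reload",
        "loadbalance": lambda t: t[0] == "loadbalance",
        "cache 30": lambda t: t[:2] == ["cache", "30"],
        "max_concurrent 1000": lambda t: t[:2] == ["max_concurrent", "1000"],
    }
    return [name for name, ok in checks.items() if not any(ok(t) for t in token_lines)]
```

An empty list means every recommended setting was found; otherwise the returned names show what to add.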

These settings ensure that the CoreDNS side of the configuration is in order.

Note: Even with correct settings, sticky traffic is expected because of how Kubernetes tracks connections. As a result, some CoreDNS pods will consistently receive more queries from certain clients for as long as those connections persist.

2. Quick Fixes

  • Perform a rollout restart of the CoreDNS deployment to temporarily redistribute client connections and load.
  • Monitor CoreDNS resource usage and watch for recurring OOM or restart events.
  • Increase memory/CPU resources for CoreDNS pods if needed, especially in large or high-traffic clusters.
  • Review and optimize application DNS usage if some workloads cause excessive DNS queries.
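For the monitoring step, the following sketch summarizes restart counts and OOM kills for CoreDNS pods from `kubectl get pods -o json` output. It assumes the pods run in `kube-system` and carry the `k8s-app=kube-dns` label, as on kubeadm-based clusters; adjust both for your environment. `coredns_pod_health` and `fetch_coredns_pods` are hypothetical helpers; the fields they read (`restartCount`, `lastState.terminated.reason`) come from the standard Pod status API.

```python
import json
import subprocess


def coredns_pod_health(pods_json):
    """Summarize restarts and OOM kills from `kubectl get pods -o json` output."""
    report = []
    for pod in pods_json.get("items", []):
        name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is only present if the container has exited before.
            last = cs.get("lastState", {}).get("terminated", {})
            report.append({
                "pod": name,
                "container": cs["name"],
                "restarts": cs.get("restartCount", 0),
                "last_oom": last.get("reason") == "OOMKilled",
            })
    return report


def fetch_coredns_pods(namespace="kube-system", selector="k8s-app=kube-dns"):
    """Run kubectl and return the parsed JSON (namespace and label are assumptions)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `coredns_pod_health(fetch_coredns_pods())` on a schedule shows which pods keep restarting. A pod whose last termination reason is `OOMKilled` and whose restart count keeps climbing is a candidate for the resource increase above. A rollout restart (for example `kubectl -n kube-system rollout restart deployment/coredns`, assuming that deployment name) only rebalances connections until they become sticky again.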

Additional Information

This behavior results from how Kubernetes load balances ClusterIP traffic using connection tracking: persistent connections stay pinned to one CoreDNS pod, so queries are distributed unevenly across pods. It is a common pattern even with best-practice configuration and is not a misconfiguration. Over time, persistent uneven load can exhaust the resources of the busiest CoreDNS pods and lead to out-of-memory (OOM) events.
