Caching Not working for NodeLocal DNSCache.

Problem

  • The NodeLocal DNSCache pods are failing to resolve the cached kubernetes service DNS queries.
  • The DNS resolution for kubernetes services failing with SERVFAIL response.

Environment

  • Platform9 Managed Kubernetes – All Versions

Cause

  • The TTL of the records coming from CoreDNS is 30 Secs by default, hence any .cluster.local records would only be cached for 30s.
Javascript
Copy
  • Due to this any record cached in NodeLocal DNSCache pods would only be queryable for 30 Secs before it is expired from the cache.
  • The DNS resolutions beyond 30 Secs will fail with a SERVFAIL response.

Resolution

  • The CoreDNS ConfigMap may be edited to set a higher TTL for any such domains; however, this can result in to a situation where these records will take longer to update in case their endpoint is updated.
  • The ConfigMap/Corefile for the node-local-dns component would also need to be updated to allow a >30s maximum for any successful lookup record TTLs, e.g.
Javascript
Copy

A similar issue was reported to upstream at

https://github.com/kubernetes/dns/issues/415#issuecomment-712450686

Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard