Expired Calico Kubeconfig Token Causes Unauthorized Errors and Pod Creation Failures

Problem

During a cluster upgrade, pods that were drained from their original nodes became stuck in the ContainerCreating state on other worker nodes. Describing the affected pods showed Unauthorized errors.


Environment

  • Platform9 Managed Kubernetes - v5.6
  • Kubernetes version 1.21

Cause

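The Calico kubeconfig under /etc/cni/net.d contains an expired token. The token is valid for 1 year. The Calico CNI plugin uses this token to authenticate to the Kubernetes API server, so once it expires, requests are rejected as Unauthorized and new pods cannot finish network setup. An upstream bug tracks this issue.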
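To confirm the cause on a worker node, the token's expiry can be read from the kubeconfig itself. The following is a minimal sketch, not a Platform9-provided tool. It assumes the file is named calico-kubeconfig (the article gives only the directory), that the token is a JWT carrying an "exp" claim, and that GNU date and base64 are available. The CALICO_KUBECONFIG variable and the jwt_exp function are hypothetical names introduced for illustration.

```shell
# Assumption: Calico writes its CNI kubeconfig under this file name; adjust if your nodes differ.
CALICO_KUBECONFIG="${CALICO_KUBECONFIG:-/etc/cni/net.d/calico-kubeconfig}"

# Print the "exp" claim (Unix seconds) of the JWT passed as the first argument.
# Uses a simple pattern match, not a full JSON parser.
jwt_exp() {
  # The payload is the second dot-separated field, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that the JWT encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Runs only where the file exists and is readable (typically as root on a worker node).
if [ -r "$CALICO_KUBECONFIG" ]; then
  token=$(sed -n 's/^[[:space:]]*token:[[:space:]]*//p' "$CALICO_KUBECONFIG" | head -n 1 | tr -d '"')
  exp=$(jwt_exp "$token")
  if [ -n "$exp" ]; then
    echo "Token expires: $(date -d "@$exp")"   # GNU date syntax
    if [ "$exp" -lt "$(date +%s)" ]; then
      echo "Token has EXPIRED"
    fi
  fi
fi
```

If the script prints nothing, the token may not carry an "exp" claim or the file may use a different layout, in which case this check does not apply.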

Resolution

  • Platform9 Engineering tracked this issue as internal Jira PMK-5816 and delivered the fix in PMK 5.9.1.
  • As a workaround, perform a rollout restart of the calico-node daemon set. This refreshes the expired token if the pods have been running for more than 1 year.
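The workaround above can be sketched with kubectl as follows. The daemon set name calico-node and the kube-system namespace are assumptions (operator-based installs may use a different namespace); check them with kubectl get daemonset -A before running. The restart_calico_node function and the two variables are hypothetical names used for illustration.

```shell
# Assumptions: override these if your cluster uses a different namespace or name.
CALICO_NAMESPACE="${CALICO_NAMESPACE:-kube-system}"
CALICO_DAEMONSET="${CALICO_DAEMONSET:-calico-node}"

# Restart the calico-node pods one node at a time, then wait for the rollout to finish.
restart_calico_node() {
  kubectl -n "$CALICO_NAMESPACE" rollout restart daemonset "$CALICO_DAEMONSET" &&
    kubectl -n "$CALICO_NAMESPACE" rollout status daemonset "$CALICO_DAEMONSET"
}

# Invoke it explicitly once the names above are confirmed:
#   restart_calico_node
```

After the rollout completes, pods stuck in ContainerCreating should be retried by the kubelet; describing them again shows whether the Unauthorized errors have stopped.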

Additional Information

The upstream fix was introduced in Calico v3.24.
