Calico Pods Fail Liveness/Readiness Probes on K8s v1.20 and Above Because Exec Probe Timeouts Are Now Enforced (Default Timeout Is 1 Second)

Problem

Starting from K8s v1.20, exec probe timeouts are enforced. Because the default probe timeout is 1 second, the calico pods fail their liveness/readiness probes.

Environment

  • Platform9 Managed Kubernetes - K8s v1.20 and above
  • Calico CNI v3.18

Answer

  • The probe failures affect both the calico-kube-controllers pod and all the calico-node pods. Starting from K8s v1.20, exec probe timeouts were fixed so that the configured timeout value is actually respected. The default timeout of 1 second can be too low on loaded clusters.

Starting with Calico v3.20, the probe timeouts were increased from 1s to 10s. Part of the reason is that, in K8s v1.20, exec probe timeouts were fixed to actually respect the timeout values.

Describe Example
Kubelet Logs

Workaround Option 1

  • On ALL master nodes in the cluster, add the parameter timeoutSeconds: 10 to the files /opt/pf9/pf9-kube/conf/networkapps/calico-v1.20.11.yaml and /opt/pf9/pf9-kube/conf/networkapps/calico-v1.20.11-configured.yaml. In each file, add it in two sections: the probes of the calico-kube-controllers Deployment and the probes of the calico-node DaemonSet spec.
  • Perform a complete PMK stack restart on each master node, ONE NODE AT A TIME.
Stack Restart Steps
  • After the stack restart, the calico-kube-controllers pod and all the calico-node pods are recreated with a spec that includes timeoutSeconds: 10.
Post Restart: New Applied Spec [Note the addition of timeoutSeconds: 10 in respective sections]

The changes made by the above procedure will not persist after a cluster upgrade.
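
As a rough illustration of the Option 1 edit, the fragment below sketches what the probes of the calico-node container might look like once timeoutSeconds: 10 is added. The probe commands and the other timing fields are assumptions modelled on the upstream Calico manifest and may differ from the files shipped with PMK. The only change the workaround calls for is the added timeoutSeconds line in each probe.

```yaml
# Illustrative fragment only. Probe commands and timing values other than
# timeoutSeconds are assumptions and may not match the PMK-shipped manifest.
livenessProbe:
  exec:
    command:
    - /bin/calico-node
    - -felix-live
  periodSeconds: 10
  initialDelaySeconds: 10
  failureThreshold: 6
  timeoutSeconds: 10   # added; Kubernetes defaults to 1 when this field is omitted
readinessProbe:
  exec:
    command:
    - /bin/calico-node
    - -felix-ready
  periodSeconds: 10
  timeoutSeconds: 10   # added
```

The same timeoutSeconds line would go into the probe section of the calico-kube-controllers Deployment.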

Workaround Option 2

Set ExecProbeTimeout: false as a feature gate in the Dynamic Kubelet Config of the nodes. This reverts to the behaviour of K8s v1.19 and below, where exec probe timeouts are ignored.

If changes are made to the ConfigMap, all nodes that run the pf9-kubelet service and use that configuration detect the changes in the dynamic kubelet configuration, apply them, and then restart the pf9-kubelet service. Once restarted, the pf9-kubelet service uses the new configuration from the ConfigMap.
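
For Option 2, a minimal sketch of the relevant kubelet configuration is shown below, assuming the standard KubeletConfiguration format. How this content is embedded in the PMK ConfigMap (the ConfigMap name and data key) is not shown here, and any existing settings in that configuration should be left in place.

```yaml
# Minimal sketch: only the feature gate is shown. Existing fields in the
# cluster's kubelet configuration should be kept alongside it.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ExecProbeTimeout: false   # kubelet ignores exec probe timeouts, as in v1.19 and below
```

With this gate disabled, the timeoutSeconds value on exec probes is no longer enforced on the affected nodes, so it applies to every workload using exec probes, not only to Calico.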

Reference:
