Calico-kube-controllers Pod Restarts Frequently Due to OOM (Memory Exhaustion)
Problem
The calico-kube-controllers pod restarts frequently because its container is OOM-killed (memory exhaustion) with exit code 137:
    % kubectl -n kube-system describe pod calico-kube-controllers-6f4d4c87cf-pnxbx
    Name:           calico-kube-controllers-6f4d4c87cf-pnxbx
    Status:         Running
    Controlled By:  ReplicaSet/calico-kube-controllers-6f4d4c87cf
    Containers:
      calico-kube-controllers:
        Image:          calico/kube-controllers:v3.23.5
        State:          Running
          Started:      Wed, 04 Oct 2023 17:14:57 +0530
        Last State:     Terminated
          Reason:       OOMKilled
          Exit Code:    137
          Started:      Wed, 04 Oct 2023 17:02:07 +0530
          Finished:     Wed, 04 Oct 2023 17:14:56 +0530
        Ready:          True
        Restart Count:  244
        Limits:
          cpu:     200m
          memory:  400Mi
        Requests:
          cpu:     1m
          memory:  25Mi
    .
    Events:
      Type     Reason     Age                      From     Message
      ----     ------     ----                     ----     -------
      Normal   Created    41m (x244 over 83d)      kubelet  Created container calico-kube-controllers
      Normal   Pulled     28m (x245 over 83d)      kubelet  Container image "calico/kube-controllers:v3.23.5" already present on machine
      Warning  Unhealthy  2m23s (x10692 over 83d)  kubelet  Readiness probe failed: command "/usr/bin/check-status -r" timed out

Environment
- Platform9 Managed Kubernetes - v5.6.8.
- Kubernetes version 1.23.8.
Answer
This is a known issue. JIRA PMK-6180 has been filed to track and resolve it, and the fix will be available in an upcoming patch release.
Workaround
Increase the readiness probe timeout from 1 second to 10 seconds and raise the memory limit on the pod from 400Mi to about 2Gi (2000Mi in the example below).
Before modification:
    % kubectl get deployment calico-kube-controllers -n kube-system -o yaml
    livenessProbe:
      exec:
        command:
        - /usr/bin/check-status
        - -l
      failureThreshold: 6
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: calico-kube-controllers
    readinessProbe:
      exec:
        command:
        - /usr/bin/check-status
        - -r
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Modify the calico-kube-controllers deployment using the command below:
    % kubectl edit deployment calico-kube-controllers -n kube-system

After modification:
    livenessProbe:
      exec:
        command:
        - /usr/bin/check-status
        - -l
      failureThreshold: 6
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: calico-kube-controllers
    readinessProbe:
      exec:
        command:
        - /usr/bin/check-status
        - -r
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    resources:
      limits:
        cpu: 200m
        memory: 2000Mi
      requests:
        cpu: 1m
        memory: 25Mi

Additional Information
This is a known bug tracked under JIRA ID PMK-6180.
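The workaround can also be applied without opening an editor. The following is a minimal sketch, not an official procedure: it assumes the namespace (kube-system), deployment name, and container name (calico-kube-controllers) shown in the output above, and the function names apply_workaround and show_settings are hypothetical helpers introduced only for illustration. It uses a strategic merge patch, which matches containers by name, so only the readiness probe timeout and the memory limit are changed.

```shell
#!/bin/sh
# Sketch: apply the PMK-6180 workaround with "kubectl patch" instead of "kubectl edit".
# Assumption: names below match the deployment shown in this article.
NAMESPACE=kube-system
DEPLOYMENT=calico-kube-controllers

# Strategic merge patch: the container is matched by "name"; fields not listed
# here (liveness probe, CPU limit, requests) are left unchanged.
PATCH='{"spec":{"template":{"spec":{"containers":[{"name":"calico-kube-controllers","readinessProbe":{"timeoutSeconds":10},"resources":{"limits":{"memory":"2000Mi"}}}]}}}}'

# Hypothetical helper: patch the deployment, then wait for the rollout to finish.
apply_workaround() {
  kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type strategic -p "$PATCH" &&
    kubectl -n "$NAMESPACE" rollout status deployment "$DEPLOYMENT"
}

# Hypothetical helper: print the readiness probe timeout and the memory limit
# currently set on the calico-kube-controllers container.
show_settings() {
  kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="calico-kube-controllers")].readinessProbe.timeoutSeconds}{"\n"}{.spec.template.spec.containers[?(@.name=="calico-kube-controllers")].resources.limits.memory}{"\n"}'
}
```

After sourcing the script, run apply_workaround to patch the deployment, then show_settings to check that the new timeout and memory limit are in place. Nothing is changed on the cluster until apply_workaround is called.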