Calico-kube-controller Pod Restarts Frequently Due To OOM- Memory Exhaustion.
Problem
The calico-kube-controller pod is getting restarted frequently due to OOM- memory exhaustion with 137 error code:
% kubectl -n kube-system describe pod calico-kube-controllers-6f4d4c87cf-pnxbx
Name: calico-kube-controllers-6f4d4c87cf-pnxbx
Status: Running
Controlled By: ReplicaSet/calico-kube-controllers-6f4d4c87cf
Containers:
calico-kube-controllers:
Image: calico/kube-controllers:v3.23.5
State: Running
Started: Wed, 04 Oct 2023 17:14:57 +0530
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 04 Oct 2023 17:02:07 +0530
Finished: Wed, 04 Oct 2023 17:14:56 +0530
Ready: True
Restart Count: 244
Limits:
cpu: 200m
memory: 400Mi
Requests:
cpu: 1m
memory: 25Mi
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 41m (x244 over 83d) kubelet Created container calico-kube-controllers
Normal Pulled 28m (x245 over 83d) kubelet Container image "calico/kube-controllers:v3.23.5" already present on machine
Warning Unhealthy 2m23s (x10692 over 83d) kubelet Readiness probe failed: command "/usr/bin/check-status -r" timed out
Environment
- Platform9 Managed Kubenetes - v5.6.8.
- Kubernetes version 1.23.8.
Answer
This is a known issue, a jira- PMK-6180 has already been filed to track this issue and resolve it. The fix will be available in upcoming patch release.
Workaround
Modify the readiness probe timeout to 10 seconds and increase the memory limit on the pod to 2Gi.
Before modification:
% kubectl get deployment calico-kube-controllers -n kube-system -o yaml
livenessProbe:
exec:
command:
- /usr/bin/check-status
- -l
failureThreshold: 6
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: calico-kube-controllers
readinessProbe:
exec:
command:
- /usr/bin/check-status
- -r
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
Modify the calico-kube-controllers deployment using below command:
% kubectl edit deployment calico-kube-controllers -n kube-system
After modification using:
x
livenessProbe:
exec:
command:
- /usr/bin/check-status
- -l
failureThreshold: 6
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: calico-kube-controllers
readinessProbe:
exec:
command:
- /usr/bin/check-status
- -r
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
limits:
cpu: 200m
memory: 2000Mi
requests:
cpu: 1m
memory: 25Mi
Additional Information
This is known bug with JIRA ID: PMK-6180
Was this page helpful?