Calico-kube-controllers Pod Restarts Frequently Due to OOM (Memory Exhaustion)

Problem

The calico-kube-controllers pod restarts frequently because its container is OOM-killed (memory exhaustion) and exits with code 137:

% kubectl -n kube-system describe pod calico-kube-controllers-6f4d4c87cf-pnxbx
Name:                 calico-kube-controllers-6f4d4c87cf-pnxbx
...
Status:               Running
...
Controlled By:  ReplicaSet/calico-kube-controllers-6f4d4c87cf
Containers:
  calico-kube-controllers:
    Image:          calico/kube-controllers:v3.23.5
    State:          Running
      Started:      Wed, 04 Oct 2023 17:14:57 +0530
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 04 Oct 2023 17:02:07 +0530
      Finished:     Wed, 04 Oct 2023 17:14:56 +0530
    Ready:          True
    Restart Count:  244
    Limits:
      cpu:     200m
      memory:  400Mi
    Requests:
      cpu:      1m
      memory:   25Mi
....
Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Normal   Created    41m (x244 over 83d)      kubelet  Created container calico-kube-controllers
  Normal   Pulled     28m (x245 over 83d)      kubelet  Container image "calico/kube-controllers:v3.23.5" already present on machine
  Warning  Unhealthy  2m23s (x10692 over 83d)  kubelet  Readiness probe failed: command "/usr/bin/check-status -r" timed out

Environment

  • Platform9 Managed Kubernetes - v5.6.8.

  • Kubernetes version 1.23.8.

Answer

This is a known issue. A Jira ticket, PMK-6180, has been filed to track and resolve it. The fix will be available in an upcoming patch release.

Workaround

Increase the readiness probe timeout to 10 seconds and raise the memory limit on the calico-kube-controllers container from 400Mi to 2Gi.

Before modification:
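
The values currently in effect can be captured before anything is changed. The following is a minimal sketch, not the exact command used for this article. It assumes the namespace and deployment name shown in the describe output above, and that calico-kube-controllers is the first container in the pod template. The function name and the backup file name are hypothetical.

```shell
NAMESPACE="kube-system"
DEPLOYMENT="calico-kube-controllers"
BACKUP_FILE="calico-kube-controllers-backup.yaml"   # hypothetical file name

# Hypothetical helper: save the current manifest so the change can be
# reverted, then print the container's resources and readiness probe.
show_current_settings() {
  kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o yaml > "$BACKUP_FILE" || return 1
  # Index 0 assumes calico-kube-controllers is the only (or first) container.
  kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" \
    -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}{.spec.template.spec.containers[0].readinessProbe}{"\n"}'
}
```

Calling `show_current_settings` from a shell with access to the cluster writes the backup file and prints the two settings.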

Modify the calico-kube-controllers deployment using the command below:
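
One way to apply both changes non-interactively is a strategic merge patch. This is a sketch under stated assumptions rather than the exact command from the original procedure: the namespace, deployment name, and container name are taken from the describe output above, and the helper name `apply_workaround` is hypothetical. Editing the deployment interactively with `kubectl -n kube-system edit deployment calico-kube-controllers` is an equivalent alternative.

```shell
# Names taken from the describe output above; adjust if they differ.
NAMESPACE="kube-system"
DEPLOYMENT="calico-kube-controllers"
CONTAINER="calico-kube-controllers"

# Strategic merge patch: containers are matched by name, so only the
# memory limit and the readiness probe timeout are changed.
PATCH=$(cat <<EOF
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "${CONTAINER}",
            "resources": { "limits": { "memory": "2Gi" } },
            "readinessProbe": { "timeoutSeconds": 10 }
          }
        ]
      }
    }
  }
}
EOF
)

# Hypothetical helper: applying the patch changes the pod template,
# which causes the deployment to roll out a new pod.
apply_workaround() {
  kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type strategic -p "$PATCH"
}
```

Calling `apply_workaround` sends the patch to the cluster. All other fields of the deployment, including the CPU limit and the resource requests, are left as they are.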

After modification:
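
To confirm that the new values are in place, the rollout can be awaited and the two fields read back from the deployment. This is a minimal sketch, assuming the same namespace and deployment name as in the describe output above and that calico-kube-controllers is the first container in the pod template. The helper name `verify_workaround` is hypothetical.

```shell
NAMESPACE="kube-system"
DEPLOYMENT="calico-kube-controllers"
# Index 0 assumes calico-kube-controllers is the only (or first) container.
LIMIT_PATH='{.spec.template.spec.containers[0].resources.limits.memory}'
TIMEOUT_PATH='{.spec.template.spec.containers[0].readinessProbe.timeoutSeconds}'

# Hypothetical helper: succeeds only if the rollout completes and both
# settings match the workaround values.
verify_workaround() {
  # Wait for the updated pod to become ready (gives up after 5 minutes).
  kubectl -n "$NAMESPACE" rollout status "deployment/$DEPLOYMENT" --timeout=300s || return 1
  limit=$(kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o jsonpath="$LIMIT_PATH")
  timeout=$(kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o jsonpath="$TIMEOUT_PATH")
  echo "memory limit: $limit, readiness timeout: ${timeout}s"
  [ "$limit" = "2Gi" ] && [ "$timeout" = "10" ]
}
```

After calling `verify_workaround`, the Restart Count and Last State fields in `kubectl -n kube-system describe pod` for the new pod show whether OOMKilled restarts still occur.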

Additional Information

This is a known bug tracked under Jira ID PMK-6180.
