IP Reconciler CronJob Fails to Run Due to 100 Failed Attempts

Problem

The IP Reconciler CronJob fails to start, and the following events are recorded for it.

Describe Output

Environment

  • Platform9 Edge Cloud - 5.3 LTS Patch #7: v-5.3.0-1739149 and below
  • Whereabouts

Answer

The old CronJob controller had a hardcoded, arbitrary limit of 100: it would stop scheduling a CronJob once the CronJob had missed 100 start times in total. Any kind of downtime counts toward this limit, including upgrades, control plane downtime, master node reboot tests, maintenance, and network outages.
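To see how quickly downtime can exhaust this limit, the following is a minimal sketch of the arithmetic, not the controller's actual code. It assumes a fixed-interval schedule (the 5-minute interval in the usage note is a hypothetical value, not taken from the IP Reconciler spec) and treats the boundary as "more than 100 missed start times", which is an assumption about the exact cutoff.

```python
from datetime import datetime, timedelta

# Limit described above for the old CronJob controller.
MAX_MISSED_STARTS = 100


def missed_starts(last_scheduled: datetime, now: datetime, interval: timedelta) -> int:
    """Count start times in (last_scheduled, now] for a fixed-interval schedule."""
    if now <= last_scheduled:
        return 0
    # timedelta // timedelta yields an integer number of whole intervals.
    return (now - last_scheduled) // interval


def can_schedule(last_scheduled: datetime, now: datetime, interval: timedelta) -> bool:
    """Return False once the number of missed start times exceeds the limit."""
    return missed_starts(last_scheduled, now, interval) <= MAX_MISSED_STARTS
```

With a hypothetical 5-minute schedule, an 8-hour outage misses 96 start times and scheduling can still resume, while a 9-hour outage misses 108 and the CronJob stays stuck.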

Upstream Kubernetes added a new CronJob V2 controller, which fixes this issue and also brings performance improvements. Link: 1.21-cronjob-ga.

The V2 controller is enabled by default only from K8s v1.21 onward, the release in which CronJob became GA.

Starting with v5.3 LTS Patch #8: v-5.3.0-1762883, the CronJob V2 controller feature gate is enabled by default on the kube-controller-manager container for K8s v1.20 clusters.

To manually enable this feature on K8s v1.20 clusters running 5.3 LTS Patch #7: v-5.3.0-1739149 and below, use the following procedure.

The change will need to be performed on all the master nodes.

This is not supported on K8s v1.19.

Command

Append - "--feature-gates=CronJobControllerV2=true" to the command list for the kube-controller-manager container section. Here is how it should look:

kube-controller-manager Container Spec
Command

Running the above command will drain all pods/containers running on the node.

Wait for some time and then verify whether the k8s-master Pod is running. If you see any errors, check the logs from the kube-controller-manager container in the k8s-master pod.

Command

On a multi-master cluster, apply the above steps to one master node at a time; otherwise etcd will lose quorum and the cluster will become unreachable.
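The reason for going one node at a time follows from etcd's majority rule. The sketch below is only an illustration of that arithmetic, assuming one etcd member per master node; the function names are hypothetical and not part of any Platform9 or etcd tooling.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def has_quorum(members: int, down: int) -> bool:
    """True if the members still running form a majority."""
    return members - down >= quorum(members)
```

For a 3-master cluster the quorum is 2, so `has_quorum(3, 1)` holds while `has_quorum(3, 2)` does not: draining a second master before the first one is back leaves etcd without a majority.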

After another short while, the CronJob should resume scheduling. Verify this with:

Command
Describe Output