Whereabouts Pods Failing After Cluster Upgrade (1.24.7 to 1.25)

Problem

After upgrading from Platform9 Managed Kubernetes (PMK) 5.7.3 with Kubernetes v1.24.7 to PMK 5.9.4 with Kubernetes v1.25, the Whereabouts pods enter the CrashLoopBackOff state.

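
To confirm the symptom, filter the cluster's pod list for Whereabouts pods stuck in CrashLoopBackOff. The sketch below uses find_crashing_whereabouts, a hypothetical helper that is not part of PMK or kubectl. It reads `kubectl get pods -A` output on standard input and makes two assumptions:
  • The output uses the default column order (NAMESPACE, NAME, READY, STATUS, ...).
  • The Whereabouts pod names contain "whereabouts".

```shell
# Hypothetical helper (not part of PMK or kubectl): reads the plain-text output
# of `kubectl get pods -A` on stdin and prints namespace/name for every pod
# whose name contains "whereabouts" and whose STATUS is CrashLoopBackOff.
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
find_crashing_whereabouts() {
  awk 'NR > 1 && $2 ~ /whereabouts/ && $4 == "CrashLoopBackOff" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pods -A | find_crashing_whereabouts
```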

Environment

  • Platform9 Managed Kubernetes 5.7.3 and 5.9.4.
  • Kubernetes versions: v1.24.7-pmk.240 and v1.25.
  • Whereabouts version: v0.4.10

Cause

After the upgrade, the Whereabouts pods attempt to restart but fail, and the pod logs show an error.

This occurs because the whereaboutsImage field in the NetworkPlugins CRD still points to the older image (v0.4.10). Meanwhile, Whereabouts v0.6.3 introduced new container arguments (SLEEP=false /install-cni.sh && /ip-control-loop -log-level debug). Because the pinned image is not updated automatically, the v0.4.10 image does not support these arguments, and the pods fail to start.
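
The failure is a version mismatch: arguments that need v0.6.3 are passed to a v0.4.10 image. The sketch below uses image_predates_v063, a hypothetical helper that is not part of PMK. It reports whether an image tag is older than v0.6.3, and it rests on these assumptions:
  • The image reference ends in an explicit ":<tag>" rather than a digest.
  • `sort -V` version ordering is available.
  • In the usage comment, the Whereabouts pods are managed by a DaemonSet. The DaemonSet name and namespace are placeholders.

```shell
# Hypothetical helper: returns success (exit status 0) when the tag of a
# whereabouts image reference is older than v0.6.3, the version this article
# names as introducing the new container arguments.
# Assumes the reference ends in an explicit ":<tag>" (no digest) and that the
# tag is a plain version such as v0.4.10; relies on `sort -V` ordering.
image_predates_v063() {
  local image="$1"
  local tag="${image##*:}"   # everything after the last ':' is the tag
  tag="${tag#v}"             # strip a leading 'v', e.g. v0.4.10 -> 0.4.10
  local min="0.6.3"
  # Older means: not equal to min, and sorts first under version ordering.
  [ "$tag" != "$min" ] &&
    [ "$(printf '%s\n%s\n' "$tag" "$min" | sort -V | head -n 1)" = "$tag" ]
}

# Usage (DaemonSet name and namespace are placeholders):
#   img=$(kubectl get daemonset <whereabouts-daemonset> -n <namespace> \
#         -o jsonpath='{.spec.template.spec.containers[0].image}')
#   image_predates_v063 "$img" && echo "image $img predates v0.6.3"
```

Version ordering matters here: a plain string comparison would wrongly place v0.10.0 before v0.6.3.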

Resolution

The recommended fix is to remove the whereaboutsImage field from the NetworkPlugins CRD. This allows the Luigi Controller to update Whereabouts to the latest compatible version (v0.6.3), which supports the new arguments.
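
As a non-interactive alternative to editing the CRD in an editor, the field can in principle be removed with a JSON Patch "remove" operation through `kubectl patch --type=json`. This sketch makes several assumptions:
  • The field path is /spec/plugins/whereabouts/whereaboutsImage, derived from the spec.plugins.whereabouts location this article gives.
  • The resource is addressable as `networkplugins`.
  • The helper whereabouts_image_patch is hypothetical.
  • The resource name and namespace in the usage comment are placeholders.

A JSON Patch "remove" fails if the target path does not exist, so the command returns an error when the field has already been removed.

```shell
# Hypothetical helper: prints a JSON Patch (RFC 6902) document with a single
# "remove" operation for the whereaboutsImage field. The path is an assumption
# based on the field living under spec.plugins.whereabouts.
whereabouts_image_patch() {
  printf '%s\n' '[{"op":"remove","path":"/spec/plugins/whereabouts/whereaboutsImage"}]'
}

# Usage (resource name and namespace are placeholders):
#   kubectl patch networkplugins <name> -n <namespace> \
#     --type=json -p "$(whereabouts_image_patch)"
```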

  1. On the master node, edit the NetworkPlugins CRD.
  2. Locate and remove the whereaboutsImage line under spec.plugins.whereabouts.
  3. Save and exit the editor.
  4. Wait for the Whereabouts pods to perform a rolling update.
  5. Verify that the Whereabouts pods are running and no longer in CrashLoopBackOff status.
  6. Check the updated NetworkPlugins CRD.

The output should no longer contain whereaboutsImage under spec.plugins.whereabouts. With the field removed, the Luigi Controller updates Whereabouts to v0.6.3 automatically.
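
The final check above can be scripted. This minimal sketch uses whereabouts_image_removed, a hypothetical helper that succeeds only if the YAML no longer contains a whereaboutsImage key. It makes two assumptions:
  • `networkplugins` resolves to the NetworkPlugins CRD on your cluster.
  • The resource name and namespace in the usage comment are placeholders.

The helper uses a plain text match rather than a YAML parser. It therefore flags the key anywhere in the document, not only under spec.plugins.whereabouts.

```shell
# Hypothetical helper: reads NetworkPlugins YAML on stdin and succeeds only
# when no "whereaboutsImage:" key remains. This is a plain text match, not a
# YAML parser, so it checks the whole document rather than only
# spec.plugins.whereabouts.
whereabouts_image_removed() {
  ! grep -qE '^[[:space:]]*whereaboutsImage:'
}

# Usage (resource name and namespace are placeholders):
#   kubectl get networkplugins <name> -n <namespace> -o yaml \
#     | whereabouts_image_removed && echo "whereaboutsImage removed"
```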

Additional Information

This issue is tracked under internal bug PMK-6650.
