Flannel Crashing on Worker Nodes After Upgrading AWS Clusters to v1.22

Problem

Workloads on a flannel-based AWS cluster start failing after the cluster is upgraded to v1.22. The flannel container fails to create its network configuration, and the failure is reported in the container logs.

Environment

  • Platform9 Managed Kubernetes - v5.6 and higher
  • Kubernetes - v1.22
  • Flannel CNI
  • AWS

Cause

After the upgrade, the flannel containers on worker nodes fail to communicate with the etcd cluster because the etcd client port changed from 4001 to 2379, the secure port.
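
To see which of the two client ports a worker node can reach, a plain TCP connection test can help. The following is a minimal sketch, not a Platform9 tool: the function name port_open and the default host are assumptions made for illustration, and the etcd address must be supplied for your own cluster. A successful connection shows only that the port accepts TCP connections. It does not validate TLS or etcd health.

```python
import socket
import sys

# Ports named in the Cause section: 4001 (old client port) and 2379 (secure client port).
ETCD_PORTS = (4001, 2379)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the etcd address is passed as the first argument.
    # 127.0.0.1 is only a placeholder default.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for port in ETCD_PORTS:
        state = "reachable" if port_open(host, port) else "unreachable"
        print(f"{host}:{port} is {state}")
```

Run it from a worker node with the etcd address as the only argument. If 2379 is reachable while flannel still fails, the flannel configuration is probably still pointing at the old port, which is the situation the steps in the Resolution section address.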

Resolution

We recommend performing the following steps on each cluster before upgrading it to v1.22. The same steps also resolve the issue if the cluster has already been upgraded to v1.22.

  1. Navigate to the "Clusters" tab under the "Infrastructure" section, select the cluster, and click the "Edit" button.
  2. Without making any changes in the "Edit" section, scroll to the bottom and click the "Update Cluster" button.