Netplan Apply Moves Cluster To Pending State

Problem

Every time Netplan is applied, the cluster moves to a pending state, regardless of whether the Netplan configuration actually changed.

Environment

  • Platform9 Managed Kubernetes - v4.4
  • Keepalived

Cause

In a multi-master cluster, applying Netplan restarts the systemd networking stack, which affects the Virtual IP (VIP) attached to the active master.

Normally, in a multi-master setup, the keepalived service moves the VIP to a different master when the active master becomes unavailable. However, because of an upstream keepalived bug, the VIP does not move to another master when Netplan is applied. Instead, the affected master remains the active master but no longer has the VIP attached. As a result, connectivity between the active master and the worker nodes is disrupted, and the cluster goes into a pending state.
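To see whether a master still holds the VIP, the address list on that node can be inspected. The following is a minimal sketch rather than an official procedure: the VIP value is a placeholder, and has_vip is a helper name chosen here for illustration.

```shell
# VIP is a placeholder address (assumption); substitute your cluster's Virtual IP.
VIP="${VIP:-192.0.2.10}"

# has_vip is a hypothetical helper: it reads `ip -o -4 addr show` style output
# on stdin and succeeds if the given address appears as " <address>/<prefix>".
has_vip() {
  grep -qF " $1/"
}

# Report whether the VIP is currently configured on any interface of this node.
if ip -o -4 addr show 2>/dev/null | has_vip "$VIP"; then
  echo "VIP $VIP is attached to this node"
else
  echo "VIP $VIP is not attached to this node"
fi
```

If none of the masters reports the VIP after Netplan has been applied, that matches the behaviour described above.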

Resolution

Perform the following steps on one master at a time as a workaround.

1. Apply the Netplan configuration after making your changes.

2. Wait for a couple of minutes.

3. Restart the keepalived service.

Each restart of keepalived transfers the VIP to another master.
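The three master steps can be sketched as a small script. This is an illustration under assumptions, not the article's original commands: it assumes keepalived runs as a systemd unit named keepalived, and DRY_RUN, WAIT_SECONDS, run and apply_netplan_on_master are names introduced here. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Run on ONE master at a time.
# DRY_RUN and WAIT_SECONDS are illustrative variables, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands
WAIT_SECONDS="${WAIT_SECONDS:-120}"   # "a couple of minutes"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_netplan_on_master() {
  run sudo netplan apply                  # step 1: apply the Netplan configuration
  run sleep "$WAIT_SECONDS"               # step 2: wait for a couple of minutes
  run sudo systemctl restart keepalived   # step 3: restart keepalived (unit name assumed)
}

apply_netplan_on_master
```

Keeping the default in dry-run mode makes it harder to trigger a network restart on more than one master by accident.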

Finally, perform the following step on worker nodes.

1. Apply the Netplan configuration.
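For the worker nodes, a comparable sketch needs only the apply step, since no keepalived restart is listed for workers. As before, DRY_RUN, run and apply_netplan_on_worker are illustrative names, and the script prints the command unless DRY_RUN=0 is set.

```shell
# Run on each worker node after all masters have been handled.
DRY_RUN="${DRY_RUN:-1}"   # illustrative variable: 1 = only print the command

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_netplan_on_worker() {
  run sudo netplan apply   # apply the Netplan configuration
}

apply_netplan_on_worker
```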
