Calico to Cilium Migration on PMK Workload Clusters

Problem

  • Platform9 Managed Kubernetes v5.13.0 and newer releases support Cilium as the CNI for workload clusters.
  • How can existing workload clusters be migrated from Calico CNI to Cilium CNI?

Environment

  • Platform9 Managed Kubernetes - v5.13.0 and Higher
  • Kubernetes Version - v1.31.x and Higher
  • Component: CNI

Procedure

The migration script for the Calico to Cilium migration ensures the operation is sequential (one node at a time). However, it is strongly recommended to perform this migration in a complete downtime window, as all pods are assigned new IPs from the new Containers CIDR and inter-pod communication is expected to fail during this time.

Prerequisites

  • Download the migration script from HERE.

  • It is recommended to execute this script from one of the master nodes.

  • Passwordless SSH access to all nodes as the root user, or as a user with sudo-level permissions.

  • The cluster should be accessible via kubectl commands.

    • For this, a kubeconfig is already present on the master nodes at the location - /etc/pf9/kube.d/kubeconfigs/admin.yaml
  • Ensure the jq binary is present on the node from which the script is executed.

  • Identification of a new Containers CIDR - the existing Containers CIDR cannot be reused for Cilium, so a new Containers CIDR needs to be specified.

  • Keep these Release Notes handy for version information.
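The prerequisites above can be checked with a short pre-flight snippet before running the migration. This is a convenience sketch, not part of the migration script; the list of binaries checked is illustrative.

```shell
# Pre-flight check (a sketch; not part of the migration script).
# The kubeconfig path is the one documented above.
export KUBECONFIG=/etc/pf9/kube.d/kubeconfigs/admin.yaml

missing=""
for bin in jq kubectl ssh; do
  command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
done

if [ -n "$missing" ]; then
  echo "Missing binaries:$missing"
else
  echo "All required binaries are present."
fi
```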

Define Variables in Migration Script

  • Set the environment variables below in the script. No other variables should be modified.
  • Refer to the working sample below for a Calico to Cilium migration on workload clusters running Kubernetes v1.31 and higher on PMK v5.13.0 and higher.

The CILIUM_CLI_VERSION_OVERRIDE, CILIUM_TARGET_CLUSTER_VERSION, and CALICO_NODE_IMAGE variables should be set by referring to the Release Notes. Ensure the version numbers are correct.

CILIUM_IPV4_CLUSTER_POOL_CIDR should be set to a new Containers CIDR; it must not be an existing CIDR. The existing Containers CIDR can be found in the Cluster Details page in the UI. In this example the CIDR chosen is 10.29.0.0/16, but this is customizable.

CILIUM_INTERFACE should be set to the same interface name that Calico is using.

SSH_USER should be set to the username that will be used to connect to the other nodes via SSH.

SLEEP_AFTER_DRAIN is a wait duration in seconds (default 30). After each node is drained, the script waits for this duration so that all pods have time to reschedule. If there is a significantly high number of pods on each node, set this parameter to a higher value.
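A variable block of this general shape is expected. All values below are illustrative; take the actual versions from the Release Notes for your PMK release, and substitute your own interface name, SSH user, and new CIDR.

```shell
# Illustrative values only -- replace with the versions from the Release Notes
# and the settings that match your environment.
CILIUM_CLI_VERSION_OVERRIDE="v0.16.4"          # from Release Notes (example)
CILIUM_TARGET_CLUSTER_VERSION="v1.14.5"        # from Release Notes (example)
CALICO_NODE_IMAGE="calico/node:v3.26.1"        # from Release Notes (example)
CILIUM_IPV4_CLUSTER_POOL_CIDR="10.29.0.0/16"   # must be a NEW, unused CIDR
CILIUM_INTERFACE="ens3"                        # same interface Calico uses
SSH_USER="root"                                # user with passwordless SSH
SLEEP_AFTER_DRAIN=30                           # seconds to wait after each drain
```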

Execution

  • Set executable permission on the script and execute it.
  • The script will confirm whether the variables in the script are defined correctly and ask whether it should proceed with finalizing them.
  • The script will then check all the prerequisites and proceed further.
  • After that, some cluster resources are backed up and a file named cilium.yaml is generated.
  • Answer no at this prompt; the script will stop. Locate the generated file named cilium.yaml in the directory from which the script was executed.
  • In that cilium.yaml file, find the parameter KUBERNETES_SERVICE_HOST. Its value should be set to the VIP for multi-master clusters and to the master node IP for single-master clusters. The value is expected to be auto-populated, but it must be validated. The parameter appears five times in the file and must be set correctly for every occurrence.
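One way to locate and set every occurrence is with grep and sed. The snippet below operates on a miniature stand-in file, since the exact layout of the generated cilium.yaml may differ; the address 10.128.145.40 is an illustrative VIP, and the sed expression assumes the `value:` line directly follows each `name: KUBERNETES_SERVICE_HOST` line.

```shell
# Illustrative stand-in for one occurrence in cilium.yaml -- the real file
# should be edited in place after verifying its layout.
cat > cilium-snippet.yaml <<'EOF'
- name: KUBERNETES_SERVICE_HOST
  value: ""
EOF

# VIP for multi-master clusters, or the master node IP for single-master
# clusters. The address here is illustrative.
API_HOST="10.128.145.40"

# List every occurrence, then set the value on the line that follows each one.
grep -n "KUBERNETES_SERVICE_HOST" cilium-snippet.yaml
sed -i "/KUBERNETES_SERVICE_HOST/{n;s/value: .*/value: \"$API_HOST\"/;}" cilium-snippet.yaml
```

Run the same grep against the real cilium.yaml afterwards to confirm all five occurrences carry the correct value.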
  • After this, execute the script again. The script is idempotent, so the changes from the first run are not repeated. On this second run, the script automatically detects cilium.yaml and proceeds with applying the changes.
  • Cilium resources and pods will be created in the cluster in the kube-system namespace.
  • Each node is drained and cordoned sequentially. After draining, the script waits for the number of seconds defined in the SLEEP_AFTER_DRAIN variable so that pods can be rescheduled.
  • The Cilium changes are then applied to the node; the script performs some post-checks and finally uncordons the node, returning it to the Ready state.
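The per-node sequence described above has roughly this shape. This is an assumed sketch, not the script's actual code: the kubectl steps are shown as comments, the node names are illustrative, and the wait is shortened for illustration.

```shell
# Assumed shape of the sequential per-node loop (illustrative sketch).
NODES="node-1 node-2"        # in the script: derived from the cluster's node list
SLEEP_AFTER_DRAIN=1          # shortened here for illustration; default is 30

for node in $NODES; do
  echo "Draining $node"
  # kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  sleep "$SLEEP_AFTER_DRAIN"   # give the evicted pods time to reschedule
  # ...apply Cilium changes on the node, run post-checks...
  # kubectl uncordon "$node"
  echo "$node is back in Ready state"
done
```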
  • After this operation is completed on all the nodes, the script then prompts to remove the Calico installation.
  • Answer yes here to remove Calico and all of its related resources.
  • The script also performs a cleanup on all nodes, including iptables cleanup and other related tasks.
  • After this, the script prompts to finalize the migration. Answer yes, and the migration is completed.
  • Final verification includes checking that all workload pods are Running and that their IPs fall within the new Containers CIDR.
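The CIDR check amounts to comparing each pod IP against the new range. The snippet below inlines two illustrative pod IPs; on a live cluster they would come from `kubectl get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}'`. A simple prefix match suffices for the /16 used in this example.

```shell
# Check that pod IPs fall inside the new Containers CIDR (prefix match
# works for the /16 in this example; adjust for other mask sizes).
NEW_CIDR_PREFIX="10.29."     # from 10.29.0.0/16
POD_IPS="10.29.3.17
10.29.40.2"                  # illustrative; on a live cluster, use kubectl

bad=0
for ip in $POD_IPS; do
  case "$ip" in
    "$NEW_CIDR_PREFIX"*) : ;;                       # inside the new CIDR
    *) bad=$((bad+1)); echo "$ip is outside the new CIDR" ;;
  esac
done
echo "Pods outside the new CIDR: $bad"
```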

Post Execution

  • Ensure that the migration from Calico to Cilium is complete and verified.

The Cluster Details section in the UI will still show the CNI as Calico and the Containers CIDR as the old Containers CIDR.

This is only a case of the new properties not being updated in the backend. It will not cause any functional issues in the cluster, but it is recommended to update them so that the UI reflects accurate information.

To do so, run the appropriate Qbert API call with the correct values and refresh the UI to see the changes.

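A sketch of what that call might look like is below. The endpoint path, header, and body field are assumptions drawn from the Qbert v4 API shape; confirm them against the Platform9 API documentation for your release before running, and fill in the placeholders with your own values.

```shell
# Hypothetical Qbert cluster-update request (verify the endpoint and field
# names against the Platform9 API documentation before use).
DU_FQDN="pmk.example.com"          # illustrative management-plane FQDN
PROJECT_ID="<project-uuid>"        # Keystone project UUID (placeholder)
CLUSTER_ID="<cluster-uuid>"        # cluster UUID from Qbert/UI (placeholder)
TOKEN="<keystone-token>"           # valid Keystone token (placeholder)

URL="https://${DU_FQDN}/qbert/v4/${PROJECT_ID}/clusters/${CLUSTER_ID}"
BODY='{"containersCidr":"10.29.0.0/16"}'   # the new Containers CIDR

# Run once the placeholders are filled in:
# curl -s -X PUT "$URL" \
#   -H "X-Auth-Token: $TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
echo "$URL"
```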

Additional Information

  • Please reach out to the Platform9 Support Team for any questions regarding this.