Calico to Cilium Migration on PMK Workload Clusters

Migrate your workload clusters from Calico to Cilium CNI with Platform9 Managed Kubernetes v5.13.0+. Follow our detailed guide for prerequisites, execution steps, and final verifications to ensure a s

Problem

  • Platform9 Managed Kubernetes (PMK) v5.13.0 and later support Cilium as the CNI for workload clusters.

  • How can existing workload clusters be migrated from Calico CNI to Cilium CNI?

Environment

  • Platform9 Managed Kubernetes - v5.13.0 and higher

  • Kubernetes Version - v1.31.x and higher

  • Component: CNI

Procedure

Prerequisites

  • Download the migration script from HERE.

  • Running the script from a master node is recommended.

  • Passwordless SSH access to all nodes is required, either as root or as a user with sudo privileges.

  • The cluster must be reachable with kubectl.

    • A kubeconfig is already present on master nodes at /etc/pf9/kube.d/kubeconfigs/admin.yaml

  • Ensure the jq binary is installed on the node where the script is executed.

  • Identify a new Containers CIDR. The existing Containers CIDR cannot be reused for Cilium, so a new one must be specified.

  • Keep these Release Notes handy for version information.
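
Some of the prerequisites above can be checked before the script is run. The following is a minimal sketch, not part of the migration script: the function name `missing_prerequisites` is hypothetical, and it only covers the kubeconfig path and the kubectl and jq binaries. It does not test passwordless SSH.

```python
import os
import shutil
from typing import List

# Kubeconfig location mentioned in the prerequisites above.
KUBECONFIG = "/etc/pf9/kube.d/kubeconfigs/admin.yaml"
# Binaries the prerequisites mention; extend the list as needed.
REQUIRED_BINARIES = ["kubectl", "jq"]


def missing_prerequisites(kubeconfig: str, binaries: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if not os.path.isfile(kubeconfig):
        problems.append(f"kubeconfig not found: {kubeconfig}")
    for binary in binaries:
        # shutil.which returns None when the binary is not on PATH.
        if shutil.which(binary) is None:
            problems.append(f"binary not on PATH: {binary}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(KUBECONFIG, REQUIRED_BINARIES):
        print(problem)
```

An empty result only means these two checks passed; SSH access and cluster reachability still need to be confirmed separately.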

Define Variables in Migration Script

  • Set the environment variables below in the script. Do not change any other variables.

  • Refer to the working sample below for a Calico to Cilium migration of workload clusters on Kubernetes v1.31 and higher with PMK v5.13.0 and higher.

Set CILIUM_CLI_VERSION_OVERRIDE, CILIUM_TARGET_CLUSTER_VERSION, and CALICO_NODE_IMAGE according to the Release Notes. Ensure the version numbers are correct.

CILIUM_IPV4_CLUSTER_POOL_CIDR must be set to a new Containers CIDR, not the existing one. The existing Containers CIDR can be found under Cluster Details in the UI. This example uses 10.29.0.0/16, but any suitable CIDR can be chosen.
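
One way to confirm that a candidate CIDR does not collide with the existing Containers CIDR is to compare the two ranges with Python's standard `ipaddress` module. This is a minimal sketch: the existing CIDR `10.20.0.0/16` is a made-up placeholder, so substitute the value shown in Cluster Details.

```python
import ipaddress


def cidrs_overlap(existing_cidr: str, new_cidr: str) -> bool:
    """Return True if the two networks share any address."""
    existing = ipaddress.ip_network(existing_cidr)
    new = ipaddress.ip_network(new_cidr)
    return existing.overlaps(new)


# Placeholder (assumption): replace with the Containers CIDR from Cluster Details.
EXISTING_CONTAINERS_CIDR = "10.20.0.0/16"
# Example value used in this article for CILIUM_IPV4_CLUSTER_POOL_CIDR.
NEW_CILIUM_CIDR = "10.29.0.0/16"

if cidrs_overlap(EXISTING_CONTAINERS_CIDR, NEW_CILIUM_CIDR):
    print("Overlap: choose a different CILIUM_IPV4_CLUSTER_POOL_CIDR")
else:
    print("No overlap with the existing Containers CIDR")
```

`ip_network` raises `ValueError` if a CIDR has host bits set (for example `10.29.0.1/16`), which also catches simple typos.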

CILIUM_INTERFACE should be set to the same interface name that Calico is using.

SSH_USER should be set to the username used to connect to the other nodes via SSH.

SLEEP_AFTER_DRAIN is a wait time in seconds (default 30). After each node is drained, the script pauses for this duration so that all pods have time to reschedule. If each node runs a large number of pods, set a higher value.

Execution

  • Make the script executable and run it.

  • The script asks you to confirm that the variables are defined correctly and whether it should proceed with them:

  • The script then checks all the prerequisites before proceeding.

  • Next, some cluster resources are backed up and a file named cilium.yaml is generated.

  • Answer no and the script stops. Locate the generated cilium.yaml in the directory from which the script was executed.

  • In cilium.yaml, find the parameter KUBERNETES_SERVICE_HOST. Its value should be the VIP for multi-master clusters and the master node IP for single-master clusters. The value is expected to be auto-populated, but it must be validated. The parameter appears five times in the file and must be set correctly at every occurrence.

  • Then execute the script again. The script is idempotent, so steps that are already complete are not repeated. On this second run, the script automatically detects cilium.yaml and proceeds with applying the changes.

  • Cilium resources and pods are created in the kube-system namespace.

  • Nodes are drained and cordoned one at a time. After each drain, the script waits for the number of seconds defined in SLEEP_AFTER_DRAIN so that pods can be rescheduled.

  • Cilium changes are then applied to the node, the script runs some post-checks, and finally it uncordons the node and returns it to the Ready state.

  • After this operation is completed on all nodes, the script prompts to remove the Calico installation.

  • Answer yes to remove Calico and all its related resources.

  • The script also cleans up all the nodes, which includes iptables cleanup and other related tasks.

  • Next, the script prompts to finalize the migration. Answer yes and the migration is completed.

  • Final verifications include checking that all workload pods are Running and that their IPs fall within the new Containers CIDR.
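
The KUBERNETES_SERVICE_HOST validation described in the steps above can be partly automated. The sketch below assumes that cilium.yaml lists the setting as a container environment entry, a `name: KUBERNETES_SERVICE_HOST` line followed by a `value:` line. That layout is an assumption, so confirm it against the generated file. The function names and the `192.0.2.x` addresses are illustrative only.

```python
import re
from typing import List

# Assumed layout: "name: KUBERNETES_SERVICE_HOST" followed by a "value:" line.
SERVICE_HOST_PATTERN = re.compile(
    r'name:\s*KUBERNETES_SERVICE_HOST\s*\n\s*value:\s*"?([^"\s]+)"?'
)


def service_host_values(manifest_text: str) -> List[str]:
    """Return every KUBERNETES_SERVICE_HOST value found in the manifest text."""
    return SERVICE_HOST_PATTERN.findall(manifest_text)


def check_service_host(manifest_text: str, expected_host: str,
                       expected_count: int = 5) -> List[str]:
    """Return a list of problems; an empty list means every occurrence matches."""
    values = service_host_values(manifest_text)
    problems = []
    if len(values) != expected_count:
        problems.append(f"expected {expected_count} occurrences, found {len(values)}")
    for index, value in enumerate(values, start=1):
        if value != expected_host:
            problems.append(f"occurrence {index} is {value}, expected {expected_host}")
    return problems


# Illustrative fragment with two occurrences (documentation-range addresses).
sample = '''
        - name: KUBERNETES_SERVICE_HOST
          value: "192.0.2.10"
        - name: KUBERNETES_SERVICE_PORT
          value: "443"
        - name: KUBERNETES_SERVICE_HOST
          value: "192.0.2.99"
'''
for problem in check_service_host(sample, "192.0.2.10", expected_count=2):
    print(problem)
```

To check the real file, read cilium.yaml into a string and pass the VIP (multi-master) or the master node IP (single-master) as `expected_host`. The default `expected_count=5` reflects the five occurrences mentioned above.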

Post Execution

  • Ensure that the migration from Calico to Cilium is complete and verified.
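
The pod IP verification can be sketched as follows. The function takes the parsed output of `kubectl get pods -A -o json` and reports pods whose IP is outside the new Containers CIDR. Pods with `hostNetwork: true` are skipped because they use the node's IP rather than a pod IP. The function name and the sample data are illustrative, not part of the migration script.

```python
import ipaddress
from typing import List, Tuple


def pods_outside_cidr(pod_list: dict, cidr: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, podIP) for pods whose IP is outside the CIDR."""
    network = ipaddress.ip_network(cidr)
    outside = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        meta = pod.get("metadata", {})
        if spec.get("hostNetwork"):
            continue  # host-network pods share the node IP
        pod_ip = status.get("podIP")
        if not pod_ip:
            continue  # pod has no IP yet (for example, still Pending)
        if ipaddress.ip_address(pod_ip) not in network:
            outside.append((meta.get("namespace", ""), meta.get("name", ""), pod_ip))
    return outside


# Illustrative data shaped like `kubectl get pods -A -o json` output.
sample_pods = {
    "items": [
        {"metadata": {"namespace": "default", "name": "new-pod"},
         "spec": {}, "status": {"podIP": "10.29.1.7"}},
        {"metadata": {"namespace": "default", "name": "old-pod"},
         "spec": {}, "status": {"podIP": "10.20.3.4"}},
        {"metadata": {"namespace": "kube-system", "name": "host-pod"},
         "spec": {"hostNetwork": True}, "status": {"podIP": "192.0.2.5"}},
    ]
}
print(pods_outside_cidr(sample_pods, "10.29.0.0/16"))
```

Against a live cluster, save the output of `kubectl get pods -A -o json` to a file, load it with `json.load`, and pass the result together with the value chosen for CILIUM_IPV4_CLUSTER_POOL_CIDR. Any pod reported is probably still using an address from the old range and may need to be restarted.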

Info

The Cluster Details section in the UI will still show the CNI as Calico and the Containers CIDR as the old Containers CIDR.

This happens because the new properties are not updated in the backend. It does not cause any functional issues in the cluster, but updating the values is recommended so that the UI stays accurate.

To do so, run the below Qbert call with the correct values and refresh the UI to see the changes.

Additional Information

  • Please reach out to the Platform9 Support Team for any questions regarding this.
