Calico to Cilium Migration on PMK Workload Clusters
Problem
- Platform9 Managed Kubernetes v5.13.0 and newer releases support Cilium as the CNI for workload clusters.
- How can existing workload clusters be migrated from Calico CNI to Cilium CNI?
Environment
- Platform9 Managed Kubernetes - v5.13.0 and Higher
- Kubernetes Version - v1.31.x and Higher
- Component: CNI
Procedure
The migration script performs the Calico to Cilium migration sequentially (one node at a time). However, it is strongly recommended to perform this migration in a complete downtime window, as all pods are assigned new IPs from the new Containers CIDR and inter-pod communication is expected to fail during this period.
Prerequisites
- Download the migration script from HERE.
- It is recommended to execute this script from any master node.
- Passwordless SSH access to all the nodes with the root user, or any user with sudo-level permissions.
- The cluster should be accessible via kubectl commands. For this, a kubeconfig is already present on master nodes at /etc/pf9/kube.d/kubeconfigs/admin.yaml.
- Ensure the jq binary is present on the node from where the script is executed (a few quick checks for these prerequisites are shown after this list).
- Identification of a new Containers CIDR - the existing Containers CIDR cannot be reused for Cilium, hence a new Containers CIDR needs to be specified.
- Keep these Release Notes handy for version information.
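A few quick checks, run from the master node where the script will be executed, can help confirm these prerequisites (the node IP below is a placeholder):
# Passwordless SSH to each node (repeat per node)
ssh -o BatchMode=yes root@<NODE_IP> 'hostname'
# jq is available on this node
jq --version
# The cluster is reachable with the admin kubeconfig
kubectl --kubeconfig /etc/pf9/kube.d/kubeconfigs/admin.yaml get nodes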
Define Variables in Migration Script
- Set the below environment variables in the script. No other variables should be modified.
- Refer to the working sample below for a Calico to Cilium migration on workload clusters running v1.31 and higher on PMK v5.13.0 and higher.
CILIUM_CLI_VERSION_OVERRIDE="v0.18.3"
CILIUM_TARGET_CLUSTER_VERSION="v1.17.2"
CILIUM_IPV4_CLUSTER_POOL_CIDR="10.29.0.0/16"
CALICO_NODE_IMAGE="calico/node:v3.27.5"
CILIUM_INTERFACE="ens3"
SSH_USER="root"
SLEEP_AFTER_DRAIN=30
- CILIUM_CLI_VERSION_OVERRIDE, CILIUM_TARGET_CLUSTER_VERSION, and CALICO_NODE_IMAGE should be set by referring to the Release Notes. Ensure the version numbers are correct.
- CILIUM_IPV4_CLUSTER_POOL_CIDR should be set to a new Containers CIDR. It must not be an existing CIDR. The existing Containers CIDR can be found in the Cluster Details section of the UI. In this example, 10.29.0.0/16 is chosen, but this is customizable.
- CILIUM_INTERFACE should be set to the same interface name that Calico is using (one way to confirm this is shown after this list).
- SSH_USER should be set to the username that will be used to connect to the other nodes via SSH.
- SLEEP_AFTER_DRAIN is a duration in seconds (default 30). If there is a significantly high number of pods on each node, set this to a higher value. After each node is drained, the script waits for this duration so that all the pods have time to reschedule.
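One way to confirm the interface name on a node is to look at the default-route interface, which is commonly the interface Calico uses (verify this against your Calico IP autodetection settings):
ip route show default
# Example output: default via 10.0.0.1 dev ens3 ...  -> use "ens3" for CILIUM_INTERFACE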
Execution
- Set executable permission on the script and execute it, for example as shown below.
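The script file name below is a placeholder; use the actual name of the downloaded script:
chmod +x <MIGRATION_SCRIPT>.sh
./<MIGRATION_SCRIPT>.sh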
- The script will ask you to confirm that the variables are defined correctly and whether it should proceed with them:
==============================================================
VERIFY THE FOLLOWING CONFIGURATION BEFORE PROCEEDING
==============================================================
CILIUM_CLI_VERSION_OVERRIDE = v0.18.3
CILIUM_TARGET_CLUSTER_VERSION = v1.17.2
CILIUM_IPV4_CLUSTER_POOL_CIDR = 10.29.0.0/16
CALICO_NODE_IMAGE = calico/node:v3.27.5
CILIUM_INTERFACE = ens3
SSH_USER = root
SLEEP_AFTER_DRAIN (seconds) = 30
--- IMPORTANT: If any of the above are incorrect, edit the script before continuing.
Do you want to proceed with the above configuration? (yes/no): yes
- The script will then check all the prerequisites and proceed further:
Prerequisites checked. Continue with migration script? (yes/no):
- After that, some cluster resources will be backed up and a file named cilium.yaml is generated.
I have carefully inspected and, if necessary, edited 'cilium.yaml' and saved it. Continue? (yes/no): no
- Set as no and the script will stop. Identify the file created, named cilium.yaml, in the same directory from where the script was executed.
- In the same cilium.yaml file, identify the parameter KUBERNETES_SERVICE_HOST. Its value should be set to the VIP for multi-master clusters and to the master node IP for single-master clusters. This value is expected to be auto-populated, but it must be validated. This parameter is present five times in the file and should be set correctly for every occurrence (a quick grep check is shown below).
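One quick way to review every occurrence of KUBERNETES_SERVICE_HOST in the generated file:
grep -n -A1 'KUBERNETES_SERVICE_HOST' cilium.yaml
# Each occurrence (the value may appear on the same or the following line) should show the VIP or master node IP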
- After this, execute the script again. The script is idempotent, so no new changes will be made in this second run. In this second run, the script will automatically detect the cilium.yaml and proceed with applying the changes.
- Cilium resources and pods will be created in the cluster in the kube-system namespace.
- Each node will be drained and cordoned sequentially. After draining, the script waits for the number of seconds defined in the SLEEP_AFTER_DRAIN variable so that the pods have time to reschedule (progress can be watched as shown below).
- Then, Cilium changes are applied to the node, the script performs some post checks, and it ultimately uncordons the node and sets it back to the Ready state.
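While the script cycles through the nodes, it can help to watch progress from a second terminal, for example:
# Watch nodes being cordoned, drained, and returning to Ready
kubectl get nodes -w
# Watch the Cilium agent pods come up (k8s-app=cilium is the standard Cilium DaemonSet label)
kubectl -n kube-system get pods -l k8s-app=cilium -o wide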
- After this operation is completed on all the nodes, the script then prompts to remove the Calico installation.
Are you ABSOLUTELY SURE you want to uninstall Calico now? Type 'yes' to proceed. (yes/no): yes
- Set as yes and this removes Calico and all its related resources.
- It also performs a cleanup on all the nodes, which includes iptables cleanup and other related tasks (a quick spot check is shown below).
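As a spot check after the cleanup, the following can be run on a node to confirm no Calico iptables chains remain (expect no output):
sudo iptables-save | grep -i cali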
- After this, the script prompts to finalize the migration. Set as yes and the migration is completed.
All node-level cleanup, including iptables, has been attempted. Verify cluster health and finalize migration. (yes/no): yes
======================================================================
--- MIGRATION COMPLETE: Calico has been uninstalled, and Cilium is now the primary CNI.
======================================================================
--- Final verification is still recommended.
- Final verification includes checking that all the workload pods are Running and that their IPs fall within the new Containers CIDR range (example commands are shown below).
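For example, assuming the 10.29.0.0/16 CIDR chosen above, verification might look like:
# All workload pods should be Running, with pod IPs inside the new Containers CIDR (10.29.0.0/16 in this example)
kubectl get pods -A -o wide
# No Calico pods should remain
kubectl get pods -A | grep -i calico
# Cilium agent health (the cilium command is available inside the agent pods)
kubectl -n kube-system exec ds/cilium -- cilium status --brief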
Post Execution
- Ensure that the migration from Calico to Cilium is complete and verified.
The Cluster Details section in the UI will still show the CNI as Calico and the Containers CIDR as the older Containers CIDR.
This is only a case of the new property not being updated in the backend. It will not cause any functionality issues in the cluster, but it is recommended to update it in the UI as well to maintain information integrity.
To do so, run the below Qbert call with the correct values and refresh the UI to see the changes.
curl --request PUT \
--url 'https://<DU_FQDN>/qbert/v4/<TENANT_ID>/clusters/<CLUSTER_ID>/update_cni_on_migration' \
--header 'X-Auth-Token: {X-Auth-Token}' \
--data '{
"networkPlugin": "cilium",
"containersCidr": "<CONTAINERS_CIDR>"
}'
Additional Information
- Please reach out to the Platform9 Support Team for any questions regarding this.