Enable OVS with DPDK
OVS with DPDK
OVS Worker Node Prerequisites
- Bare-metal nodes as worker nodes.
- Hugepages enabled on the worker nodes.
Hugepage Support
Enable hugepages by performing the following steps:
- Update the /etc/default/grub file on the worker nodes with the following properties:

```
GRUB_CMDLINE_LINUX='console=tty0 console=ttyS1,115200n8 intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=10'
GRUB_SERIAL_COMMAND='serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1'
```
- Update GRUB:

```bash
sudo update-grub
```

- Update vm.nr_hugepages in /etc/sysctl.conf and refresh the kernel parameters:

```bash
# use tee so the append runs with root privileges
# (a plain "sudo echo ... >>" fails, because the redirection is performed by the unprivileged shell)
echo "vm.nr_hugepages = 10" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

- Reboot the worker nodes:

```bash
sudo reboot
```

- Confirm that hugepages are configured:

```bash
grep Huge /proc/meminfo
```

```
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      10
HugePages_Free:       10
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        10485760 kB
```

```bash
cat /proc/cmdline
```

```
BOOT_IMAGE=/boot/vmlinuz-5.4.0-148-generic root=/dev/mapper/vgroot-lvroot ro console=tty0 console=ttyS1,115200n8 intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=10 nvme_core.multipath=0
```
- Mount the hugepages, if not already mounted by default:

```bash
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
```

Setting up OVS-DPDK with Pf9 DHCP Server on a PMK cluster
Create a PMK cluster with the worker nodes configured in the previous section.
The PMK cluster should have the following add-ons enabled:
- KubeVirt Add-on
- Advanced Networking Operator (Luigi) Add-on
1. Create Network Plugins Custom Resource
The NetworkPlugins custom resource is used to install advanced networking plugins such as OVS, SR-IOV, and DPDK, along with their configuration.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: NetworkPlugins
metadata:
  name: networkplugins-ovs-dpdk
  namespace: luigi-system
spec:
  plugins:
    hostPlumber: {}      # Enabled
    multus: {}           # Enabled
    ovs:                 # Enabled with configuration
      dpdk:              # Enabled with configuration
        lcoreMask: "0x1"
        socketMem: "1024"
        pmdCpuMask: "0xF0"
        hugepageMemory: "2Gi"
    dhcpController: {}   # Enabled
EOF
```

DPDK configuration parameters:
- lcoreMask : Specifies the CPU cores on which DPDK lcore threads are spawned; expects a hex bitmask string.
- socketMem : Comma-separated list of memory amounts (in MB) to pre-allocate from hugepages on specific sockets.
- pmdCpuMask : A core bitmask that sets which cores are used by OVS-DPDK for datapath packet processing (the PMD threads). See the sketch after this list for expanding a mask into its core list.
- hugepageMemory : The amount of memory for hugepages (number of hugepages × hugepage size). In the YAML above, 2Gi of hugepage-backed RAM is used for the ovs-dpdk pod; with a hugepage size of 1Gi, this allocates two hugepages.
- Note: hugepageMemory must be greater than or equal to the total socketMem. For example, if socketMem is "1024,1024", then hugepageMemory must be >= 2Gi.
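To sanity-check a mask before applying it, you can expand the hex value into the CPU cores it selects. A minimal bash sketch (illustrative only, not part of the plugin):

```bash
#!/usr/bin/env bash
# Expand a hex CPU mask into the core numbers it selects.
mask=0xF0   # e.g. the pmdCpuMask used above
for core in $(seq 0 63); do
  if (( (mask >> core) & 1 )); then
    echo "core $core"
  fi
done
```

For 0xF0 this prints cores 4 through 7, so the example above pins the PMD threads to cores 4-7, while lcoreMask 0x1 keeps the lcore threads on core 0.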
DHCP controller plugin
The DHCP controller plugin runs the PF9 DHCP server inside a pod to serve DHCP requests coming from virtual machine instances (not from pods, in the case of KubeVirt). Multus network-attachment-definitions then use this DHCP server to assign IPs. The PF9 DHCP server is an alternative to the IPAM CNIs (whereabouts, host-local), which are invoked as delegates by the backend CNI at pod creation and deletion.
For more information, refer to https://platform9.com/docs/kubernetes/enable-p9-dhcp
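Before moving on, you can check that the NetworkPlugins resource was accepted and that the operator (which runs in the luigi-system namespace used in this guide) processed it; exact pod names vary by release:

```bash
# confirm the NetworkPlugins resource was created
kubectl get networkplugins -n luigi-system
# check the Luigi operator and plugin pods
kubectl get pods -n luigi-system
```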
2. Create Host Network Template
The HostNetworkTemplate custom resource defines host-level configuration, such as the OVS configuration, on the PMK cluster.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-dpdk
  namespace: luigi-system
spec:
  ovsConfig:
    - bridgeName: "dpdkbr01"
      nodeInterface: "bond0.2"
      dpdk: true
EOF
```

ovsConfig parameters:
- bridgeName : User-defined name of the OVS bridge.
- nodeInterface : Physical network interface used to create the OVS bridge.
- dpdk : Boolean to enable DPDK on the hosts.
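After the template is applied, you can verify on a worker node that the bridge exists and, with dpdk: true, uses the userspace (netdev) datapath. A quick check, assuming shell access to the node and the ovs-vsctl CLI:

```bash
# show the OVS topology; the bridge from the template should appear
ovs-vsctl show
# a DPDK-enabled bridge should report the netdev datapath
ovs-vsctl get bridge dpdkbr01 datapath_type
```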
3. Create Network Attachment Definition
Network Attachment Definition is a Multus CRD used to configure additional NICs on pods and virtual machines.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: nad-ovs-dpdk-dhcp
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/dpdkbr01
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "userspace",
    "name": "nad-ovs-dpdk-dhcp",
    "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
    "logFile": "/var/log/userspace-ovs-net-1-cni.log",
    "logLevel": "verbose",
    "host": {
      "engine": "ovs-dpdk",
      "iftype": "vhostuser",
      "netType": "bridge",
      "vhost": { "mode": "client" },
      "bridge": { "bridgeName": "dpdkbr01" }
    },
    "container": {
      "engine": "ovs-dpdk",
      "iftype": "vhostuser",
      "netType": "interface",
      "vhost": { "mode": "server" }
    }
  }'
EOF
```

The bridgeName must match the OVS bridge configured in the HostNetworkTemplate. (A comment is not used inside the config string, as it would break JSON parsing.)

4. Create Pf9 DHCP Server
```bash
cat <<EOF | kubectl apply -f -
apiVersion: dhcp.plumber.k8s.pf9.io/v1alpha1
kind: DHCPServer
metadata:
  name: dhcpserver-pf9-ovs-dpdk
spec:
  networks:
    - networkName: nad-ovs-dpdk-dhcp
      interfaceIp: 192.168.15.14/24
      leaseDuration: 10m
      cidr:
        range: 192.168.15.0/24
        range_start: 192.168.15.30
        range_end: 192.168.15.100
        gateway: 192.168.15.1
EOF
```

About the fields:
- Name: Name of the DHCPServer. The dnsmasq configuration is generated in a ConfigMap with the same name.
- networks: List of all networks that this server pod will serve:
  - networkName: Name of the NetworkAttachmentDefinition to provide IPs for. The NAD should not have the DHCP plugin enabled.
  - interfaceIp: IP address allocated to the server pod. Must include a prefix length to ensure proper routes are added.
  - leaseDuration: How long the offered leases remain valid. Provide in formats valid for dnsmasq (for example, 10m, 5h). Defaults to 1h.
  - vlanId: Dnsmasq network identifier, used as an identifier while restoring IPs. Optional.
  - cidr: range is compulsory; range_start, range_end, and gateway are optional. If range_start and range_end are provided, they are used in place of the default start and end.
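To confirm the objects from steps 3 and 4 exist, and to inspect the generated dnsmasq configuration, you can query them directly (assuming the CRD's plural name dhcpservers, and that the resources live in the namespace where you created them):

```bash
# verify the network attachment definition and the DHCP server
kubectl get network-attachment-definitions nad-ovs-dpdk-dhcp
kubectl get dhcpservers dhcpserver-pf9-ovs-dpdk
# the dnsmasq configuration is generated in a ConfigMap with the same name
kubectl get configmap dhcpserver-pf9-ovs-dpdk -o yaml
```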
At this point the PMK cluster is ready to be used for workloads such as Pods and Virtual Machines.
Create a sample Virtual Machine to use the nad-ovs-dpdk-dhcp network
Let's validate your work by creating a Virtual Machine that consumes the nad-ovs-dpdk-dhcp network.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-test-ovs-dpdk
  namespace: default
spec:
  running: true
  template:
    metadata:
      labels:
        debugLogs: "true"
        kubevirt.io/size: small
      annotations:
        kubevirt.io/memfd: "false"
    spec:
      terminationGracePeriodSeconds: 30
      domain:
        resources:
          requests:
            memory: 2Gi
            cpu: 1
        memory:
          hugepages:
            pageSize: "1Gi"
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          - name: vhost-user-net-1
            vhostuser: {}
      networks:
      - name: default
        pod: {}
      - name: vhost-user-net-1
        multus:
          networkName: nad-ovs-dpdk-dhcp
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/kubevirt/fedora-cloud-container-disk-demo
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
EOF
```

Validate the VM
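Before inspecting the full object, a quick status check confirms the VM has come up (the exact columns vary slightly across KubeVirt versions):

```bash
# expect the VM to report a Running status and READY True
kubectl get vm vm-test-ovs-dpdk
```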
```bash
kubectl get vm vm-test-ovs-dpdk -o yaml
```

The kubectl.kubernetes.io/last-applied-configuration annotation, which repeats the applied manifest, is omitted from the output below for brevity.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-05-10T12:50:18.130438694Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2023-05-10T12:50:18Z"
  generation: 1
  name: vm-test-ovs-dpdk
  namespace: default
  resourceVersion: "116974"
  uid: 453dd252-2793-4de7-8dc5-9779c7e1828c
spec:
  running: true
  template:
    metadata:
      annotations:
        kubevirt.io/memfd: "false"
      creationTimestamp: null
      labels:
        debugLogs: "true"
        kubevirt.io/size: small
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - macAddress: "02:55:43:00:00:48"
            masquerade: {}
            name: default
          - macAddress: "02:55:43:00:00:49"
            name: vhost-user-net-1
            vhostuser: {}
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
      networks:
      - name: default
        pod: {}
      - multus:
          networkName: nad-ovs-dpdk-dhcp
        name: vhost-user-net-1
      terminationGracePeriodSeconds: 30
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/fedora-cloud-container-disk-demo
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
        name: cloudinitdisk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-10T12:50:40Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: LiveMigratable
  created: true
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: false
    name: containerdisk
    reason: Snapshot is not supported for this volumeSource type [containerdisk]
  - enabled: false
    name: cloudinitdisk
    reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]
```

Validate the VMI
Note that the VMI is annotated with the IP assigned by the PF9 DHCP server:

```
dhcp.plumber.k8s.pf9.io/dhcpserver: '{"02:55:43:00:00:49":"192.168.15.72"}'
```
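If you only need the assigned address, you can read the annotation directly with jsonpath (the dots inside the annotation key are escaped with backslashes):

```bash
# print the MAC-to-IP mapping recorded by the PF9 DHCP server
kubectl get vmi vm-test-ovs-dpdk \
  -o jsonpath='{.metadata.annotations.dhcp\.plumber\.k8s\.pf9\.io/dhcpserver}'
```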
```bash
kubectl get vmi vm-test-ovs-dpdk -o yaml
```

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    dhcp.plumber.k8s.pf9.io/dhcpserver: '{"02:55:43:00:00:49":"192.168.15.72"}'
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/memfd: "false"
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2023-05-10T12:50:18Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  - foregroundDeleteVirtualMachine
  generation: 9
  labels:
    debugLogs: "true"
    kubevirt.io/nodeName: 131.153.165.65
    kubevirt.io/size: small
  name: vm-test-ovs-dpdk
  namespace: default
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: vm-test-ovs-dpdk
    uid: 453dd252-2793-4de7-8dc5-9779c7e1828c
  resourceVersion: "117040"
  uid: 2e399241-953a-41e4-a2ce-78a5f1598fdd
spec:
  domain:
    cpu:
      cores: 1
      model: host-model
      sockets: 1
      threads: 1
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      interfaces:
      - macAddress: "02:55:43:00:00:48"
        masquerade: {}
        name: default
      - macAddress: "02:55:43:00:00:49"
        name: vhost-user-net-1
        vhostuser: {}
    features:
      acpi:
        enabled: true
    firmware:
      uuid: d7e0dfc2-e769-54e6-8b02-1abcd893cedd
    machine:
      type: q35
    memory:
      hugepages:
        pageSize: 1Gi
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
  networks:
  - name: default
    pod: {}
  - multus:
      networkName: nad-ovs-dpdk-dhcp
    name: vhost-user-net-1
  terminationGracePeriodSeconds: 30
  volumes:
  - containerDisk:
      image: quay.io/kubevirt/fedora-cloud-container-disk-demo
      imagePullPolicy: Always
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
status:
  activePods:
    46e3d75b-c9b9-4af3-9d59-e9e8a9bfa8a1: 131.153.165.65
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-10T12:50:40Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: LiveMigratable
  guestOSInfo: {}
  interfaces:
  - infoSource: domain
    ipAddress: 10.20.117.160
    ipAddresses:
    - 10.20.117.160
    mac: "02:55:43:00:00:48"
    name: default
    queueCount: 1
  - infoSource: domain
    mac: "02:55:43:00:00:49"
    name: vhost-user-net-1
    queueCount: 1
  launcherContainerImageVersion: platform9/virt-launcher:v0.58.1
  migrationMethod: BlockMigration
  migrationTransport: Unix
  nodeName: 131.153.165.65
  phase: Running
  phaseTransitionTimestamps:
  - phase: Pending
    phaseTransitionTimestamp: "2023-05-10T12:50:18Z"
  - phase: Scheduling
    phaseTransitionTimestamp: "2023-05-10T12:50:18Z"
  - phase: Scheduled
    phaseTransitionTimestamp: "2023-05-10T12:50:40Z"
  - phase: Running
    phaseTransitionTimestamp: "2023-05-10T12:50:41Z"
  qosClass: Burstable
  runtimeUser: 0
  virtualMachineRevisionName: revision-start-vm-453dd252-2793-4de7-8dc5-9779c7e1828c-1
  volumeStatus:
  - name: cloudinitdisk
    size: 1048576
    target: vdb
  - name: containerdisk
    target: vda
```

Variations of OVS networks
OVS networks can be configured using the HostNetworkTemplate custom resource.
OVS network without DPDK
```bash
cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs
  namespace: luigi-system
spec:
  ovsConfig:
    - bridgeName: "dpdkbr01"
      nodeInterface: "bond0.2"
      dpdk: false
EOF
```

OVS Bonded network without DPDK
```bash
cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-bonded
  namespace: luigi-system
spec:
  ovsConfig:
    - bridgeName: "dpdkbr01"
      nodeInterface: "bond0.2,bond0.5"
      dpdk: false
      # optional parameters
      params:
        mtuRequest: 9192
        lacp: "active"   # create OVS bond with LACP enabled
EOF
```

OVS Bonded network with DPDK
```bash
cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-dpdk-bonded
  namespace: luigi-system
spec:
  ovsConfig:
    - bridgeName: "dpdkbr01"
      nodeInterface: "bond0.2,bond0.5"
      dpdk: true
      # optional parameters
      params:
        mtuRequest: 9192
        bondMode: "balance-tcp"
        lacp: "active"
EOF
```
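For the bonded variants, you can inspect the resulting bond and its LACP state directly on a worker node with OVS's control utility, assuming shell access to the host:

```bash
# on the worker node: show bond members and their status
ovs-appctl bond/show
# show LACP negotiation details when lacp is "active"
ovs-appctl lacp/show
```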