Enable OVS with DPDK
OVS with DPDK
OVS Worker Node Prerequisites
- Bare-metal nodes as worker nodes.
- Hugepages to be enabled on worker nodes.
Hugepage Support
Enable hugepages by performing the following steps:
- Update the /etc/default/grub file on the worker nodes with the following properties:
GRUB_CMDLINE_LINUX='console=tty0 console=ttyS1,115200n8 intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=10'
GRUB_SERIAL_COMMAND='serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1'
- Update Grub
$sudo update-grub
- Update vm.nr_hugepages in /etc/sysctl.conf and refresh the kernel parameters. (Append with sudo tee so the redirection also runs with root privileges.)
$echo "vm.nr_hugepages = 10" | sudo tee -a /etc/sysctl.conf
$sudo sysctl -p
- Reboot the worker nodes
$sudo reboot
- Confirm that hugepages are configured.
$grep Huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 10
HugePages_Free: 10
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 10485760 kB
$cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-148-generic root=/dev/mapper/vgroot-lvroot ro console=tty0 console=ttyS1,115200n8 intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=10 nvme_core.multipath=0
- Mount the hugepages, if not already mounted by default.
$mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
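To make the hugepage mount persistent across reboots, you can optionally add a hugetlbfs entry to /etc/fstab on each worker node. This is a minimal example; adjust the mount point and page size to match your environment:
none /dev/hugepages hugetlbfs pagesize=1G 0 0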
Setting up OVS-DPDK with Pf9 DHCP Server on a PMK cluster
Create a PMK cluster with the worker nodes configured in the previous section.
The PMK cluster should have the following add-ons enabled:
- KubeVirt Add-on
- Advanced Networking Operator (Luigi) Add-on
1. Create Network Plugins Custom Resource
The NetworkPlugins custom resource is used to install advanced networking plugins such as OVS, SR-IOV, and DPDK, and to define their configuration.
$cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: NetworkPlugins
metadata:
  name: networkplugins-ovs-dpdk
  namespace: luigi-system
spec:
  plugins:
    hostPlumber: {}          # Enabled
    multus: {}               # Enabled
    ovs:                     # Enabled with configuration
      dpdk:                  # Enabled with configuration
        lcoreMask: "0x1"
        socketMem: "1024"
        pmdCpuMask: "0xF0"
        hugepageMemory: "2Gi"
    dhcpController: {}       # Enabled
EOF
DPDK configuration parameters:
- lcoreMask : Hex bitmask of the CPU cores on which DPDK lcore threads are spawned. For example, "0x1" selects core 0.
- socketMem : Comma-separated list of memory (in MB) to pre-allocate from hugepages on specific NUMA sockets.
- pmdCpuMask : Core bitmask that sets which cores OVS-DPDK uses for datapath packet processing (PMD threads). For example, "0xF0" selects cores 4-7.
- hugepageMemory : The amount of hugepage-backed memory (number of hugepages × hugepage size) for the ovs-dpdk pod. In the YAML above, 2Gi of hugepage-backed RAM is used; with a hugepage size of 1Gi, this allocates 2 hugepages.
- Note: hugepageMemory must be greater than or equal to the total socketMem. For example, if socketMem is "1024,1024", then hugepageMemory must be >= 2Gi.
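After applying the resource, you can optionally confirm that the operator picked it up and rolled out the plugin components. The commands below are only a sketch: the registered plural used with kubectl get and the generated pod names can vary by release, so treat the resource name and grep pattern as illustrative.
$kubectl get networkplugins -n luigi-system
$kubectl get pods -n luigi-system
$kubectl get pods -A | grep -iE 'ovs|multus|hostplumber'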
DHCP controller plugin
The DHCP controller plugin enables running the PF9 DHCP server inside a pod/virtual machine to serve DHCP requests from virtual machine instances (not pods, in the case of KubeVirt). Multus network-attachment-definitions use this DHCP server to assign IPs. The PF9 DHCP server is an alternative to the IPAM CNIs (whereabouts, host-local), which are used as delegates from the backend CNI and are managed/triggered at pod creation and deletion.
For more information, refer to: https://platform9.com/docs/kubernetes/enable-p9-dhcp
2. Create Host Network Template
The HostNetworkTemplate custom resource is used to define host-level network configuration, such as the OVS configuration, on the PMK cluster.
$cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-dpdk
  namespace: luigi-system
spec:
  ovsConfig:
  - bridgeName: "dpdkbr01"
    nodeInterface: "bond0.2"
    dpdk: true
EOF
ovsConfig parameters:
- bridgeName : User-defined name of the OVS bridge.
- nodeInterface : Physical network interface used to create the OVS bridge.
- dpdk : Boolean to enable DPDK on the hosts.
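Once the template has been applied, you can optionally verify on a worker node that the bridge exists and that DPDK was initialized in OVS. These are standard Open vSwitch commands; run them on the node (or inside the OVS pod, if OVS runs containerized in your deployment):
$sudo ovs-vsctl list-br
$sudo ovs-vsctl show
$sudo ovs-vsctl get Open_vSwitch . other_config:dpdk-init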
3. Create Network Attachment Definition
Network Attachment Definition is a Multus CRD used to configure additional NICs on pods and virtual machines.
$cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: nad-ovs-dpdk-dhcp
annotations:
k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/dpdkbr01
spec:
config: '{
"cniVersion": "0.3.1",
"type": "userspace",
"name": "nad-ovs-dpdk-dhcp",
"kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
"logFile": "/var/log/userspace-ovs-net-1-cni.log",
"logLevel": "verbose",
"host": {
"engine": "ovs-dpdk",
"iftype": "vhostuser",
"netType": "bridge",
"vhost": {
"mode": "client"
},
"bridge": {
"bridgeName": "dpdkbr01" #Name of OVS bridge configured in HostNetworkTemplate
}
},
"container": {
"engine": "ovs-dpdk",
"iftype": "vhostuser",
"netType": "interface",
"vhost": {
"mode": "server"
}
}
}'
EOF
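Note that the bridgeName in the config string must match the OVS bridge created by the HostNetworkTemplate (dpdkbr01 in this example); JSON does not allow inline comments, so keep such notes outside the config string. To confirm the definition was created, you can use the Multus CRD's standard resource names:
$kubectl get network-attachment-definitions
$kubectl get net-attach-def nad-ovs-dpdk-dhcp -o yaml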
4. Create Pf9 DHCP server
$cat <<EOF | kubectl apply -f -
apiVersion: dhcp.plumber.k8s.pf9.io/v1alpha1
kind: DHCPServer
metadata:
name: dhcpserver-pf9-ovs-dpdk
spec:
networks:
- networkName: nad-ovs-dpdk-dhcp
interfaceIp: 192.168.15.14/24
leaseDuration: 10m
cidr:
range: 192.168.15.0/24
range_start: 192.168.15.30
range_end: 192.168.15.100
gateway: 192.168.15.1
EOF
About the fields:
- name: Name of the DHCPServer. The dnsmasq configuration is generated in a ConfigMap with the same name.
- networks: List of all networks that this DHCP server pod will serve:
- networkName: Name of the NetworkAttachmentDefinition to provide IPs for. The NAD should not have the DHCP plugin enabled.
- interfaceIp: IP address allocated to the DHCP server pod on this network. Must include a prefix length to ensure proper routes are added.
- leaseDuration: Duration the offered leases are valid for. Provide in a format valid for dnsmasq (for example, 10m or 5h). Defaults to 1h.
- vlanId: dnsmasq network identifier, used while restoring IPs. Optional.
- cidr: range is mandatory; range_start, range_end, and gateway are optional. If range_start and range_end are provided, they are used in place of the default start and end of the range.
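After applying the DHCPServer resource, the DHCP controller generates the dnsmasq ConfigMap named after the resource and starts a server pod attached to the listed networks. As a quick sanity check (the resource plural and the generated pod name may vary by release, so treat these commands as a sketch):
$kubectl get dhcpservers
$kubectl get configmap dhcpserver-pf9-ovs-dpdk -o yaml
$kubectl get pods | grep -i dhcp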
At this point, the PMK cluster is ready to be used for workloads such as pods and virtual machines.
Create a sample Virtual Machine to use the nad-ovs-dpdk-dhcp network
Let's validate your work by creating a Virtual Machine that consumes the nad-ovs-dpdk-dhcp network.
$cat <<EOF | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: vm-test-ovs-dpdk
namespace: default
spec:
running: true
template:
metadata:
labels:
debugLogs: "true"
kubevirt.io/size: small
annotations:
kubevirt.io/memfd: "false"
spec:
terminationGracePeriodSeconds: 30
domain:
resources:
requests:
memory: 2Gi
cpu: 1
memory:
hugepages:
pageSize: "1Gi"
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: default
masquerade: {}
- name: vhost-user-net-1
vhostuser: {}
networks:
- name: default
pod: {}
- name: vhost-user-net-1
multus:
networkName: nad-ovs-dpdk-dhcp
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-cloud-container-disk-demo
- name: cloudinitdisk
cloudInitNoCloud:
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
EOF
Validate the VM
$kubectl get vm vm-test-ovs-dpdk -o yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","metadata":{"annotations":{},"name":"vm-test-ovs-dpdk","namespace":"default"},"spec":{"running":true,"template":{"metadata":{"annotations":{"kubevirt.io/memfd":"false"},"labels":{"debugLogs":"true","kubevirt.io/size":"small"}},"spec":{"domain":{"devices":{"disks":[{"disk":{"bus":"virtio"},"name":"containerdisk"},{"disk":{"bus":"virtio"},"name":"cloudinitdisk"}],"interfaces":[{"masquerade":{},"name":"default"},{"name":"vhost-user-net-1","vhostuser":{}}]},"memory":{"hugepages":{"pageSize":"1Gi"}},"resources":{"requests":{"cpu":1,"memory":"2Gi"}}},"networks":[{"name":"default","pod":{}},{"multus":{"networkName":"nad-ovs-dpdk-dhcp"},"name":"vhost-user-net-1"}],"terminationGracePeriodSeconds":30,"volumes":[{"containerDisk":{"image":"quay.io/kubevirt/fedora-cloud-container-disk-demo"},"name":"containerdisk"},{"cloudInitNoCloud":{"userData":"#cloud-config\npassword: fedora\nchpasswd: { expire: False }"},"name":"cloudinitdisk"}]}}}}
kubemacpool.io/transaction-timestamp: "2023-05-10T12:50:18.130438694Z"
kubevirt.io/latest-observed-api-version: v1
kubevirt.io/storage-observed-api-version: v1alpha3
creationTimestamp: "2023-05-10T12:50:18Z"
generation: 1
name: vm-test-ovs-dpdk
namespace: default
resourceVersion: "116974"
uid: 453dd252-2793-4de7-8dc5-9779c7e1828c
spec:
running: true
template:
metadata:
annotations:
kubevirt.io/memfd: "false"
creationTimestamp: null
labels:
debugLogs: "true"
kubevirt.io/size: small
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- macAddress: "02:55:43:00:00:48"
masquerade: {}
name: default
- macAddress: "02:55:43:00:00:49"
name: vhost-user-net-1
vhostuser: {}
machine:
type: q35
memory:
hugepages:
pageSize: 1Gi
resources:
requests:
cpu: "1"
memory: 2Gi
networks:
- name: default
pod: {}
- multus:
networkName: nad-ovs-dpdk-dhcp
name: vhost-user-net-1
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-cloud-container-disk-demo
name: containerdisk
- cloudInitNoCloud:
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
name: cloudinitdisk
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-05-10T12:50:40Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
status: "True"
type: LiveMigratable
created: true
printableStatus: Running
ready: true
volumeSnapshotStatuses:
- enabled: false
name: containerdisk
reason: Snapshot is not supported for this volumeSource type [containerdisk]
- enabled: false
name: cloudinitdisk
reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]
Validate the VMI
Note that the VMI is annotated with the IP assigned by the PF9 DHCP server:
dhcp.plumber.k8s.pf9.io/dhcpserver: '{"02:55:43:00:00:49":"192.168.15.72"}'
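The full VMI output is shown below; if you only want this annotation, a jsonpath query also works (dots in the annotation key are escaped with backslashes):
$kubectl get vmi vm-test-ovs-dpdk -o jsonpath='{.metadata.annotations.dhcp\.plumber\.k8s\.pf9\.io/dhcpserver}'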
$kubectl get vmi vm-test-ovs-dpdk -o yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
annotations:
dhcp.plumber.k8s.pf9.io/dhcpserver: '{"02:55:43:00:00:49":"192.168.15.72"}'
kubevirt.io/latest-observed-api-version: v1
kubevirt.io/memfd: "false"
kubevirt.io/storage-observed-api-version: v1alpha3
creationTimestamp: "2023-05-10T12:50:18Z"
finalizers:
- kubevirt.io/virtualMachineControllerFinalize
- foregroundDeleteVirtualMachine
generation: 9
labels:
debugLogs: "true"
kubevirt.io/nodeName: 131.153.165.65
kubevirt.io/size: small
name: vm-test-ovs-dpdk
namespace: default
ownerReferences:
- apiVersion: kubevirt.io/v1
blockOwnerDeletion: true
controller: true
kind: VirtualMachine
name: vm-test-ovs-dpdk
uid: 453dd252-2793-4de7-8dc5-9779c7e1828c
resourceVersion: "117040"
uid: 2e399241-953a-41e4-a2ce-78a5f1598fdd
spec:
domain:
cpu:
cores: 1
model: host-model
sockets: 1
threads: 1
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- macAddress: "02:55:43:00:00:48"
masquerade: {}
name: default
- macAddress: "02:55:43:00:00:49"
name: vhost-user-net-1
vhostuser: {}
features:
acpi:
enabled: true
firmware:
uuid: d7e0dfc2-e769-54e6-8b02-1abcd893cedd
machine:
type: q35
memory:
hugepages:
pageSize: 1Gi
resources:
requests:
cpu: "1"
memory: 2Gi
networks:
- name: default
pod: {}
- multus:
networkName: nad-ovs-dpdk-dhcp
name: vhost-user-net-1
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-cloud-container-disk-demo
imagePullPolicy: Always
name: containerdisk
- cloudInitNoCloud:
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
name: cloudinitdisk
status:
activePods:
46e3d75b-c9b9-4af3-9d59-e9e8a9bfa8a1: 131.153.165.65
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-05-10T12:50:40Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
status: "True"
type: LiveMigratable
guestOSInfo: {}
interfaces:
- infoSource: domain
ipAddress: 10.20.117.160
ipAddresses:
- 10.20.117.160
mac: "02:55:43:00:00:48"
name: default
queueCount: 1
- infoSource: domain
mac: "02:55:43:00:00:49"
name: vhost-user-net-1
queueCount: 1
launcherContainerImageVersion: platform9/virt-launcher:v0.58.1
migrationMethod: BlockMigration
migrationTransport: Unix
nodeName: 131.153.165.65
phase: Running
phaseTransitionTimestamps:
- phase: Pending
phaseTransitionTimestamp: "2023-05-10T12:50:18Z"
- phase: Scheduling
phaseTransitionTimestamp: "2023-05-10T12:50:18Z"
- phase: Scheduled
phaseTransitionTimestamp: "2023-05-10T12:50:40Z"
- phase: Running
phaseTransitionTimestamp: "2023-05-10T12:50:41Z"
qosClass: Burstable
runtimeUser: 0
virtualMachineRevisionName: revision-start-vm-453dd252-2793-4de7-8dc5-9779c7e1828c-1
volumeStatus:
- name: cloudinitdisk
size: 1048576
target: vdb
- name: containerdisk
target: vda
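To confirm the lease end to end, you can attach to the guest over the serial console and inspect the secondary interface. virtctl console is the standard KubeVirt way to do this; the interface name inside the guest (eth1 below) and the need to trigger a DHCP client manually are assumptions that depend on the guest image:
$virtctl console vm-test-ovs-dpdk
# Log in with the cloud-init credentials (user fedora, password fedora), then inside the guest:
$ip addr show eth1
# If no address was requested automatically, run a DHCP client on that interface (e.g. sudo dhclient eth1 or your distribution's equivalent).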
Variations of OVS networks
OVS networks can be configured using the HostNetworkTemplate custom resource.
OVS network without DPDK
$cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs
  namespace: luigi-system
spec:
  ovsConfig:
  - bridgeName: "dpdkbr01"
    nodeInterface: "bond0.2"
    dpdk: false
EOF
OVS Bonded network without DPDK
$cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-bonded
  namespace: luigi-system
spec:
  ovsConfig:
  - bridgeName: "dpdkbr01"
    nodeInterface: "bond0.2,bond0.5"
    dpdk: false
    # optional parameters
    params:
      mtuRequest: 9192
      lacp: "active"    # create the OVS bond with LACP enabled
EOF
OVS Bonded network with DPDK
$cat <<EOF | kubectl apply -f -
apiVersion: plumber.k8s.pf9.io/v1
kind: HostNetworkTemplate
metadata:
  name: host-network-template-ovs-dpdk-bonded
  namespace: luigi-system
spec:
  ovsConfig:
  - bridgeName: "dpdkbr01"
    nodeInterface: "bond0.2,bond0.5"
    dpdk: true
    # optional parameters
    params:
      mtuRequest: 9192
      bondMode: "balance-tcp"
      lacp: "active"
EOF
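For the bonded variants, you can optionally inspect the resulting OVS bond on a worker node. ovs-appctl bond/list and bond/show are standard Open vSwitch commands; the bond port name is assigned when the bridge is created, so list the bonds first:
$sudo ovs-appctl bond/list
$sudo ovs-appctl bond/show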