Set up vGPU
Virtual GPU (vGPU) allows multiple VMs to share physical GPU resources efficiently. This approach maximizes resource utilization and reduces costs while still providing GPU acceleration for lighter workloads.
Configure vGPU infrastructure
Set up your vGPU infrastructure following the specific sequence required for virtual GPU functionality.
Prerequisites
Before beginning vGPU configuration, ensure your hosts have the required GPU drivers and licensing:
vGPU functionality requires proper NVIDIA drivers and valid licenses to be installed on the hosts before any configuration steps.
Ensure that the GPU cards intended for vGPU configuration are unbound and not linked to any other process or device. For more details, see Troubleshooting GPU Support.
Required components:
- NVIDIA GPU drivers installed and functioning on the host
- NVIDIA vGPU licenses properly configured
- NVIDIA license server created and accessible
- Valid license allocation for your vGPU usage
Ensure that SR-IOV is enabled in the BIOS before proceeding with vGPU configuration.
Verify GPU drivers are installed:
nvidia-smi
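Optionally, you can confirm SR-IOV capability and GPU driver binding from the shell before running any configuration scripts. The commands below are a quick sanity check; the PCI address 0000:c1:00.0 is only an example (it matches the sample output later in this guide) and should be replaced with your own GPU address.
# List NVIDIA devices (vendor ID 10de) and the kernel driver currently bound to each.
lspci -nnk -d 10de:
# Check that the GPU advertises the SR-IOV capability (replace the address with your GPU's).
sudo lspci -s 0000:c1:00.0 -vvv | grep -i "SR-IOV"
# Maximum number of virtual functions the device supports.
cat /sys/bus/pci/devices/0000:c1:00.0/sriov_totalvfs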
Step 1: Onboard vGPU host
Before configuring vGPU, you must first onboard your GPU host using pcdctl.
- Onboard your vGPU host using pcdctl.
- Verify that the host onboarding completed successfully.
- Ensure you have administrator access to the onboarded hosts.
Step 2: Run initial vGPU configuration
Execute the GPU configuration script to set up vGPU functionality on your host.
The GPU configuration script is located at /opt/pf9/gpu/pf9-gpu-configure.sh on your onboarded host.
- Access your onboarded vGPU host with administrator privileges.
- Navigate to the GPU script directory by using the following command.
cd /opt/pf9/gpu
- Run the GPU configuration script and enter option 2 (vgpu pre configure).
sudo ./pf9-gpu-configure.sh
This configuration will prompt you to reboot at the end.
- The script will prompt you to update grub and reboot. If you select N in the prompt, manually run the following commands:
sudo update-grub
sudo reboot
You may need to wait for the host to come back online before proceeding.
- To verify that your vGPU pre-configuration is successful, run the following command.
sudo ./pf9-gpu-configure.sh
- Enter option 6 (validate vgpu) on the terminal.
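Besides the script's own validation, you can spot-check that the reboot applied the expected kernel settings. This assumes the pre-configure step updates GRUB to enable the IOMMU, which is why the script offers to run update-grub; the exact parameters may differ on your platform.
# Confirm the IOMMU-related parameters are present on the running kernel.
cat /proc/cmdline
# Confirm the IOMMU initialized (Intel hosts report DMAR, AMD hosts report AMD-Vi).
sudo dmesg | grep -iE "iommu|dmar|amd-vi" | head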
Step 3: Configure SR-IOV for vGPU
Configure SR-IOV settings required for vGPU functionality.
Currently, vGPU configuration in PCD is supported only through SR-IOV. Therefore, only GPU models with SR-IOV enabled are supported for vGPU use in PCD.
- Navigate to the GPU script directory using the following command.
cd /opt/pf9/gpu
- Run the GPU configuration script and enter option 3 (vGPU SR-IOV configure).
sudo ./pf9-gpu-configure.sh
- The script will display output similar to the following.
Step 3: Enable SR-IOV for NVIDIA GPUs
Detecting PCI devices from /sys/bus/pci/devices...
Found the following NVIDIA PCI devices:
Found the following NVIDIA devices: 0000:c1:00.0
Enter the full PCI device IDs (e.g., 0000:17:00.0 0000:18:00.0) to enable sriov, separated by spaces.
Press Enter without input to configure ALL listed NVIDIA GPUs:
No PCI device IDs provided. Configuring all NVIDIA GPUs...
Enabling SR-IOV for 0000:c1:00.0...
Enabling VFs on 0000:c1:00.0
- You can either:
- Enter specific PCI device IDs separated by spaces.
- Press Enter without input to configure ALL listed NVIDIA GPUs.
If you encounter a Cannot obtain unbindLock error during this step, refer to Troubleshooting GPU Support for resolution steps.
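As an additional sanity check outside the script, you can confirm through sysfs that virtual functions were actually created. The PCI address below comes from the sample output above; substitute your own GPU address.
# Number of virtual functions currently enabled on the physical GPU.
cat /sys/bus/pci/devices/0000:c1:00.0/sriov_numvfs
# List the virtual function entries created for this GPU.
ls -l /sys/bus/pci/devices/0000:c1:00.0/ | grep virtfn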
- Verify the vGPU and SR-IOV configurations by performing the following steps.
- Navigate to the GPU script directory:
cd /opt/pf9/gpu
- Run the GPU configuration script to verify vGPU setup.
sudo ./pf9-gpu-configure.sh
- Enter option 6 (Validate vGPU) on your terminal and review the verification output to confirm that the vGPU setup is complete.
- On the console, view Infrastructure > GPU Hosts to verify that your GPU host appears in the list. You will see:
- Compatibility mode (vGPU)
- GPU model and device ID
Step 4: Create host configuration and vGPU cluster
Create the necessary host configuration and cluster settings for vGPU operation.
- Navigate to Infrastructure > Cluster Blueprint > Host Configurations on the PCD console.
- Select Add Host Configuration to create a new configuration.
- Configure your new PCD host by entering the network and management settings. Each setting controls a specific functionality that determines how the host operates within your PCD environment.
| Field Name | Description |
|---|---|
| Name this configuration | Specify the host name to configure. |
| Network Interface | Enter the Physical Interface Name. |
| Physical Network Label | Optional. Use a descriptive label to identify and organize physical network interfaces. Assigning meaningful names such as "Production-Network" or "Management-VLAN" makes it easier to filter, identify, and troubleshoot interfaces. |
| Management | Enable management functions for this host. |
| VM Console | Enable VM Console access to allow administrators to connect directly to virtual machines running on this host for troubleshooting and management. |
| Image Library I/O | Enable Image Library I/O to allow this host to read from and write to the centralized image repository for VM deployment and updates. |
| Virtual Network Tunnels | Enable Virtual Network Tunnels to allow secure network connectivity between this host and other hosts in the PCD environment. |
| Host Liveness Checks | Enable Host Liveness Checks to automatically monitor the host's health status and trigger alerts when the host is unresponsive. |
- Name this configuration.
- Configure the basic host settings with the network section configured in the blueprint.
Step 5: Create vGPU cluster
- Navigate to Infrastructure > Clusters on the PCD console.
- Select Add Cluster to create a new cluster configuration.
Optionally, you can also configure VMHA or DRR settings.
- Select Enable GPU and then select the GPU mode: vGPU for sharing GPUs across multiple VMs.
- Select Save.
Your host configuration and cluster now support vGPU workloads.
Step 6: Authorize vGPU hosts
Authorize your vGPU-configured host in your cluster.
- Navigate to Infrastructure > Cluster Hosts in the PCD console.
- Authorize the hosts by assigning:
- Host configuration
- Hypervisor role
- vGPU cluster
You may need to wait a few minutes for the authorization process to complete.
Step 7: Configure vGPU host with vGPU profile
Configure your vGPU host with the appropriate vGPU profile.
- Navigate to Infrastructure > GPU Hosts in the PCD console.
- Select your vGPU host from the list.
- Configure the vGPU host with a vGPU profile.
Only a single vGPU profile can be selected per vGPU host.
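If you want to see which profile types the host actually exposes before selecting one in the console, you can inspect sysfs on the host. The paths below assume the SR-IOV layout from the earlier step and an NVIDIA driver that publishes mediated device types per virtual function; newer vGPU driver releases may expose the types under an nvidia/creatable_vgpu_types entry instead.
# vGPU profile types supported by the first virtual function (replace the PCI address with your own).
ls /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types/
# Human-readable profile names and remaining capacity for each type.
cat /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types/*/name
cat /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types/*/available_instances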
Step 8: Complete vGPU host configuration
Run the configuration script to complete vGPU host setup.
- Access your GPU host with administrator privileges.
- Navigate to the GPU script directory:
cd /opt/pf9/gpu
- Run the GPU configuration script.
sudo ./pf9-gpu-configure.sh
- Enter option 4 (vGPU host configure) on the terminal to complete host configuration.
- Verify that the vGPU host is properly configured and appears in Infrastructure > GPU Hosts. You will see:
- Compatibility mode (vGPU)
- GPU model and device ID
- Available vGPU profiles
Your vGPU infrastructure is now ready for creating flavors and deploying VMs. For more details, see Create GPU Enabled Flavors.
Monitor vGPU resources
Monitor GPU usage and availability for vGPU configurations:
- Navigate to Infrastructure > GPU Hosts to view:
- GPU models and total VRAM per GPU
- Used and available VRAM per GPU
- Active vGPU profiles and their utilization
- vGPU slices available vs. assigned
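If you prefer to monitor from the host shell instead of the console, the NVIDIA vGPU host driver also reports vGPU status through nvidia-smi; the exact fields shown vary by driver release.
# List active vGPU instances and the VMs they are attached to.
nvidia-smi vgpu
# Detailed per-vGPU query, including framebuffer usage and utilization.
nvidia-smi vgpu -q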
vGPU migration behavior
Understanding live migration behavior for vGPU VMs:
- vGPU VMs can be migrated if the destination host supports the same vGPU profile.
- The system validates compatibility before allowing migration.
- Migration fails if the destination host does not have the required vGPU profile available.
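For reference, on deployments that expose the standard OpenStack CLI, a live migration can be requested as sketched below; my-vgpu-vm is a placeholder server name, and the request is rejected if no destination host has the required vGPU profile available.
# Ask the scheduler to live-migrate the VM to a compatible host (my-vgpu-vm is a placeholder).
openstack server migrate --live-migration my-vgpu-vm
# Check the migration result and the host the VM landed on.
openstack server show my-vgpu-vm -c status -c OS-EXT-SRV-ATTR:host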
vGPU best practices
- Profile selection: Choose vGPU profiles that match your workload requirements.
- Resource monitoring: Monitor vGPU utilization to optimize resource allocation.
- Driver compatibility: Ensure vGPU drivers are compatible with your guest operating system.