Troubleshooting Virtual Machine Boot Failures
Problem
This guide provides step-by-step instructions for troubleshooting Virtual Machines that have successfully deployed (status is ACTIVE in PCD) but fail to boot at the Guest OS or Hypervisor level. This includes VMs stuck at the BIOS/UEFI screen, boot-looping, suffering Kernel Panics, Windows BSODs, or failing to configure themselves via metadata.
Environment
Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Deep Dive: The Boot Flow (Post-Creation)
Once the control plane wires the virtual hardware and tells Libvirt to "Start", the management plane steps back. From this point on, the boot process is entirely dependent on the Guest OS and its interaction with the virtual hardware.
The Guest Boot Pipeline:
QEMU Initialization: The hypervisor process starts and allocates the virtual RAM/CPU.
BIOS / UEFI Handoff: The virtual motherboard searches for a bootable block device (Volume or Image).
Bootloader (GRUB / Windows Boot Manager): Loads the OS kernel into memory.
Kernel / OS Initialization: The Linux or Windows operating system boots, mounting the root filesystem.
Metadata (Post-Boot): The OS (cloud-init or cloudbase-init) reaches out to the Cloud Metadata service (169.254.169.254) to download the user's SSH keys, password, and hostname.
Procedure
1. Verify the VM State
Confirm the VM actually passed the creation phase and is attempting to run.
If status is ERROR or BUILD, stop here. This is a Creation issue, not a Boot issue.
If status is ACTIVE and power_state is Running, the provisioning phase succeeded. Proceed to Step 2.
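The decision in this step can be sketched as a small shell helper. This is a minimal illustration, not a PCD tool: next_step and show_vm are hypothetical names, the printed labels are made up for this example, and the sketch assumes the OpenStack CLI is installed with credentials loaded.

```shell
# Map the two fields from Step 1 to the next troubleshooting action.
# next_step is a hypothetical helper name, not a PCD or OpenStack command.
next_step() {
  vm_status="$1"
  vm_power_state="$2"
  case "$vm_status" in
    ERROR|BUILD)
      echo "creation-issue" ;;
    ACTIVE)
      if [ "$vm_power_state" = "Running" ]; then
        echo "proceed-to-step-2"
      else
        echo "check-hypervisor"   # ACTIVE but not running: see Step 6
      fi ;;
    *)
      echo "unrecognized-status" ;;
  esac
}

# Print the VM record; read status and power_state from the output.
# Assumes the OpenStack CLI is installed and credentials are loaded.
show_vm() {
  openstack server show "$1"
}
```

Run show_vm <VM_ID>, read the two fields from the output, then call, for example, next_step ACTIVE Running.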
2. Console Log Inspection
Dump the raw TTY output from the Guest OS to see exactly where the boot process stalled.
No bootable device / Boot failed: not a bootable disk: The BIOS cannot find a bootloader. The image is blank, corrupted, or the volume is not flagged as bootable.
Kernel panic - not syncing (Linux): The OS kernel crashed. Usually caused by a corrupted image, incompatible hypervisor CPU flags, or missing drivers.
dracut-initqueue timeout / dropping to emergency shell (Linux): The kernel loaded, but it cannot find or mount the root filesystem partition (/).
cloud-init [...] giving up on network: The OS booted perfectly, but the VM has no network connection and cannot fetch its configuration.
Note: Windows VMs rarely output useful errors to the serial console. If troubleshooting Windows, proceed immediately to Step 3.
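One way to apply the signatures from this step is to pipe the console dump through a simple matcher. The sketch below is an assumption-laden illustration: classify_console_log and diagnose_console are hypothetical helper names, the labels are invented for this example, and the match strings are taken from the messages quoted in this step, so real console output may be worded differently.

```shell
# Classify console output read from stdin by the signatures listed in Step 2.
# classify_console_log is a hypothetical helper; the labels are illustrative.
classify_console_log() {
  console_text=$(cat)
  case "$console_text" in
    *"No bootable device"*|*"not a bootable disk"*)
      echo "no-bootable-device" ;;
    *"Kernel panic - not syncing"*)
      echo "kernel-panic" ;;
    *"dracut-initqueue timeout"*|*"emergency shell"*)
      echo "root-filesystem-not-found" ;;
    *cloud-init*"giving up"*)
      echo "metadata-network-failure" ;;
    *)
      echo "no-known-signature" ;;
  esac
}

# Dump the console log for a VM and classify it.
# Assumes the OpenStack CLI is installed and credentials are loaded.
diagnose_console() {
  openstack console log show "$1" | classify_console_log
}
```

Calling diagnose_console <VM_ID> prints a single label. The first matching pattern wins, so a log containing several signatures reports only the earliest stage of the boot pipeline that failed.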
3. Live VNC Interrogation (Windows & Linux)
If the console log is unhelpful or you need to intervene manually (e.g., interacting with the GRUB menu or Windows Recovery), generate a VNC console link.
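A console link can typically be requested through the OpenStack CLI. The wrapper below is a sketch under stated assumptions: vnc_console_url is a hypothetical helper name, and it assumes the OpenStack CLI is installed, credentials are loaded, and the deployment exposes a noVNC console.

```shell
# Print a noVNC console URL for the given VM.
# vnc_console_url is a hypothetical helper name.
# Assumes the OpenStack CLI is installed, credentials are loaded,
# and the deployment exposes a noVNC console.
vnc_console_url() {
  openstack console url show --novnc -f value -c url "$1"
}
```

Calling vnc_console_url <VM_ID> prints only the URL, which makes it easy to paste into a browser.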
Open the provided URL in a web browser.
Windows BSOD (INACCESSIBLE_BOOT_DEVICE): The Windows image lacks KVM/VirtIO storage drivers. You must inject viostor/vioscsi drivers into the image before booting.
GRUB Menu (Linux): The bootloader is intact but the kernel is failing to load automatically.
Black Screen / Blinking Cursor: The display driver may have crashed, or the OS is hung during early initialization.
4. Verify Image / Volume Boot Flags
If the VM threw a No bootable device error in Step 2, verify the source media is actually configured correctly in PCD.
Volume Bootable Flag: If bootable is false, QEMU will refuse to treat the attached disk as a primary boot device. Fix this with: $ openstack volume set --bootable <VOLUME_ID>.
Inherited Metadata (Volume) / Image Properties: Look at the properties (for images) or volume_image_metadata (for volumes). Ensure tags like hw_machine_type (e.g., q35), hw_firmware_type (e.g., uefi vs bios), and os_type match what the Guest OS expects. If a volume was created from a bad image, the volume itself is fundamentally flawed and usually needs to be rebuilt from a corrected image.
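The checks in this step can be sketched with the OpenStack CLI. The helper names (check_bootable, show_volume_metadata, show_image_properties) are hypothetical, and the sketch assumes the CLI is installed with credentials loaded. check_bootable only reports the flag and prints the fix; it changes nothing.

```shell
# Report whether a volume is flagged bootable and suggest the fix if not.
# check_bootable is a hypothetical helper; it only prints, it changes nothing.
check_bootable() {
  volume_id="$1"
  flag=$(openstack volume show "$volume_id" -f value -c bootable)
  if [ "$flag" = "true" ]; then
    echo "bootable"
  else
    echo "not-bootable: run 'openstack volume set --bootable $volume_id'"
  fi
}

# Show the boot-related metadata so it can be compared with the Guest OS.
show_volume_metadata() {
  openstack volume show "$1" -c bootable -c volume_image_metadata
}

show_image_properties() {
  openstack image show "$1" -c disk_format -c properties
}
```

In the output of the two show helpers, look for hw_machine_type, hw_firmware_type, and os_type, and compare them with what the Guest OS was built for.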
5. Investigate Metadata Failures (SSH/Passwords)
If the VM boots to a login prompt in VNC, but you cannot SSH/RDP into it because the key pair/password was not injected, the initialization agent failed to reach the Cloud Metadata API.
Logs to Check (Inside the VM via VNC):
Linux: $ sudo tail -n 100 /var/log/cloud-init.log
Windows: Check C:\Program Files\Cloudbase Solutions\Cloudbase-Init\log\cloudbase-init.log
Look for DataSourceNotFound. The VM tried to route to 169.254.169.254 but failed.
This is almost always a network issue. Refer to the related networking documentation.
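To confirm the diagnosis from inside a Linux guest, the log scan and a direct probe of the metadata service can be sketched as follows. The helper names are hypothetical, the URL path is the standard OpenStack metadata endpoint (assumed to be what this platform serves), and curl is assumed to be present in the guest.

```shell
# Count DataSourceNotFound entries in an initialization-agent log file.
# count_metadata_errors is a hypothetical helper name.
count_metadata_errors() {
  grep -c 'DataSourceNotFound' "$1"
}

# Probe the metadata service from inside the VM.
# A 5-second timeout keeps the check from hanging when the route is missing.
probe_metadata() {
  if curl -sf --max-time 5 http://169.254.169.254/openstack/latest/meta_data.json >/dev/null; then
    echo "metadata-reachable"
  else
    echo "metadata-unreachable"
  fi
}
```

For example, run count_metadata_errors /var/log/cloud-init.log (with sudo if the log is not world-readable) and then probe_metadata. If the probe fails, check the guest's routes with ip route, looking for a default gateway or a route to 169.254.169.254.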
6. Hypervisor-Level Boot Crashes (QEMU / Libvirt)
If the VM is ACTIVE but instantly turns off, or if VNC refuses to connect entirely, the QEMU process itself may have crashed immediately after handoff. Inspect the instance's QEMU log on the compute node.
Analysis:
qemu-system-x86_64: ... CPU feature XYZ not found: The requested Flavor told QEMU to emulate a specific CPU feature that your physical server hardware does not actually support. The boot aborts immediately.
KVM internal error. Suberror: 1: A severe hardware virtualization fault. Check dmesg -T on the hypervisor for underlying kernel/hardware issues.
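The two signatures above can be searched for with a short helper. This is a sketch, not the platform's own tooling: scan_qemu_log is a hypothetical name, the match strings come from the messages quoted in this step, and the log location is an assumption based on libvirt's default of /var/log/libvirt/qemu/<domain-name>.log, which may differ on a PCD compute node.

```shell
# Scan a QEMU instance log for the fatal signatures listed in the analysis.
# scan_qemu_log is a hypothetical helper name.
# Assumed log location (libvirt default): /var/log/libvirt/qemu/<domain-name>.log
scan_qemu_log() {
  if grep -qE 'CPU feature .* not found' "$1"; then
    echo "unsupported-cpu-feature"
  elif grep -q 'KVM internal error' "$1"; then
    echo "kvm-internal-error"
  else
    echo "no-known-signature"
  fi
}
```

On the compute node, sudo virsh list --all shows the libvirt domain name for the instance, which identifies the log file to pass to scan_qemu_log.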
Most Common Post-Provisioning Boot Causes
VirtIO Driver Missing (Windows): The most common cause of Windows boot failures (INACCESSIBLE_BOOT_DEVICE). The image was built for VMware/Hyper-V and lacks KVM/VirtIO storage drivers.
Disk Format Mismatch: Uploading an image as raw when the file is actually qcow2 (or vice versa). The hypervisor's storage driver misinterprets the disk headers, leading to immediate bootloader failures. Refer to the related documentation to check this.
Corrupted Image Upload: A .ova or compressed .tar.gz file was uploaded directly to the Image service without being extracted to a raw or .qcow2 disk file first. QEMU cannot boot an archive file.
Metadata Routing Failure: The VM boots flawlessly, but a missing default gateway or blocked DHCP prevents the VM from downloading its SSH keys from the metadata API.
Bootable Flag Missing: A Cinder volume containing a valid OS was attached, but the platform wasn't told it was a boot device, so the BIOS skips it.
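A disk format mismatch can be checked before upload by comparing what qemu-img detects with the format you intend to declare. The sketch below assumes qemu-img is installed on the machine holding the image file; detected_format and compare_format are hypothetical helper names.

```shell
# Print the on-disk format that qemu-img detects for a local image file.
detected_format() {
  qemu-img info "$1" | awk -F': ' '/^file format/ {print $2}'
}

# Compare the declared upload format with the detected one.
# compare_format is a hypothetical helper name.
compare_format() {
  declared="$1"
  actual=$(detected_format "$2")
  if [ "$declared" = "$actual" ]; then
    echo "match: $actual"
  else
    echo "mismatch: declared $declared, detected $actual"
  fi
}
```

For example, compare_format raw ./disk.img reports a mismatch if the file is actually qcow2. For an image that is already uploaded, the declared format is the disk_format field shown by openstack image show <IMAGE_ID>.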
