In this blog post, you will learn about Virtual Machine High Availability (VM HA) in Platform9 Private Cloud Director (PCD), a feature that automatically detects physical host failures within a cluster and recovers the affected VMs by restarting them on healthy hosts in the same cluster. Private Cloud Director refers to this recovery process as VM evacuation. The post explains how VM HA works, the prerequisites for enabling it, how it interoperates with other PCD services, and how it compares to VMware HA.
Introduction
In any production environment, infrastructure failures are a reality. Whether it’s a server hardware issue or an unexpected outage, ensuring that critical virtual machines (VMs) remain available is paramount. Platform9 Private Cloud Director addresses this challenge directly with its built-in Virtual Machine High Availability (VM HA) feature, designed to minimize downtime and maintain business continuity.
What is Platform9 VM HA?
Platform9 VM HA is a core capability that automatically detects physical host failures within a cluster and recovers the affected VMs by moving them to other healthy hosts and powering them on. If you’re coming from VMware, think of it as the equivalent of vSphere HA restarting VMs on surviving hosts. Private Cloud Director calls this process “VM evacuation” because ownership of each VM is transferred from the failed hypervisor to a running one before the VM is started. The net effect is the same: your VMs come back up on healthy infrastructure without manual intervention.
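To make the "transfer ownership, then start" ordering concrete, here is a minimal sketch of an evacuation loop. It is not Platform9's scheduler: the `Host` and `VM` data model, the `pick_target` and `evacuate` functions, and the "most free RAM wins" placement rule are all hypothetical and chosen only for illustration. It does respect the host aggregate a VM's flavor pins it to, as the post describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    healthy: bool
    free_vcpus: int
    free_ram_mb: int
    aggregate: Optional[str] = None  # hypothetical: aggregate this host belongs to


@dataclass
class VM:
    name: str
    host: str                        # hypervisor that currently "owns" the VM
    vcpus: int
    ram_mb: int
    aggregate: Optional[str] = None  # aggregate required by the VM's flavor, if any
    running: bool = True


def pick_target(vm: VM, hosts: list[Host]) -> Optional[Host]:
    """Return a healthy host that satisfies the VM's aggregate and capacity needs."""
    candidates = [
        h for h in hosts
        if h.healthy
        and h.name != vm.host
        and (vm.aggregate is None or h.aggregate == vm.aggregate)
        and h.free_vcpus >= vm.vcpus
        and h.free_ram_mb >= vm.ram_mb
    ]
    # Illustrative rule only: prefer the host with the most free RAM.
    return max(candidates, key=lambda h: h.free_ram_mb, default=None)


def evacuate(failed: str, hosts: list[Host], vms: list[VM]) -> tuple[dict[str, str], list[str]]:
    """Move every VM owned by the failed host to a healthy host, then power it on."""
    placed: dict[str, str] = {}
    unplaced: list[str] = []
    for vm in vms:
        if vm.host != failed:
            continue
        vm.running = False  # the VM went down together with its host
        target = pick_target(vm, hosts)
        if target is None:
            unplaced.append(vm.name)  # no eligible host; needs operator attention
            continue
        # Step 1: reserve capacity and transfer ownership to the surviving host.
        target.free_vcpus -= vm.vcpus
        target.free_ram_mb -= vm.ram_mb
        vm.host = target.name
        # Step 2: only after ownership moves is the VM powered on.
        vm.running = True
        placed[vm.name] = target.name
    return placed, unplaced
```

Reserving capacity on the target before powering the VM on keeps two evacuated VMs from being squeezed into the same remaining slot.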
After recovery, complementary features such as Dynamic Resource Rebalancing (DRR) can redistribute load to restore optimal balance across the cluster.
Benefits of Platform9 VM HA
Implementing VM HA in your Platform9 environment provides significant advantages:
- Minimized Downtime: Automatically restarts VMs, significantly reducing the time applications are unavailable due to host failures.
- Service Continuity: Keeps business-critical applications online even during unexpected infrastructure outages.
- Operational Efficiency: Removes the need to watch for host failures around the clock and restart VMs by hand, freeing administrators to focus on higher-value tasks.
- Policy-Driven Control: Respects host aggregates, affinity/anti-affinity rules, and VM-specific settings, so you can determine how each workload is handled during failover.
- Seamless Interoperation: Works in concert with DRR and other Private Cloud Director services to maintain both availability and resource efficiency after a failure event.
How Platform9 VM HA Works
The process is designed to be automatic and requires minimal manual intervention during a failure event:
- Continuous Host Monitoring: The Platform9 system constantly monitors the health and responsiveness of all hypervisor hosts participating in an HA-enabled cluster or availability zone.
- Failure Detection: If a host stops responding (due to hardware failure, OS crash, or certain network isolation scenarios), the system detects the failure.
- Automatic VM Recovery: Once both the management plane and the other hosts in the cluster have confirmed the failure, Platform9 VM HA automatically restarts the VMs that were running on the failed host. These VMs are powered on using available resources on the remaining healthy hosts in the cluster.
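The post states that confirmation involves both the management plane and the cluster hosts, but it does not describe the exact rule. The sketch below shows one plausible two-source check. The `HostStatus` record, the 60-second timeout, and the "majority of peers agree" rule are assumptions for illustration, not Platform9's algorithm. The point it demonstrates is that requiring two independent sources to agree avoids evacuating a host that has only lost its link to the management plane.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    name: str
    seconds_since_mgmt_heartbeat: float  # as observed by the management plane
    peer_reports: dict[str, bool]        # peer host name -> can that peer reach this host?


def is_confirmed_failed(status: HostStatus, timeout_s: float = 60.0) -> bool:
    """Hypothetical rule: the management plane AND a majority of peers must agree."""
    mgmt_says_down = status.seconds_since_mgmt_heartbeat > timeout_s
    if not status.peer_reports:
        # No peer evidence: we cannot rule out a management-network problem.
        return False
    unreachable = sum(1 for reachable in status.peer_reports.values() if not reachable)
    peers_say_down = unreachable > len(status.peer_reports) / 2
    return mgmt_says_down and peers_say_down


def hosts_to_evacuate(statuses: list[HostStatus], timeout_s: float = 60.0) -> list[str]:
    """Names of hosts whose VMs should be evacuated under the rule above."""
    return [s.name for s in statuses if is_confirmed_failed(s, timeout_s)]
```

A host that misses management-plane heartbeats but is still reachable by its peers is left alone, which is the behavior you want during a management-network blip.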
Key Concepts and Requirements
Understanding these concepts is helpful when working with Platform9 VM HA:
- Clusters: VM HA operates at the cluster level. A cluster is a group of physical hypervisor hosts managed by Private Cloud Director.
- Shared Storage: This is a critical prerequisite. All VMs should boot from a block storage volume (a non-ephemeral root disk). If any VMs use ephemeral storage for the root disk, Ephemeral Shared Storage (configured in the Cluster Blueprint) should be enabled for all hosts in the virtualized cluster.
- Host Requirements: VM HA requires at least two hosts in the cluster before it can be activated. If any VMs use a flavor that assigns them to a host aggregate, that host aggregate should contain at least two hosts in the cluster for failover redundancy. If any VMs use block storage, the block storage role must be assigned to at least two hosts in the cluster. All hosts in a VM HA-enabled cluster must run the same operating system version.
- Configuration: VM HA is enabled per cluster, and once enabled, it applies to all virtual machines in that cluster.
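The requirements above amount to a short checklist, so it can help to express them as a pre-flight check you run against your own inventory before enabling HA. This is a minimal sketch, not a Platform9 tool: the `ClusterHost` and `ClusterVM` records, the `"block-storage"` role label, the OS version strings, and the function name are hypothetical. Each rule maps directly to one requirement listed in this section.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_STORAGE_ROLE = "block-storage"  # hypothetical label for the block storage role


@dataclass
class ClusterHost:
    name: str
    os_version: str
    aggregates: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)


@dataclass
class ClusterVM:
    name: str
    root_disk: str                   # "volume" (block storage) or "ephemeral"
    aggregate: Optional[str] = None  # aggregate pinned by the VM's flavor, if any
    data_volumes: int = 0            # extra block storage volumes attached


def ha_prerequisite_problems(hosts: list[ClusterHost], vms: list[ClusterVM],
                             ephemeral_shared_storage: bool) -> list[str]:
    """Return reasons the cluster is not ready for VM HA (an empty list means ready)."""
    problems: list[str] = []
    if len(hosts) < 2:
        problems.append("cluster needs at least 2 hosts")
    if len({h.os_version for h in hosts}) > 1:
        problems.append("hosts run different OS versions")
    if any(vm.root_disk == "ephemeral" for vm in vms) and not ephemeral_shared_storage:
        problems.append("ephemeral root disks require Ephemeral Shared Storage")
    # Every aggregate a VM is pinned to needs a second host to fail over to.
    for agg in sorted({vm.aggregate for vm in vms if vm.aggregate}):
        if sum(1 for h in hosts if agg in h.aggregates) < 2:
            problems.append(f"host aggregate '{agg}' has fewer than 2 hosts")
    uses_block = any(vm.root_disk == "volume" or vm.data_volumes > 0 for vm in vms)
    if uses_block and sum(1 for h in hosts if BLOCK_STORAGE_ROLE in h.roles) < 2:
        problems.append("block storage role is assigned to fewer than 2 hosts")
    return problems
```

For example, a two-host cluster where only one host belongs to the `fast` aggregate would report `["host aggregate 'fast' has fewer than 2 hosts"]`.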
VMware HA vs. Platform9 Private Cloud Director VM HA Comparison
This table compares the High Availability features of VMware vSphere and Platform9 Private Cloud Director.
| Feature | Platform9 Private Cloud Director VM HA | VMware HA |
| --- | --- | --- |
| Core Function | Automatically restarts VMs on healthy hosts after a host failure within a cluster. | Automatically restarts VMs on other hosts in the cluster after a host failure. |
| Failure Detection | Monitors host health and responsiveness from both the management plane and peer hosts. | The primary host monitors other hosts via network heartbeats, with datastore heartbeating as a secondary mechanism. |
| Recovery Action | Restarts affected VMs on available healthy hosts. | Restarts affected VMs on available healthy hosts. |
| Key Requirement | Shared Storage (accessible by all potential failover hosts) is critical for VM recovery. | Shared Storage (accessible by all hosts in the cluster) is required. |
| Configuration Context | Configured at the cluster level and enabled via the Private Cloud Director user interface. | Configured at the vSphere cluster level via vCenter Server. |
| Host Requirements | Requires a minimum of 2 hosts. | Requires multiple hosts in the cluster. |
| VM Monitoring | Primarily focuses on host health for triggering HA. | Can optionally monitor VM health via VMware Tools heartbeats and restart unresponsive VMs. |
| Resource Management | Relies on available capacity; Dynamic Resource Rebalancing (DRR) manages placement. | Includes Admission Control to reserve cluster resources specifically for HA failover. |
| Network Isolation | System detects unresponsive hosts, which can include certain network isolation scenarios. | Specific configurable responses for host network isolation (e.g., power off VMs, leave VMs powered on). |
| Storage Issues | Requires shared storage to be available for recovery. | Specific configurable responses for datastore accessibility issues (Permanent Device Loss and All Paths Down, PDL/APD). |
| Underlying Technology | KVM hypervisor, Private Cloud Director Management Plane. | VMware ESXi hypervisor, vCenter Server. |
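As the Resource Management row notes, Private Cloud Director relies on available capacity rather than reserving failover resources up front the way VMware Admission Control does. So it can be worth checking yourself whether the cluster could absorb the loss of any single host. The sketch below is a rough N+1 check under stated assumptions: RAM is the only binding resource, free capacity on surviving hosts can be pooled (no per-host fragmentation, aggregates, or overcommit), and the `Capacity` record and `unabsorbable_failures` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    name: str
    total_ram_gb: int
    used_ram_gb: int

    @property
    def free_ram_gb(self) -> int:
        return self.total_ram_gb - self.used_ram_gb


def unabsorbable_failures(hosts: list[Capacity]) -> list[str]:
    """Hosts whose failure the rest of the cluster could NOT absorb (RAM only, pooled)."""
    total_free = sum(h.free_ram_gb for h in hosts)
    # If host h fails, its used RAM must fit in the free RAM of all other hosts.
    return [h.name for h in hosts if h.used_ram_gb > total_free - h.free_ram_gb]
```

An empty result means every single-host failure fits on paper. Because the pooled-capacity assumption is optimistic, treat a passing result as a lower bound on the headroom you actually need.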
Conclusion
Platform9 Private Cloud Director’s VM High Availability feature is essential for building a resilient and reliable private cloud. By automatically detecting host failures, correlating health data across distributed agents, and recovering VMs on healthy hosts, it provides the assurance needed to run critical enterprise workloads with confidence. With two-node cluster support, a prerequisite-aware status dashboard, and retry capabilities for failed evacuations, VM HA has matured into a production-grade availability solution. It mirrors the kind of protection familiar to users of traditional virtualization platforms like VMware vSphere, but within Platform9’s flexible, open framework.