Ensuring Uptime: VM High Availability in Platform9 Private Cloud Director

In this blog, you will learn about Platform9 Private Cloud Director’s Virtual Machine High Availability (VM HA), a feature that automatically restarts VMs on healthy hosts following a host failure, ensuring minimal downtime and business continuity. The blog details how VM HA mirrors VMware HA’s capabilities for KVM-based private clouds, highlighting requirements like shared storage and a minimum number of hosts for operation.

Introduction

In any production environment, infrastructure failures are a reality. Whether it’s a server hardware issue or an unexpected outage, ensuring that critical virtual machines (VMs) remain available is paramount. Platform9 Private Cloud Director (PCD) addresses this challenge directly with its built-in Virtual Machine High Availability (VM HA) feature, designed to minimize downtime and maintain business continuity.

What is Platform9 VM HA?

Platform9 VM HA is a core capability that automatically detects physical host failures within a cluster and restarts the affected VMs on other healthy hosts in that same cluster. Much like VMware HA provides a safety net for vSphere environments, Platform9 VM HA delivers similar protection for your KVM-based private cloud managed by PCD.

How Platform9 VM HA Works

The process is designed to be automatic and requires minimal manual intervention during a failure event:

  1. Continuous Host Monitoring: The Platform9 system constantly monitors the health and responsiveness of all hypervisor hosts in an HA-enabled cluster or availability zone.
  2. Failure Detection: If a host stops responding (due to hardware failure, OS crash, or certain network isolation scenarios), the system detects the failure.
  3. Automatic VM Recovery: A host failure must be confirmed by both the management plane and the other hosts in the cluster. Once it is confirmed, Platform9 VM HA automatically begins restarting the VMs that were running on the failed host. These VMs are powered on using available resources on the remaining healthy hosts within the cluster.
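
The three steps above can be sketched in a few lines of Python. This is an illustrative model only, not Platform9's implementation: the `Host` class, the function names, the majority-vote rule among peer hosts, and the "most free vCPUs first" placement are all assumptions made for the example. The article states only that both the management plane and the cluster hosts take part in confirming a failure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """Hypothetical model of a hypervisor host (not a PCD API object)."""
    name: str
    free_vcpus: int
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> vCPUs


def failure_confirmed(mgmt_sees_host: bool, peer_reports: list[bool]) -> bool:
    """Confirm a failure only when the management plane cannot reach the host
    AND a majority of peer hosts also report it unreachable.
    The majority rule is an assumption made for this sketch."""
    if mgmt_sees_host or not peer_reports:
        return False
    unreachable_votes = sum(1 for reachable in peer_reports if not reachable)
    return unreachable_votes > len(peer_reports) / 2


def plan_restarts(failed: Host, healthy: list[Host]) -> dict[str, Optional[str]]:
    """Map each VM on the failed host to a healthy host, largest VM first.
    Reduces free_vcpus on the chosen hosts as VMs are placed.
    A VM maps to None when no healthy host has enough free vCPUs."""
    plan: dict[str, Optional[str]] = {}
    for vm, vcpus in sorted(failed.vms.items(), key=lambda kv: -kv[1]):
        target = max(healthy, key=lambda h: h.free_vcpus, default=None)
        if target is not None and target.free_vcpus >= vcpus:
            target.free_vcpus -= vcpus
            plan[vm] = target.name
        else:
            plan[vm] = None
    return plan
```

Requiring agreement from two independent vantage points before acting reduces the chance of restarting VMs that are still running on a host that is merely unreachable from one side.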

Key Concepts and Requirements

Understanding these concepts is helpful when working with Platform9 VM HA:

  • Clusters: VM HA operates at the cluster level. A cluster is a group of physical hypervisor hosts managed by PCD.
  • Availability Zones (AZs): Clusters can be associated with Availability Zones, which define fault domains. HA policies can often be tied to these zones.
  • Shared Storage: This is a critical prerequisite. For VMs to be successfully restarted on a different host, their virtual disks must reside on storage that is accessible to all potential failover hosts within the cluster (e.g., SAN, NAS, or other supported shared storage solutions). VMs using only host-local storage generally cannot be automatically recovered by HA on another host.
  • Host Requirements: VM HA requires a minimum number of healthy hosts in a cluster to function correctly. A minimum of four hosts is required for HA activation, though support for smaller configurations (like two-node clusters) is on the roadmap.
  • Configuration: VM HA is a configurable option when creating or managing a cluster within the Platform9 Private Cloud Director interface (API/CLI or UI). It’s often enabled by default for new clusters.
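
The two hard prerequisites in this list (shared storage and a minimum host count) lend themselves to a simple pre-flight check. The sketch below is a hypothetical helper, not part of the PCD API or CLI: the `VM` class, the `ha_readiness` function, and the `on_shared_storage` flag are names invented for illustration, and in practice you would populate them from your own inventory data.

```python
from dataclasses import dataclass

MIN_HA_HOSTS = 4  # minimum host count stated in this article


@dataclass
class VM:
    """Hypothetical inventory record for a VM."""
    name: str
    on_shared_storage: bool


def ha_readiness(healthy_host_count: int, vms: list[VM]) -> list[str]:
    """Return human-readable findings; an empty list means no issues were found."""
    findings = []
    if healthy_host_count < MIN_HA_HOSTS:
        findings.append(
            f"cluster has {healthy_host_count} healthy hosts; "
            f"{MIN_HA_HOSTS} are required for HA"
        )
    for vm in vms:
        if not vm.on_shared_storage:
            findings.append(
                f"VM {vm.name} uses host-local storage and generally "
                "cannot be recovered on another host"
            )
    return findings
```

For example, `ha_readiness(3, [VM("db01", False)])` returns two findings, one for the host count and one for the VM on local storage.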

Benefits of Platform9 VM HA

Implementing VM HA in your Platform9 environment provides significant advantages:

  • Reduced Downtime: Automatically restarts VMs, significantly reducing the time applications are unavailable due to host failures.
  • Improved Business Continuity: Helps ensure critical applications remain operational even during underlying infrastructure problems.
  • Automated Response: Eliminates the need for administrators to manually detect failures and restart VMs, saving time and reducing potential errors during stressful situations.
  • Increased Reliability: Provides a foundational level of resilience for your private cloud workloads.

VMware HA vs. Platform9 PCD VM HA Comparison

This table compares the High Availability features of VMware vSphere and Platform9 Private Cloud Director (PCD).

| Feature | Platform9 PCD VM HA | VMware HA |
|---|---|---|
| Core Function | Automatically restarts VMs on healthy hosts after a host failure within a cluster/AZ. | Automatically restarts VMs on other hosts in the cluster after a host failure. |
| Failure Detection | Monitors host health/responsiveness from the management plane and among hosts. | Primary host monitors other hosts via network heartbeats. Datastore heartbeating as secondary mechanism. |
| Recovery Action | Restarts affected VMs on available healthy hosts. | Restarts affected VMs on available healthy hosts. |
| Key Requirement | Shared Storage (accessible by all potential failover hosts) is critical for VM recovery. | Shared Storage (accessible by all hosts in the cluster) is required. |
| Configuration Context | Configured at the Cluster level, often associated with Availability Zones (AZs). Typically enabled via PCD interface (API/CLI/UI). | Configured at the vSphere Cluster level via vCenter Server. |
| Host Requirements | Requires a minimum number of hosts (e.g., 4 often cited, roadmap for 2). | Requires multiple hosts in the cluster. |
| VM Monitoring | Primarily focuses on host health for triggering HA. | Can optionally monitor VM health via VMware Tools heartbeats and restart unresponsive VMs. |
| Resource Management | Relies on available capacity; Dynamic Resource Rebalancing (DRR) manages placement. | Includes Admission Control to reserve cluster resources specifically for HA failover. |
| Network Isolation | System detects unresponsive hosts, which can include certain network isolation scenarios. | Specific configurable responses for host network isolation (e.g., power off VMs, leave VMs powered on). |
| Storage Issues | Requires shared storage to be available for recovery. | Specific configurable responses for datastore accessibility issues (PDL/APD). |
| Underlying Technology | KVM hypervisor, PCD Management Plane. | VMware ESXi hypervisor, vCenter Server. |
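
Because the comparison notes that PCD VM HA relies on available capacity rather than reserved failover resources, it can be useful to estimate whether a cluster has enough headroom to absorb the loss of any one host. The following is a rough sketch under stated assumptions: the function name and the input format are hypothetical, only vCPUs are considered, and the check compares aggregate free capacity, so it ignores memory, fragmentation across hosts, and any placement rules.

```python
def survives_single_host_failure(hosts: dict[str, tuple[int, int]]) -> dict[str, bool]:
    """hosts maps host name -> (total_vcpus, used_vcpus).

    For each host, report whether the remaining hosts together have
    enough free vCPUs to take over that host's used vCPUs.
    """
    result = {}
    for name, (_, used) in hosts.items():
        spare = sum(
            total - in_use
            for other, (total, in_use) in hosts.items()
            if other != name
        )
        result[name] = spare >= used
    return result
```

A host that maps to `False` is one whose failure would leave at least some of its VMs without capacity to restart, which is a signal to add hosts or reduce load.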

Conclusion

Platform9 Private Cloud Director’s VM High Availability feature is essential for building a resilient and reliable private cloud. By automatically handling host failures and restarting virtual machines, it provides the assurance needed to run critical enterprise workloads with confidence. It mirrors the kind of protection familiar to users of traditional virtualization platforms like VMware vSphere, but within Platform9’s flexible, open framework.

Author

  • Chris Jones

    Chris Jones is the Head of Product Marketing at Platform9. He has previously held positions as an Account Executive and Director of Product Management. With over ten years of hands-on experience in the cloud-native infrastructure industry, Chris brings extensive expertise in observability and application performance management. He possesses deep technical knowledge of Kubernetes, OpenStack, and virtualization environments.
