Maintaining VM High Availability Using OpenStack Masakari

OpenStack has become a platform of choice for managing enterprise data centers because it delivers an infrastructure-as-a-service environment on which scalable, highly available (HA) applications can run in private cloud (on-premises) deployments. In computing terms, availability is the proportion of a given period during which a service is functional. With organizations running a growing number of critical systems on private clouds, the need for reliable, highly available infrastructure is greater than ever. VM high availability maximizes uptime for workloads running in the cloud and shields users from unplanned infrastructure downtime.
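To make the definition of availability concrete, here is a minimal sketch that computes it as uptime divided by the length of the measurement period, and converts an availability target into a downtime budget. The function names and the sample figures are illustrative assumptions, not part of OpenStack or Masakari.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def availability(uptime_seconds: float, period_seconds: float) -> float:
    """Return the fraction of the period during which the service was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if not 0 <= uptime_seconds <= period_seconds:
        raise ValueError("uptime_seconds must lie within the period")
    return uptime_seconds / period_seconds


def allowed_downtime_seconds(target: float, period_seconds: float) -> float:
    """Return the downtime budget implied by an availability target (0..1)."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_seconds


if __name__ == "__main__":
    month = 30 * SECONDS_PER_DAY  # illustrative 30-day period
    year = 365 * SECONDS_PER_DAY
    # One hour of downtime in a 30-day month.
    print(f"availability: {availability(month - 3600, month):.5f}")
    # Downtime budget for a "three nines" target over one year, in hours.
    print(f"budget: {allowed_downtime_seconds(0.999, year) / 3600:.2f} hours")
```

Under these assumptions, a "three nines" (99.9%) target leaves a budget of a little under nine hours of downtime per year, which is why automated VM recovery matters for workloads that cannot restart themselves.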

Challenges with OpenStack High Availability

In broad terms, both cloud-native and traditional applications rely on high availability to maximize uptime. Cloud-native apps, however, are designed to tolerate infrastructure failures in an availability zone by auto-scaling to another server within the zone, whereas traditional apps cannot tolerate such failures because they assume the underlying infrastructure is always functioning. OpenStack Masakari provides VM high availability for these traditional apps by automatically recovering VMs after compute host failures.

OpenStack Masakari relies on a combination of Corosync and Pacemaker to form a cluster of servers and to detect and report host failures in that cluster. In this setup, the cloud administrator must deploy, monitor, and manage the Corosync and Pacemaker cluster. The administrator must also deploy and monitor Masakari’s monitoring services, which query Corosync and trigger VM evacuations when a host fails. After a failure is detected, if the host has to be removed, the administrator must also make sure that host is healthy again. This reliance on manual cluster administration has proved expensive and resource-intensive for enterprises and their IT staff.
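The detect-and-report loop described above can be sketched as a simple heartbeat timeout check. This is a toy model under stated assumptions: it does not use Corosync, Pacemaker, or Masakari's real APIs, and the `HostStatus` class, the 30-second timeout, and the event field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostStatus:
    name: str
    last_heartbeat: float  # timestamp in seconds of the last heartbeat seen


def detect_failed_hosts(hosts: List[HostStatus], now: float,
                        timeout_seconds: float = 30.0) -> List[str]:
    """Return names of hosts whose last heartbeat is older than the timeout."""
    return [h.name for h in hosts if now - h.last_heartbeat > timeout_seconds]


def build_failure_event(hostname: str, detected_at: float) -> Dict[str, object]:
    """Build a hypothetical failure report (field names are illustrative)."""
    return {
        "type": "COMPUTE_HOST",
        "hostname": hostname,
        "event": "STOPPED",
        "detected_at": detected_at,
    }


if __name__ == "__main__":
    now = 1000.0
    cluster = [
        HostStatus("compute-1", last_heartbeat=995.0),
        HostStatus("compute-2", last_heartbeat=940.0),  # silent for 60 s
        HostStatus("compute-3", last_heartbeat=999.0),
    ]
    for name in detect_failed_hosts(cluster, now):
        print(build_failure_event(name, now))
```

In a real deployment the cluster layer supplies the membership information and a monitoring service turns it into failure reports; the sketch only shows why that layer must itself be deployed and kept healthy for detection to work.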

Platform9’s Managed OpenStack Provides VM High Availability “Out-of-the-Box”

Until now, there has been no out-of-the-box solution for workloads that require both programmatic scale-out (using auto-scaling groups) and high availability awareness. Platform9 makes it possible for organizations to deploy mission-critical workloads on OpenStack without sacrificing the enterprise-grade high availability capabilities they have come to expect, shielding users from unplanned infrastructure downtime.

Platform9 Managed OpenStack uses OpenStack Masakari to provide high availability for KVM, but it relies on HashiCorp Consul instead of Pacemaker and Corosync for cluster creation and server failure detection. To enable high availability in Platform9, the administrator creates a Nova host aggregate with an availability zone and selects “enable HA”. Platform9 then monitors the OpenStack Masakari and Consul services and ensures they are running properly. This removes much of the complexity of OpenStack high availability, making it one of the easiest and most cost-effective ways to enable VM high availability.

Platform9’s HA capability automatically configures liveness detection among servers; when a server in a zone fails, Platform9 orchestrates recovery onto other servers in that zone. In addition to recovering traditional workloads, Platform9’s service mitigates the risk of simultaneous failures spawned by auto-scaling groups across availability zones.
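As a rough illustration of recovery "onto other servers in that zone", the sketch below maps each VM from a failed host to a surviving host in the same zone, preferring the host with the most free capacity. It is a simplified model, not Platform9's or Masakari's actual placement logic; the `plan_recovery` function, the slot-based capacity model, and the host and VM names are assumptions made for the example.

```python
from typing import Dict, List


def plan_recovery(failed_host: str,
                  zone_hosts: List[str],
                  vms_by_host: Dict[str, List[str]],
                  free_slots: Dict[str, int]) -> Dict[str, str]:
    """Map each VM on the failed host to a surviving host in the same zone."""
    # Track remaining capacity on surviving hosts only.
    remaining = {h: free_slots.get(h, 0) for h in zone_hosts if h != failed_host}
    plan: Dict[str, str] = {}
    for vm in vms_by_host.get(failed_host, []):
        candidates = [h for h, slots in remaining.items() if slots > 0]
        if not candidates:
            raise RuntimeError(f"no capacity left in the zone for {vm}")
        # Prefer the host with the most free slots; break ties by name.
        target = min(candidates, key=lambda h: (-remaining[h], h))
        plan[vm] = target
        remaining[target] -= 1
    return plan


if __name__ == "__main__":
    plan = plan_recovery(
        failed_host="compute-1",
        zone_hosts=["compute-1", "compute-2", "compute-3"],
        vms_by_host={"compute-1": ["vm-a", "vm-b", "vm-c"]},
        free_slots={"compute-2": 2, "compute-3": 1},
    )
    print(plan)
```

The sketch also shows the capacity constraint behind zone-local recovery: if the surviving hosts have no free slots, the plan fails, so a zone needs spare headroom for HA to be effective.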

Watch Pushkar Acharya, Software Engineer, in “Maintaining VM high availability using OpenStack Masakari.”