Under the Covers of the Platform9 OpenStack Architecture
Platform9’s mission is to make private clouds easy for enterprises at any scale. This means creating a solution that provides some important benefits for our customers, including:
- A private cloud that is not only simple to deploy but also simple to manage. At Platform9, we decided that the best way to make that a reality for customers is to remove the burden of deploying and managing their cloud management system, including tasks such as monitoring, scaling, backups, and upgrades. We achieve that by providing Cloud Management-as-a-Service through a unique SaaS model.
- A private cloud that enables customers to leverage their existing (a.k.a. brownfield) assets, including infrastructure, skillsets, and processes. This is in contrast to most cloud solutions, which require a new greenfield environment. Platform9 integrates with and allows customers to bring over their existing KVM and VMware vSphere deployments.
- A private cloud that also enables customers to rapidly and easily consume new technologies, such as Linux containers, while using a standard interface and API set. That is the primary reason Platform9 chose to standardize on OpenStack as our underlying cloud management platform. We believe OpenStack offers the de facto API standard for private clouds and has a flexible plugin architecture that allows new technologies to be easily integrated into the platform.
You will be hearing more in the coming weeks about how we provide the benefits of brownfield integration and technology integration. This post focuses on how our SaaS solution enables Platform9’s engineers to deliver Cloud Management-as-a-Service. Thanks go to Platform9’s stellar engineering team, which designed and implemented this solution. Special thanks to co-founders Bich Le, Madhura Maskasky, and Roopak Parikh for their input on this blog post. So let’s start with a high-level overview of the Platform9 Managed OpenStack architecture.
The Platform9 Managed OpenStack Architecture
Platform9 runs on what can be considered a three-tier architecture. The three tiers include:
- Core Services – A set of services and tools that provides us a centralized method for deploying and managing each of our customers’ private cloud deployment units.
- Deployment Unit – One or more OpenStack-powered cloud management instances and the associated management services deployed for each Platform9 customer. All deployment units (DUs) are centrally managed by Platform9 Core Services, but no DU is ever shared between customers.
- On-Premises Tier (Host Agents) – A daemon running on each Linux KVM server in a customer’s on-premises environment that enables that server to function as an OpenStack compute node. The agent facilitates installation of the software required by a compute node, such as the nova-compute and nova-novncproxy binaries. The agent also enables discovery of all the resources running on the server, including running virtual machines (VMs), network configuration, and storage capacity. The OpenStack controllers in that customer’s DU will also utilize the agent to communicate with and to manage resources deployed on that server.
- On-Premises Tier (Gateway) – An Open Virtualization Appliance (OVA) deployed in a customer’s on-premises vSphere environment that enables communication between a customer’s DU and a customer’s vCenter servers. The gateway enables discovery of all the resources running in a customer’s vSphere clusters, including running VMs, network configuration, and storage capacity. The OpenStack controllers in that customer’s DU will also utilize the gateway to communicate with and to manage resources deployed on those clusters.
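To make the relationship between the tiers concrete, here is a minimal Python sketch of the one-DU-per-customer rule described above. It is an illustration only: the class and method names (`DeploymentUnit`, `CoreServices`, `provision`, `unit_for`) are hypothetical and do not come from Platform9's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeploymentUnit:
    """One customer's dedicated cloud management instance (hypothetical model)."""
    customer_id: str
    host_agents: List[str] = field(default_factory=list)  # on-premises KVM servers
    gateways: List[str] = field(default_factory=list)     # on-premises vSphere gateways


class CoreServices:
    """Central tier that creates and tracks every customer's DU (hypothetical model)."""

    def __init__(self) -> None:
        self._units: Dict[str, DeploymentUnit] = {}

    def provision(self, customer_id: str) -> DeploymentUnit:
        # Each customer gets exactly one DU, and a DU is never shared.
        if customer_id in self._units:
            raise ValueError(f"customer {customer_id!r} already has a DU")
        unit = DeploymentUnit(customer_id)
        self._units[customer_id] = unit
        return unit

    def unit_for(self, customer_id: str) -> DeploymentUnit:
        return self._units[customer_id]
```

Registering a host agent for one customer, for example `core.provision("acme").host_agents.append("kvm-01")`, leaves every other customer's DU untouched, which is the isolation property the architecture relies on.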
In the remainder of this blog post, we will focus on the Platform9 Core Services and customer Deployment Unit, which together make up the bulk of the SaaS components we use to deliver Cloud Management-as-a-Service. Upcoming blog posts will discuss the host agent and the gateway and how they interact with a customer’s on-premises environment.
Core Services
The Platform9 Core Services are a set of distributed services and utilities that we currently deploy on Amazon Web Services (AWS). We’ve chosen to use AWS because it is an easy lift for us to deploy to that infrastructure and to leverage its global presence and features such as Availability Zones. Architecturally, however, Core Services can be modified to deploy on any infrastructure.
The components that make up Core Services include the following:
- Alerting Service – Software used by Platform9 to integrate our Stats Server and Log Analyzer with Platform9’s alerting and paging systems.
- Log Analyzer – Software used by Platform9 to aggregate and to parse logs collected from customer Deployment Units. The Log Analyzer is integrated with our Alerting Service to provide active monitoring.
- Stats Server – Used to monitor the health of customer deployments and to ensure that all services are up and running. The Stats Server achieves this by communicating with the Stats/Health Agent for each customer DU to receive deployment status.
- Snape – A suite of deployment tools used to manage services for each customer’s DU, including running per-customer Deployment Bundles. The suite includes some Platform9-written utilities as well as the Ansible automation tool.
- Configuration Repo – A shared repository of files used for creating the Deployment Bundles that instantiate a customer Deployment Unit. Files include Ansible scripts and host agent and gateway RPMs, which are stamped with unique customer identifiers prior to being packaged as part of a customer’s Deployment Bundle.
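The stamping step in the Configuration Repo can be pictured with a small Python sketch. This is an assumption-laden illustration, not Platform9's tooling: it uses plain text templates as stand-ins for the real Ansible scripts and RPMs, and the function name `build_bundle` and the `${customer_id}` placeholder are hypothetical.

```python
import io
import tarfile
from string import Template
from typing import Dict


def build_bundle(templates: Dict[str, str], customer_id: str) -> bytes:
    """Stamp each template with the customer ID and pack the results into a tar.gz.

    `templates` maps a file name to its text; every ${customer_id} placeholder
    is replaced before the file is added to the archive.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in sorted(templates.items()):
            # safe_substitute leaves any unrelated "$" sequences untouched.
            stamped = Template(text).safe_substitute(customer_id=customer_id)
            payload = stamped.encode("utf-8")
            info = tarfile.TarInfo(name=f"{customer_id}/{name}")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

Calling `build_bundle({"hostagent.conf": "du_host = ${customer_id}.example.invalid\n"}, "acme")` returns an archive whose single member, `acme/hostagent.conf`, contains the customer-specific host name.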
As mentioned earlier, the Core Services are deployed on AWS. This allows us to leverage the scale and services of a public cloud while ensuring that customer workloads run securely on-premises in their private cloud environment as part of the On-Premises Tier. At Platform9, we use AWS services to help provide high availability (HA) to our Core Services tier.
- All Core Services components run in one or more AWS instances in a given Availability Zone (AZ) with a replica of those instances created in a different AZ. Which AZs are used for deployment depends on the geographic location of our customers’ data centers.
- The Core Services store persistent data in an AWS Relational Database Service (RDS) instance which is replicated to the same AZ as the Core Services replica. Note that the Core Services are architected in such a way that we could choose to move to a key-value store solution for persistence if deemed necessary.
- In the event of downtime affecting either the production Core Services instances or the AZ they are hosted in, we leverage AWS Elastic Load Balancer (ELB) instances to route traffic to the Core Services replica/AZ after a failover. We also use ELB to re-route traffic after failback.
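The failover and failback behavior can be summarized as a simple routing decision. The Python sketch below models only that decision; it does not call any AWS or ELB API, and the function name `route_target` and the labels `"primary"` and `"replica"` are hypothetical.

```python
def route_target(primary_healthy: bool, replica_healthy: bool, current: str) -> str:
    """Decide which Core Services copy should receive traffic.

    Returns "primary" or "replica". `current` is the copy serving traffic now.
    """
    if primary_healthy:
        # Normal operation, or failback once the primary has recovered.
        return "primary"
    if replica_healthy:
        # Failover: the primary (or its AZ) is down and the replica is usable.
        return "replica"
    # Neither copy is healthy, so there is nothing better to switch to.
    return current
```

With this rule, a primary outage moves traffic to the replica, and traffic returns to the primary as soon as it reports healthy again.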
Deployment Unit
After a customer signs up for a Platform9 account, a Deployment Unit (DU) is created for that customer. As mentioned earlier, no two customers will ever share a DU. Like the Platform9 Core Services, customer DUs are currently deployed on Amazon Web Services (AWS). We’ve chosen to use AWS because it is an easy lift for us to deploy to that infrastructure and to leverage its global presence and features such as Availability Zones. Architecturally, however, DUs can be modified to run on any infrastructure.
The components that make up a Deployment Unit include the following:
- OpenStack Controllers – These are the standard OpenStack services, based on a stable release, that provide cloud management functions for our customers’ private cloud instances. These services are distributed across multiple AWS instances and can be easily scaled out to meet customer resource demands.
- Resource Manager – This service tracks and catalogs the state of all compute, network, and storage resources running in a customer’s datacenter and being managed by Platform9 OpenStack Controllers. The state is dynamic since customers can place both new and existing resources under Platform9 management at any time. The Resource Manager works with other DU services to ensure that the OpenStack controllers always have a consistent and updated view of managed resources.
- Certificate Repo – This is the per-customer service that provides the self-signed certificates used for intra-service authentication by various services deployed in the DU as well as locally within the customer datacenter.
- Log Collector – Software used by Platform9 to collect logs from a given customer DU and send them to the Log Analyzer in the Core Services tier for processing.
- Stats/Health Agent – This agent periodically reports status information to the central Stats Server in the Core Services tier. It is responsible for gathering stats data for the following services/components:
  - All the services running in the DU deployed for a given customer
  - All deployed Host Agents
  - All deployed Gateways
- Configuration Manager – The Configuration Manager is responsible for the following tasks:
  - Installation, configuration, and upgrade of Platform9 application software deployed both within the DU and on-premises in customer datacenters. This includes the OpenStack services and the Host Agent and Gateway.
  - Discovery of customer resources such as hypervisors and gathering of telemetry data regarding these resources. The Configuration Manager uses a number of tools including Platform9-created utilities as well as Ansible for configuration management and orchestration.
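As an illustration of how the Stats/Health Agent's reports might feed the Stats Server and Alerting Service, here is a minimal Python sketch. The report layout, the names `HealthReport` and `find_alerts`, and the 120-second staleness threshold are all assumptions made for this example; the actual report format is not described in this post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthReport:
    """Status snapshot sent by a DU's Stats/Health Agent (hypothetical layout)."""
    customer_id: str
    timestamp: float               # seconds since the epoch when the report was built
    services: Dict[str, bool]      # DU services, name -> healthy?
    host_agents: Dict[str, bool]   # on-premises host agents, name -> healthy?
    gateways: Dict[str, bool]      # on-premises gateways, name -> healthy?


def find_alerts(report: HealthReport, now: float, max_age: float = 120.0) -> List[str]:
    """Return one alert message per problem found in a report."""
    alerts = []
    # A stale report suggests the agent, or the whole DU, is unreachable.
    if now - report.timestamp > max_age:
        alerts.append(f"{report.customer_id}: no recent health report")
    groups = (
        ("service", report.services),
        ("host agent", report.host_agents),
        ("gateway", report.gateways),
    )
    for kind, statuses in groups:
        for name, healthy in sorted(statuses.items()):
            if not healthy:
                alerts.append(f"{report.customer_id}: {kind} {name} is down")
    return alerts
```

The three groups mirror the three kinds of components the agent is responsible for: services in the DU, Host Agents, and Gateways.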
As with the Core Services, Deployment Units run on AWS. This allows us to leverage the scale and services of a public cloud while ensuring that customer workloads run securely on-premises in their private cloud environment as part of the On-Premises Tier. At Platform9, we use AWS services to help provide high availability (HA) to our customer DUs.
- All DU components run in multiple AWS instances in a given Availability Zone (AZ) with a replica of those instances created in a different AZ in the form of instance snapshots. Which AZs are used for deployment depends on the geographic location of our customers’ data centers.
- The DU stores persistent data in an AWS Relational Database Service (RDS) instance which is replicated to the same AZ as the DU replica.
- In the event of downtime affecting either the production DU instances or the AZ they are hosted in, the replica RDS instance will take over and we will auto-deploy and configure the DU instance snapshots. Traffic to and from the On-Premises Tier will then be re-routed to the disaster recovery (DR) DU.
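The DU recovery sequence described above has three ordered steps: the database replica takes over, instances are deployed from snapshots, and traffic is re-routed. The Python sketch below models that ordering on in-memory objects only. It performs no AWS calls, and the names `DuSite` and `fail_over` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DuSite:
    """One copy of a customer's DU in a single AZ (hypothetical model)."""
    az: str
    database_role: str                                   # "primary" or "replica"
    snapshots: List[str] = field(default_factory=list)   # instance snapshots held here
    running: List[str] = field(default_factory=list)     # instances currently running
    receives_traffic: bool = False


def fail_over(production: DuSite, recovery: DuSite) -> List[str]:
    """Move a DU to its recovery site and return the steps taken, in order."""
    steps = []
    # Step 1: the replica database takes over.
    recovery.database_role = "primary"
    steps.append(f"promoted database replica in {recovery.az}")
    # Step 2: deploy and configure DU instances from their snapshots.
    for snapshot in recovery.snapshots:
        recovery.running.append(snapshot)
        steps.append(f"deployed instance from snapshot {snapshot}")
    # Step 3: re-route traffic to and from the On-Premises Tier.
    production.receives_traffic = False
    recovery.receives_traffic = True
    steps.append(f"re-routed on-premises traffic to {recovery.az}")
    return steps
```

Traffic is switched last so that the On-Premises Tier is never pointed at a recovery site whose database and instances are not yet ready.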
Hopefully, this blog post has provided some insight into how Platform9 has engineered its Cloud Management-as-a-Service to provide an easy-to-deploy and easy-to-manage private cloud solution for the masses. In upcoming posts, we’ll dive deeper into the On-Premises Tier and into various components of both the Core Services and the Deployment Unit tiers.