Deep Dive: Virtual Machine Placement in OpenStack
Virtual machine (VM) or cloud instance placement is one of the most crucial components of a private cloud. Good placement of VMs across the cloud compute infrastructure ensures that each VM has access to the resources specified by the cloud user, delivers balanced performance given the available capacity, and enforces organizational policies such as high availability, security, and tiering.
New users of OpenStack are often surprised to see uneven or unexpected distribution of VMs across their hypervisors. This article explains the virtual machine placement algorithm used by OpenStack in general and Platform9 managed OpenStack in particular, its interaction with resource overcommitment, and a few practical examples of how to fine-tune the placement preferences to suit your deployment. In some cases, Platform9 has implemented features that add capabilities to vanilla OpenStack.
Before diving into details, it is worthwhile to note the following sub-components of OpenStack Nova and their functions:
- Nova-API: This service fronts the REST API exposed by the nova project.
- Nova-Conductor: This service acts as a broker and handles essentials like database queries, statistics and service updates.
- Nova-Network: This service is optional. It handles the networking aspects of Nova in OpenStack deployments that do not use the Neutron networking project.
- Nova-Scheduler: This service decides the right hypervisors for placing VMs.
As shown in the diagram above, the nova-conductor service places VMs on the hypervisors suggested by nova-scheduler. Out of the box, OpenStack does not consult the nova-network service during placement. That additional step is specific to Platform9 managed OpenStack, and it enables VM placement in heterogeneous network environments where not every host has the network required to run a particular VM. The scheduler service is aware of resource availability on all hypervisors in the private cloud and makes its placement choices accordingly.
OpenStack can use different scheduling strategies for placing VMs. This article explains the filter scheduler which is used by Platform9 private cloud.
When nova-scheduler gets a request for placing a VM, it:
- Runs the VM requirements through a set of filters to create a trimmed list of eligible hypervisors. The filters are applied one after another, so the final list contains only the hypervisors that satisfy every enabled filter.
- Chooses a random hypervisor from that list.
- Returns the chosen hypervisor to the nova-conductor service for placement. Nova-conductor in turn makes a call to nova-compute service to create the virtual machine.
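The filter-then-pick flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Nova's actual code: the `Host` and `Request` classes, the three predicate functions, and `schedule` are hypothetical names invented for this example, and the filters are deliberately simplified (no overcommit, no retries).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical view of a hypervisor as seen by the scheduler.
    name: str
    free_ram_mb: int
    free_vcpus: int
    networks: set = field(default_factory=set)


@dataclass
class Request:
    # Hypothetical VM placement request.
    ram_mb: int
    vcpus: int
    networks: set = field(default_factory=set)


# Each filter is a predicate: True means the host stays eligible.
def ram_filter(host, req):
    return host.free_ram_mb >= req.ram_mb


def core_filter(host, req):
    return host.free_vcpus >= req.vcpus


def network_filter(host, req):
    # The host must be connected to every requested network.
    return req.networks <= host.networks


FILTERS = [ram_filter, core_filter, network_filter]


def schedule(hosts, req, filters=FILTERS, rng=random):
    """Apply every filter in turn, then pick one surviving host at random."""
    eligible = list(hosts)
    for host_filter in filters:
        eligible = [h for h in eligible if host_filter(h, req)]
    if not eligible:
        raise LookupError("No valid host found")
    return rng.choice(eligible)
```

Because the final step is a random choice among all eligible hosts, two identical requests may land on different hypervisors, and a run of requests may cluster on a few hosts purely by chance.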
Like other things in OpenStack, the filters used are configurable. In Platform9’s deployment, the following filters are used:
- RetryFilter: Tries a different host if a chosen host fails to create the VM. By default VM creation is tried on up to 3 hosts.
- RamFilter: Based on configured RAM size of VM, chooses hosts that have enough free memory to accommodate the VM.
- CoreFilter: Based on configured Virtual CPUs, chooses hosts that have enough CPU cores to run the VM.
- NetworkFilter (Platform9 Specific): Based on networks specified by the user, chooses hosts which are connected to all of those networks. Vanilla OpenStack assumes uniform networks across all hypervisors and does not trim hypervisors based on network availability.
- ImagePropertiesFilter: Chooses a host which is suitable for the type of image. For example, if the user specifies a VMDK image, the filter returns only VMware hypervisors for placing the VM.
- ComputeFilter: Returns live hosts known to nova. If a host fails to send heartbeats on time, it is marked ‘disabled’ and is filtered out.
- AggregateInstanceExtraSpecsFilter: A Nova feature called “aggregates” lets admins group hypervisors together. This filter lets users place VMs on a specific hypervisor aggregate.
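To make the aggregate filter more concrete, here is a simplified sketch of tag matching between a flavor and the aggregates a host belongs to. It assumes exact string matches only; the function names and the `tier` key are hypothetical, and the real Nova filter supports more (for example comparison operators and scoped keys) that this sketch leaves out.

```python
def merge_aggregate_metadata(aggregates):
    """Combine the metadata of every aggregate a host belongs to.

    Each aggregate is a dict of tag -> value; the result maps each tag
    to the set of values seen across all aggregates.
    """
    merged = {}
    for metadata in aggregates:
        for key, value in metadata.items():
            merged.setdefault(key, set()).add(value)
    return merged


def host_passes(flavor_extra_specs, host_aggregates):
    """A host passes if every flavor tag is matched by some aggregate tag."""
    merged = merge_aggregate_metadata(host_aggregates)
    return all(
        value in merged.get(key, set())
        for key, value in flavor_extra_specs.items()
    )


# Hypothetical setup: one "prod" flavor and three hypervisors.
prod_flavor = {"tier": "prod"}
hosts = {
    "hv1": [{"tier": "prod"}],   # member of a prod aggregate
    "hv2": [{"tier": "test"}],   # member of a test aggregate
    "hv3": [],                   # not in any aggregate
}
eligible = [name for name, aggs in hosts.items() if host_passes(prod_flavor, aggs)]
```

In this sketch a flavor with no tags passes on every host, so untagged flavors are not restricted to any aggregate.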
Practical Observations on OpenStack Nova Virtual Machine Placement
- Nova places multiple VMs on a few hypervisors, while others are never used: Nova scheduler looks for all hosts which can satisfy the resource requirements and placement policies of a VM. The actual host is chosen at random from that eligible set. Random selection tends to even out placements over many requests, but an even distribution is not guaranteed. It is important to know that the choice of placement is never wrong: only hypervisors which are capable of running the VM are chosen by nova-scheduler.
- Nova places more VMs on a hypervisor than its memory capacity: Nova over-commits the memory of a hypervisor. Not every VM uses all of its configured memory all the time, so the same underlying physical memory can be time-shared between VMs, reducing the cost of running a virtual infrastructure. The same overcommitment applies to the CPU. For more information, check out this Platform9 support article.
- Running production VMs on separate machines from devtest: Nova’s “host aggregates” feature lets users group hypervisor machines and tag them. An OpenStack admin can tag flavors and aggregates with matching tags, such as “prod” or “test”. Nova scheduler matches tags during placement and chooses hypervisors from the right aggregate to place VMs. Aggregates also enable the availability zone (AZ) feature. Hypervisors from different geographical locations can be grouped into aggregates, and placing VMs based on AZ tags ensures high availability. Another example is the use of this feature to ensure fault tolerance (FT), where VMs supporting a service get placed on different hypervisors so that if one machine goes down, the other continues to service requests.
- “No Host Found” error when placing VMs: Nova throws this generic error when it fails to find a suitable host for a VM or when VM creation fails on the hypervisor. These are some of the common issues users run into:
  - There is no hypervisor with enough resources to host the VM.
  - Hypervisor errors from failed VM creation, for example incorrect file system permissions or image download failures, or DRS placement errors in the case of OpenStack on vSphere. When using Platform9, all such errors are clearly shown to the end user.
  - When using vSphere with OpenStack, the Distributed Resource Scheduler (DRS) may reject the placement request. A few examples of these failures are:
    - Lack of resources on a single ESX hypervisor to host the VM: OpenStack sees vSphere clusters, not individual hosts. The aggregate capacity of a cluster may therefore appear much bigger than the resources available on any single hypervisor, and nova-scheduler may pass the request to a cluster in which no single host has sufficient resources.
    - DRS configuration: OpenStack requires DRS to be in automatic mode for placements to be successful. DRS clusters without automatic controls do not work with OpenStack. For deeper analysis of the interaction between OpenStack Nova and vSphere DRS, check out this article.
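Two of the observations above, memory overcommitment and the vSphere cluster capacity mismatch, come down to simple arithmetic. The sketch below is illustrative only: `fits` is a hypothetical helper, the 1.5 allocation ratio and all host sizes are assumed values chosen for the example, and a real deployment reads its ratios from Nova's `ram_allocation_ratio` and `cpu_allocation_ratio` settings.

```python
def fits(total_mb, used_mb, requested_mb, allocation_ratio=1.0):
    """Capacity check with overcommit: the host's advertised capacity is
    its physical memory multiplied by the allocation ratio."""
    return total_mb * allocation_ratio - used_mb >= requested_mb


# Overcommit: a 64 GB host whose memory is already fully allocated.
# With an assumed ratio of 1.5 it still accepts an 8 GB VM;
# with no overcommit (ratio 1.0) it does not.
full_host_with_overcommit = fits(65536, 65536, 8192, allocation_ratio=1.5)
full_host_without_overcommit = fits(65536, 65536, 8192, allocation_ratio=1.0)

# vSphere: four ESX hosts, each with 32 GB total and 24 GB in use.
# The scheduler only sees the cluster-wide sums.
esx_hosts = [(32768, 24576)] * 4  # (total_mb, used_mb) per host
request_mb = 16384

cluster_total = sum(total for total, _ in esx_hosts)
cluster_used = sum(used for _, used in esx_hosts)

# The cluster as a whole has 32 GB free, so the request looks fine...
cluster_ok = fits(cluster_total, cluster_used, request_mb)
# ...but each host has only 8 GB free, so no single host can run the VM.
single_host_ok = any(fits(total, used, request_mb) for total, used in esx_hosts)
```

Here `cluster_ok` is true while `single_host_ok` is false, which is the situation in which a request is sent to the cluster and then fails on the vSphere side.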
To summarize, the OpenStack Nova scheduler chooses a suitable hypervisor for running instances. Compared to vSphere DRS, it appears fairly rudimentary. However, in most cases the chosen hypervisor is a good option. Being simple also allows OpenStack Nova to make quick decisions and scale better. OpenStack admins can use the host aggregates feature to enable more advanced features such as availability zones and fault tolerance.