Run EKS workloads on 50% fewer nodes with Elastic Machine Pool

This post summarizes our recent webinar on the value of Platform9’s latest product, Elastic Machine Pool (EMP), which included a demonstration of how it works. The presenters were Rene Soto, Platform9’s Sales Engineer, and Madhura Maskasky, Platform9’s Co-Founder and VP of Product.

Read a summary of the webinar:

Example of 60% cost savings through bare metal consolidation

One of EMP’s core value propositions is to reduce compute costs for your EKS clusters. The webinar provided an example of a standard 15-node EKS cluster using m5.4xlarge EC2 instances at $0.77 per hour each ($101,000 annual cost in total) versus the same workload consolidated via EMP onto a single m5.metal bare metal instance at $4.97 per hour ($41,000 annual cost).

In this example, EMP delivers 60% cost savings while providing the same compute capacity by packing the workloads, which run inside Elastic Virtual Machines (EVMs), onto bare metal using overprovisioning.
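The arithmetic behind this comparison can be checked with a short sketch. It assumes on-demand pricing and instances running 24x7 for a 365-day year, and it uses only the hourly rates quoted above; the function and variable names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: instances run 24x7 all year


def annual_cost(hourly_rate: float, instance_count: int = 1) -> float:
    """Annual on-demand cost for a fleet of identical instances."""
    return hourly_rate * instance_count * HOURS_PER_YEAR


# Hourly rates as quoted in the webinar example.
eks_cost = annual_cost(0.77, instance_count=15)  # 15 x m5.4xlarge
emp_cost = annual_cost(4.97, instance_count=1)   # 1 x m5.metal

savings = 1 - emp_cost / eks_cost
print(f"EKS on m5.4xlarge: ${eks_cost:,.0f}/year")
print(f"EMP on m5.metal:   ${emp_cost:,.0f}/year")
print(f"Savings:           {savings:.0%}")
```

With these hourly rates the sketch gives roughly $101,000 versus $43,500 a year, a saving of about 57%. The rounded annual figures from the webinar ($101,000 and $41,000) work out to about 59%, in line with the quoted 60%. Actual prices vary by region and over time.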

Overprovisioning to maximize utilization

The core innovation of EMP is applying overprovisioning to pack more virtualized container workloads onto AWS bare metal servers than their physical capacity would nominally allow. EMP creates highly efficient “Elastic Virtual Machines” (EVMs) that look and feel just like standard EC2 VMs externally, but internally run as overprovisioned VMs on bare metal servers.

By setting overcommit ratios for CPU and memory, EMP deploys EVMs whose combined allocation exceeds a server’s physical resources, which boosts utilization. For example, with a 1.5x memory overcommit, EMP can deploy 15GB of EVM memory on a host with only 10GB of physical RAM.
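As a minimal sketch of the overcommit arithmetic, the snippet below reproduces the 1.5x example. The function names and the 2.5GB EVM size are assumptions for illustration and are not part of EMP.

```python
def overcommitted_capacity(physical: float, ratio: float) -> float:
    """Capacity that can be allocated to EVMs at a given overcommit ratio."""
    if ratio < 1.0:
        raise ValueError("overcommit ratio is expected to be >= 1.0")
    return physical * ratio


def max_evms(physical: float, ratio: float, evm_size: float) -> int:
    """How many equally sized EVMs fit within the overcommitted capacity."""
    return int(overcommitted_capacity(physical, ratio) // evm_size)


# 10 GB of physical RAM at a 1.5x memory overcommit.
print(overcommitted_capacity(10, 1.5))  # 15.0 GB allocatable to EVMs
print(max_evms(10, 1.5, evm_size=2.5))  # 6 EVMs of 2.5 GB each (hypothetical size)
```

The same calculation applies to CPU with its own ratio. Overcommit relies on EVMs rarely using their full allocation at the same time.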

This allows EMP to run many more workloads per bare metal server compared to standard EC2 VMs, unlocking massive cost savings.

Dynamic load balancing and scaling

But EMP goes far beyond just static overprovisioning. It continuously monitors workload demands and rebalances EVMs across the available bare metal capacity to maintain high utilization without impacting performance. It does this through three key capabilities:

EMP Bare Metal autoscaling

When an individual bare metal server hits 80% utilization, EMP automatically scales out and provisions additional bare metal capacity from AWS. It then rebalances EVMs across the new capacity.
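The scale-out trigger can be pictured as a simple threshold check. This is only a sketch: the host names and utilization readings are made up, and how EMP actually measures utilization is not covered here.

```python
SCALE_OUT_THRESHOLD = 0.80  # utilization level quoted in the webinar


def hot_hosts(utilization: dict[str, float],
              threshold: float = SCALE_OUT_THRESHOLD) -> list[str]:
    """Return the hosts at or above the scale-out threshold, sorted by name."""
    return sorted(host for host, used in utilization.items() if used >= threshold)


# Hypothetical bare metal hosts and the fraction of their capacity in use.
fleet = {"metal-a": 0.62, "metal-b": 0.83, "metal-c": 0.45}
hot = hot_hosts(fleet)
if hot:
    print(f"Scale out: {hot} at or above {SCALE_OUT_THRESHOLD:.0%}")
```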

EVM rebalancing

As workloads increase or new bare metal is added, the EMP rebalancer uses live migration to transparently move running EVMs between bare metal servers. This redistribution maximizes utilization without disrupting the EKS workloads inside the migrated EVMs.
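To illustrate the idea of redistributing EVMs, here is a greedy sketch that moves EVMs from the busiest host to the idlest one. It is not EMP's rebalancing algorithm, which the webinar did not detail; the host names and the load units (percent of one host's capacity) are assumptions.

```python
def rebalance(hosts: dict[str, list[int]]) -> list[tuple[int, str, str]]:
    """Greedy rebalancing sketch.

    `hosts` maps a host name to the loads of the EVMs it runs. The loop
    repeatedly moves one EVM from the busiest host to the idlest host, as
    long as the move narrows the gap between them. The dict is modified in
    place; the return value lists each move as (evm_load, source, destination).
    """
    migrations = []
    while True:
        busiest = max(hosts, key=lambda h: sum(hosts[h]))
        idlest = min(hosts, key=lambda h: sum(hosts[h]))
        gap = sum(hosts[busiest]) - sum(hosts[idlest])
        # Moving an EVM narrows the gap only if its load is smaller than the gap.
        candidates = [load for load in hosts[busiest] if load < gap]
        if not candidates:
            return migrations
        evm = max(candidates)
        hosts[busiest].remove(evm)
        hosts[idlest].append(evm)
        migrations.append((evm, busiest, idlest))


# A newly added, empty host ("bm-2") next to a heavily loaded one.
hosts = {"bm-1": [30, 25, 20], "bm-2": []}
moves = rebalance(hosts)
print(moves)
print(hosts)
```

In this example a single move of the 30-unit EVM leaves the hosts at 45 and 30 units. In EMP each such move would be carried out as a live migration.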

Integrated EKS cluster autoscaling

Finally, EMP integrates with standard Kubernetes cluster autoscaling on EKS. If it detects pending unschedulable pods, EMP’s autoscaler automatically provisions new EVMs to run those pods.
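A rough way to think about how many EVMs a batch of pending pods calls for is to total their resource requests and divide by the EVM size. The sketch below does exactly that; the `PodRequest` type, the pod sizes, and the 4 vCPU / 16GB EVM shape are hypothetical, and the result is only a lower bound because it ignores how individual pods pack onto nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class PodRequest:
    cpu: float        # requested vCPUs
    memory_gb: float  # requested memory in GB


def evms_needed(pending: list[PodRequest], evm_cpu: float, evm_memory_gb: float) -> int:
    """Lower-bound estimate of the EVMs required to fit the pending pods."""
    if not pending:
        return 0
    total_cpu = sum(p.cpu for p in pending)
    total_memory = sum(p.memory_gb for p in pending)
    # Whichever resource runs out first determines the node count.
    return max(math.ceil(total_cpu / evm_cpu),
               math.ceil(total_memory / evm_memory_gb))


pending = [PodRequest(2, 4), PodRequest(4, 8), PodRequest(1, 14), PodRequest(1, 10)]
print(evms_needed(pending, evm_cpu=4, evm_memory_gb=16))  # 3: memory is the constraint
```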

Zero pod disruption

One of the key innovations that enables EMP to rebalance workloads across bare metal servers without impacting performance is its seamless live migration capability. During the webinar, Madhura Maskasky emphasized this as a core EMP benefit:

“We’re able to live migrate these Elastic VMs across bare metal instances with just a few milliseconds of freeze time. The applications running inside the migrated EVMs are totally unaware and unaffected by the migration.”

This allows EMP to transparently move running workloads to different bare metal servers as needed to optimize utilization and load distribution. Other right-sizing or autoscaling solutions often involve more disruptive restarts or rescheduling of applications.

However, EMP uses advanced live migration techniques that involve:

  • Copying the entire memory, storage, and network state of the EVM from the source to destination bare metal server in the background.
  • Briefly pausing the EVM for just a few milliseconds to finalize the migration cutover to the new server.
  • Immediately resuming the migrated EVM on the new server in exactly the same running state as before.
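The three steps above resemble iterative pre-copy, a common approach to VM live migration. The toy simulation below shows why the pause can be short: most state is copied while the VM keeps running, and only a small remainder is copied during the pause. Whether EMP uses exactly this scheme is an assumption, and the page counts, dirty percentage, and thresholds are invented for illustration.

```python
def simulate_precopy(total_pages: int, dirty_percent: int,
                     cutover_threshold: int = 50,
                     max_rounds: int = 10) -> tuple[int, int, int]:
    """Simulate an iterative pre-copy migration.

    Each round copies the outstanding pages while the VM keeps running;
    `dirty_percent` of the pages copied in a round are modified again and
    must be re-sent. Once the outstanding set is small enough (or the round
    limit is hit), the VM is paused and the remainder is copied.

    Returns (rounds, pages_copied_while_running, pages_copied_while_paused).
    """
    outstanding = total_pages
    copied_live = 0
    rounds = 0
    while outstanding > cutover_threshold and rounds < max_rounds:
        copied_live += outstanding                        # background copy
        outstanding = outstanding * dirty_percent // 100  # pages dirtied meanwhile
        rounds += 1
    # Brief pause: copy what is left, then resume on the destination host.
    return rounds, copied_live, outstanding


rounds, live, paused = simulate_precopy(total_pages=100_000, dirty_percent=10)
print(rounds, live, paused)  # 4 111100 10
```

In this run only 10 of the 100,000 pages are copied while the VM is paused, which is what keeps the freeze time small.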

This means stateful applications like monolithic Java apps, databases, queues, user sessions, or long-running AI processes inside the migrated EVM continue operating without disruption or needing to be rescheduled. From the application’s perspective, the migration is totally seamless.

This unique capability prevents any downtime or performance impact to Kubernetes workloads as EMP rebalances their underlying EVMs across the elastic bare metal infrastructure. It ensures consistently high performance even as utilization levels fluctuate over time.

Audience Q&A

Q: If I use Karpenter today, would I get further savings with EMP?

A: EMP already subsumes the bin-packing efficiency that tools like Karpenter provide by eliminating underutilized nodes. But EMP goes further by eliminating unallocated and overprovisioned capacity that would otherwise be wasted. This contributes to additional cost savings.

Q: Does EMP support all EC2 instance families?

A: Yes, EMP supports bare metal instances across all major AWS instance families including C, M, R, X, etc. Wherever AWS offers a “metal” instance, EMP can leverage it.

Q: What are the minimum requirements to make EMP worthwhile for small EKS clusters?

A: There is a minimum EKS cluster size needed to justify EMP’s bare metal deployment model. As a general guideline, the aggregate resources (CPU/memory) required for your EKS workloads should be at least 1-2x the size of AWS’s smallest bare metal instance (e.g. 192GB RAM for c5n.metal). Below that, you may not see savings over standard EC2 instances. However, you can also combine multiple smaller EKS clusters onto the same underlying bare metal pool.
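The guideline can be expressed as a quick check on memory demand. This sketch uses the 192GB c5n.metal figure from the answer; the cluster sizes and the function name are hypothetical, and a real assessment would also look at CPU.

```python
SMALLEST_METAL_MEMORY_GB = 192  # c5n.metal, the example given in the answer


def meets_minimum(cluster_memory_gb: list[float], multiple: float = 1.0) -> bool:
    """True if the combined memory demand of the clusters reaches `multiple`
    times the smallest bare metal instance."""
    return sum(cluster_memory_gb) >= multiple * SMALLEST_METAL_MEMORY_GB


print(meets_minimum([96]))          # False: one small cluster on its own
print(meets_minimum([96, 64, 80]))  # True: 240 GB across three pooled clusters
```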

Q: Can I use EMP for just part of my workload?

A: Yes, you can deploy EMP for only a portion of your EKS workloads initially. You use Kubernetes taints, together with matching tolerations on your pods, to specify which workloads should land on the EMP-managed nodes.
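As a sketch of what this looks like on the workload side, the snippet below adds a toleration and a node selector to a pod spec expressed as a Python dict. The taint key, value, and node label are hypothetical placeholders, not names that EMP is known to use. In Kubernetes, a taint keeps pods without a matching toleration off a node, and a node selector steers the tolerating pods onto it.

```python
import json

# Hypothetical taint and label; substitute whatever your EMP-managed nodes carry.
EMP_TAINT = {"key": "example.com/emp", "value": "true", "effect": "NoSchedule"}
EMP_NODE_LABEL = {"example.com/emp": "true"}


def target_emp_nodes(pod_spec: dict) -> dict:
    """Return a copy of a pod spec that tolerates the EMP taint and selects EMP nodes."""
    spec = dict(pod_spec)
    toleration = {
        "key": EMP_TAINT["key"],
        "operator": "Equal",
        "value": EMP_TAINT["value"],
        "effect": EMP_TAINT["effect"],
    }
    spec["tolerations"] = [*pod_spec.get("tolerations", []), toleration]
    spec["nodeSelector"] = {**pod_spec.get("nodeSelector", {}), **EMP_NODE_LABEL}
    return spec


pod_spec = {"containers": [{"name": "app", "image": "nginx"}]}
print(json.dumps(target_emp_nodes(pod_spec), indent=2))
```

Workloads without the toleration stay on the regular EC2 nodes.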

Ready to take action?

See how Elastic Machine Pool can revolutionize your EKS infrastructure and unlock transformative cost savings and efficiency gains. Request a demo with our founders and solution experts. 

Don’t miss out on maximizing your investment in AWS and EKS! Request your personalized EMP demo today.

Kamesh Pemmaraju
