Kubernetes FinOps: Right-sizing Kubernetes workloads


In the earlier blog posts in this series, we detailed some of the mechanisms in Kubernetes available to control the resources allocated to the workloads in the cluster.  But even in a relatively small, stable environment, right-sizing Kubernetes workloads can be a challenge as circumstances change, meaning utilization can easily drift (or sprint!) out of the optimal range.  Static optimization is no solution for clusters that see any degree of change in either application workloads or workload volume.  Fortunately, Kubernetes provides several automated mechanisms to remedy this, and third-party projects and products exist as well – but just like the basic controls, they’re not without caveats.

Background

Since the earliest days, the Kubernetes project has focused on automation as a solution for routine operations challenges.  Scaling to accommodate varying workload intensity was no different: from the beginning, Kubernetes supported horizontal scaling of both clusters (by adding or removing nodes) and workloads (by changing the number of replicas of a pod maintained by controller automation).  The desire to automate those changes themselves led to the earliest autoscaling component, the HorizontalPodAutoscaler, which first appeared in Kubernetes 1.1; the Cluster Autoscaler add-on showed up not much later, in Kubernetes 1.3.  These continue to be the main mechanisms used to manage scaling, but more recently the need to automatically alter the actual definitions of workloads themselves brought the VerticalPodAutoscaler, along with a variety of third-party tools like Karpenter, KEDA, and a host of others.

In this blog post, we’ll cover how the Kubernetes project supports both scaling and automation of scaling changes, and identify some of the inherent issues with the overall in-cluster-controller-based model of scaling automation that currently dominates the field of Kubernetes FinOps.

Kubernetes cluster right-sizing

Horizontally scaling clusters

Horizontally scaling clusters is easy: a new node is provisioned and added to the cluster, or a node is cordoned, drained, removed from the cluster, and terminated.  In the past this was automated with mechanisms like an AWS Auto Scaling group, configured with a boot-time script that handled joining the node to the cluster.  That pattern remains even with newer tools like the Cluster Autoscaler (CAS) and those from third-party projects and vendors.  Note, though, that while scale-out is generally non-disruptive, scale-in to increase efficiency usually involves terminating pods, because Kubernetes does not support live migration of pods from one node to another.
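
Under the hood, the cordon step is just a field on the Node object, and a drain then evicts that node's pods before the node is removed.  Here is a minimal sketch of what kubectl cordon effectively applies (the node name is hypothetical):

```yaml
# What "kubectl cordon <node>" does under the hood: mark the node unschedulable
# so no new pods land on it before it is drained and removed.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-42-7.ec2.internal   # hypothetical node name
spec:
  unschedulable: true               # the field kubectl cordon sets
```

kubectl drain then evicts the remaining pods, honoring any PodDisruptionBudgets they have (more on those below).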

Vertically scaling clusters

Vertical scaling is easy in theory too: if your infrastructure supports adding or removing a particular resource (CPU, memory, and so on) on a node without restarting it, Kubernetes will simply start recognizing the new amount of that resource.  However, vertical scale-down of nodes can be challenging if the reason you want to scale down is low resource consumption, because those unconsumed resources may already be allocated by Kubernetes as part of container requests.  Also, many infrastructure providers don't support any kind of live resizing, so changing node resources will require at minimum a shutdown and restart, and possibly a full node replacement.

Kubernetes workload right-sizing

Kubernetes has mechanisms to scale workloads both horizontally and vertically, though not all with the same level of support or ease of use.

Horizontal workload scaling

Horizontal pod scaling appeared early in Kubernetes development: one of the first resources was the ReplicationController, which supported scaling the number of replicas of a pod (although, because of limitations in its implementation, it was quickly superseded by the ReplicaSet and the Deployment resource built on it).  Not long after the Deployment resource appeared, automating horizontal pod scaling became natively supported with a resource that targets Deployments (and other scalable workload resources), the HorizontalPodAutoscaler (HPA).  The HPA works by changing the number of replicas the targeted Deployment maintains in response to changes in metrics like CPU utilization, or optionally in response to custom metrics (more on custom metrics later).
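
As a sketch (the Deployment name and numbers are hypothetical), an HPA that tries to hold average CPU utilization at around 70% of the containers' requests might look like this:

```yaml
# Hypothetical HPA: scale the "web" Deployment between 2 and 10 replicas,
# targeting ~70% average CPU utilization relative to the containers' requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that utilization here is measured against the pods' CPU requests, so horizontal autoscaling is only as sensible as the requests behind it.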

Vertical workload scaling

Kubernetes supports two methods of vertical pod scaling, restarting and in-place resizing, though only the first is stable.

Restarting pods is how Kubernetes has handled most updates to a pod spec since the beginning, because some aspects of a pod are defined to be immutable — changes must be implemented by terminating an old pod and creating a new one to replace it.  Resource requests are one such aspect — but this means that if an administrator or Kubernetes controller adjusts resource requests up or down, applications incur pod disruption, and some applications don’t tolerate this well.
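
For context, requests and limits live inside the pod template, so a (hypothetical) Deployment like the one below will replace all of its pods in a rolling fashion whenever those values are edited:

```yaml
# Hypothetical Deployment: the resources block is part of the pod template, so
# changing requests or limits changes the pod spec and rolls the pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example.com/api:1.2.3   # hypothetical image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi
```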

Starting with Kubernetes 1.27, support was added for in-place resizing of container resources in a pod, but you currently have to enable a feature gate (InPlacePodVerticalScaling) at cluster creation time to use it, because the feature is still considered alpha in the current Kubernetes version as of this writing (1.29).  It also has the limitation that a resize can't change the QoS class of a pod.
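
As a rough sketch of that alpha capability (assuming the InPlacePodVerticalScaling feature gate is enabled), each container can declare how a given resource may be resized:

```yaml
# Sketch of alpha in-place resizing (requires the InPlacePodVerticalScaling
# feature gate).  resizePolicy declares, per resource, whether a resize can be
# applied without restarting the container.  Names and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
    - name: app
      image: example.com/app:1.0
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # CPU changes applied in place
        - resourceName: memory
          restartPolicy: RestartContainer  # memory changes restart the container
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
```

With a policy like this, a CPU-only resize can be applied to the running container, while a memory change still restarts that container (though not the whole pod).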

To automate these methods of resizing pods, the Kubernetes project maintains the Vertical Pod Autoscaler (VPA), an add-on that targets pod-owning resources like Deployments for resizing.

Currently, VPA only supports restart-based resizing; adding support for the new in-place resizing ability is in the proposal stage.
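
For reference, a minimal VerticalPodAutoscaler object (the target name is hypothetical) looks roughly like this; in "Auto" mode it applies its recommendations by evicting pods so they are recreated with the new requests, i.e. the restart-based resizing described above:

```yaml
# Hypothetical VPA: watch usage of the "api" Deployment's pods and apply new
# requests by evicting and recreating them.  Use updateMode "Off" to get
# recommendations only, with no changes applied.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
```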

Because the HPA and VPA are separate components with no awareness of each other, they can cause undesirable results if they both target the same resource for autoscaling.  A proposal has been made to combine the capabilities of both into a “multidimensional pod autoscaler” that would handle both types in an integrated way, so be aware that how pod autoscaling is defined and implemented could change significantly in the future.
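
Until such an integrated autoscaler exists, one common mitigation is to split the resources the two autoscalers manage, for example letting an HPA scale on CPU while restricting the VPA to memory.  A hypothetical sketch:

```yaml
# Hypothetical mitigation: restrict this VPA to memory so it does not fight an
# HPA that scales the same Deployment on CPU utilization.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-memory-only
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]
```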

Third-party scaling solutions

A growing number of projects and products provide solutions for workload or cluster scaling, such as KEDA, Karpenter, Goldilocks, and others.  In general, these fall into one of two categories:

  • A controller runs in the cluster and makes changes or recommendations
  • An agent runs in the cluster gathering data, and reporting to an external SaaS console that makes changes or recommendations

Issues with right-sizing Kubernetes workloads from within Kubernetes

Automating scaling is a complex topic in its own right, but more importantly, there are some inherent concerns with the dominant model of controlling scaling changes using tools that run inside the cluster itself, regardless of which specific tools are in use:

  • If the in-cluster components are directly interacting with authenticated APIs of external services like AWS EC2, they need to safely handle a credential for that API, which means you also need to provide it securely.  This is far from impossible to do in a Kubernetes environment, but many “solutions” to that issue merely add a layer of abstraction (for example, having the API credential held by a cluster-external service and provided on-demand moves the problem around, but doesn’t actually solve it because now you have to securely authenticate your workload to that external service).  Infrastructure credential exfiltration or inadvertent leakage can cost you many times what your cluster itself did. 
  • Having a cluster management component or agent deployed in the cluster it’s intended to manage makes the component or agent itself a Kubernetes workload – which means it can now be affected by things like an overutilized cluster, node failures, networking issues, etc.  These issues can be mitigated by careful planning, but will always be present to some degree. 
  • Poorly-defined autoscaling policies or interactions between autoscalers can lead to cost inflation rather than reduction – for example, if vertical workload autoscaling causes a pod’s resources to be reduced to a point where a horizontal scaler activates based on “high pod CPU utilization”, additional nodes could be provisioned to handle a phantom “load spike” when the actual application load hasn’t changed at all. 
  • Some operations are inherently disruptive: there is no way to move a pod from one node to another without terminating it, and not all workloads tolerate arbitrary disruption like that; this is in fact why Kubernetes introduced the concept of a PodDisruptionBudget (a minimal example is sketched just after this list).  As a result, scalers can get caught on the horns of a dilemma: it's not possible to increase efficiency in some desired way without terminating a pod, but that application's PodDisruptionBudget requires that the pod not be terminated. 
  • Some tools also suffer (through no fault of their own, really) from unrealistic expectations: a tool that is only a cluster autoscaler will never make your workloads more efficient; it can only make sure your cluster has the right amount of resources to run all your inefficiently-configured workloads.  It's up to the cluster administrator, of course, to make sure the tool they adopt for a particular job is in fact intended to do it and capable of doing it.
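
To make the disruption dilemma from the list above concrete, here is a minimal (hypothetical) PodDisruptionBudget; once it is in place, any voluntary eviction that would drop the matching pods below its floor is refused, no matter how much efficiency it would buy:

```yaml
# Hypothetical PodDisruptionBudget: keep at least 2 "api" pods running at all
# times.  Voluntary evictions (node drains, restart-based resizes) that would
# go below this floor are rejected.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```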

In general, all these tools build on the same capabilities – and have the same limitations – of the underlying infrastructure and the Kubernetes architecture itself.  Essentially, none of them is doing anything a cluster admin couldn’t do with kubectl and infrastructure provisioning tools – they just do it a lot faster due to automation.  That doesn’t mean you shouldn’t use them, but it does mean you should carefully consider your needs and which of them are fulfilled or impacted by a given tool.

However, there are limits to how far resource efficiency can be optimized with this approach without giving up reliability.  We see this routinely demonstrated in the data: despite the plethora of cluster and workload scaling utilities and their increasing adoption, typical cluster utilization in one survey after another remains shockingly low, generally 30% at most and often much less, and that unused capacity translates directly into wasted budget.  It's like shipping small goods by packaging them into individual bags and boxes and loading those directly onto ships, planes, and trains, repackaging them along the way as needed: it can work in theory, but past a certain point it doesn't scale in the real world, and you need a solution that lets you consolidate things and manage them as a whole.

What’s a cluster administrator to do?

In addition to in-cluster management of individual workloads, you need utilization management operating at a layer of abstraction below that of the cluster infrastructure – and at Platform9 we have just the thing to meet that need in Elastic Machine Pool.  In our next blog post, we’ll dive into the details of how EMP uses production-proven technology for lower-level resource management to achieve additional cost improvements in harmony with your existing tools for in-cluster FinOps automation.

Additional reading

Previously in this series:

Basics of cluster utilization

Resource management challenges

Also from Platform9:

How a SaaS data management company slashed AWS EKS costs by 58%

Kubernetes documentation:

Horizontal Pod Autoscaling, Vertical Pod Autoscaler and Cluster Autoscaler

Joe Thompson

