Kubernetes costs can vary considerably in the enterprise. Whether you host your clusters on a public cloud service, such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), or Amazon Elastic Kubernetes Service (EKS), or on-premise, there are a number of ways to make sure you are spending your money efficiently.
While the public cloud experience is all about speed and near-unlimited scaling, poor management can quickly turn into huge bills that hurt your bottom line. This is mainly because costs vary significantly across providers and across the increasingly specialized services each of them offers. Cloud cost structures often confuse and intimidate people, especially when terms like “autoscaling” come into play.
Similar to how you would pick a mobile phone plan to suit your usage, optimizing how you use resources in the cloud can help you achieve considerable savings. The flip side of that coin, however, is that just like with a mobile phone, using a service that’s not part of your package could result in a rather large bill in the mail.
Even when running your Kubernetes infrastructure in your own on-premises data center, you still want to improve utilization and reduce your internal infrastructure costs.
In this post, we will discuss the major factors that drive up Kubernetes infrastructure costs (node type, size, and density), how AI-based tools can help keep those costs in check, and how to optimize on-prem resources to meet modern requirements.
The public cloud is built for growth, and while cloud providers make it super easy to grow, it’s also super easy to grow out of your standard plan and into premium territory. This is why, before you make any adjustments to how you use Kubernetes resources in the cloud or on-premise, the first step is to gain visibility into your current situation.
Prometheus and Grafana are great tools for this purpose: together they let you build detailed dashboards that visualize your resource usage and, by extension, your Kubernetes infrastructure costs.
Resources can generally be classified into three groups: compute, memory, and storage. Kubernetes resource requests and limits deal mainly with the first two, while storage is handled separately through volumes. Kubernetes lets us allocate compute and memory to workloads, but that capacity still has to be backed by real nodes, and those nodes are what we pay for.
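Compute and memory requests are written as Kubernetes quantity strings, such as 500m for half a CPU core or 256Mi for 256 mebibytes, so it helps to see how they translate into the cores and bytes you actually pay for. The following is a minimal Python sketch that handles only the common suffixes; the function names are hypothetical, and the real Kubernetes quantity parser accepts more formats than this.

```python
# Minimal sketch: convert common Kubernetes quantity strings into plain numbers.
# Only the usual suffixes are handled; the real Kubernetes parser accepts more forms.

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Return CPU cores, e.g. '500m' -> 0.5 (millicores), '2' -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Return bytes, e.g. '256Mi' -> 268435456, '1G' -> 1000000000."""
    # Binary suffixes are checked first so that 'Mi' is not mistaken for 'M'.
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


# Example requests block from a hypothetical container spec.
requests = {"cpu": "250m", "memory": "64Mi"}
print(parse_cpu(requests["cpu"]), parse_memory(requests["memory"]))  # 0.25 67108864
```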
Prometheus scrapes the metrics exposed by kube-state-metrics, which can then be aggregated by cluster, namespace, and pod. The results are displayed on a Grafana dashboard that shows the exact footprint of each part of your cluster, such as how much of each resource Istio is using. (Istio is a service mesh that manages traffic between microservices, with features like load balancing, service-to-service authentication, and monitoring.)
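To make the aggregation step concrete, here is a hedged Python sketch of per-namespace cost attribution. It assumes a recent kube-state-metrics version, where requested CPU is exposed as kube_pod_container_resource_requests with a resource="cpu" label; in PromQL the same aggregation would be sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"}). The sample data mimics the result list returned by the Prometheus HTTP API for an instant query, and the per-core-hour price is a placeholder, not a real rate.

```python
from collections import defaultdict

# Placeholder price; substitute your provider's actual rate.
PRICE_PER_CORE_HOUR = 0.04


def cpu_cost_by_namespace(samples, price_per_core_hour=PRICE_PER_CORE_HOUR):
    """Sum requested CPU cores per namespace and convert to an hourly cost.

    `samples` mimics the `result` list of a Prometheus instant query on
    kube_pod_container_resource_requests{resource="cpu"}.
    """
    cores = defaultdict(float)
    for sample in samples:
        namespace = sample["metric"]["namespace"]
        # The API returns each value as [timestamp, "value-as-string"].
        cores[namespace] += float(sample["value"][1])
    return {ns: round(total * price_per_core_hour, 4) for ns, total in cores.items()}


samples = [
    {"metric": {"namespace": "istio-system", "pod": "istiod-1"}, "value": [0, "0.5"]},
    {"metric": {"namespace": "istio-system", "pod": "istio-ingress-1"}, "value": [0, "1"]},
    {"metric": {"namespace": "shop", "pod": "web-1"}, "value": [0, "2"]},
]
print(cpu_cost_by_namespace(samples))  # {'istio-system': 0.06, 'shop': 0.08}
```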
In production cloud environments, Kubernetes nodes run as virtual machine instances. Choosing the right size for your master and worker nodes, relative to the workload your cluster runs, is critical to optimizing cost. You also need to pick the right type of node, because containers have different requirements: some work fine on general-purpose instances, while others need I/O-, memory-, or CPU-optimized instances. Beyond size and type, operational hours (how long each node actually runs) are another factor to consider.
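The following Python sketch illustrates how node type, size, and operational hours combine into a monthly bill. The instance names and hourly prices are hypothetical placeholders, and the node count simply takes whichever resource (CPU or memory) runs out first, ignoring per-node system overhead and bin-packing fragmentation.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str
    cpus: float
    memory_gib: float
    hourly_price: float


# Hypothetical catalog: names and prices are placeholders, not real quotes.
CATALOG = [
    InstanceType("general-4", 4, 16, 0.20),
    InstanceType("cpu-8", 8, 16, 0.34),
    InstanceType("mem-4", 4, 32, 0.26),
]


def cheapest_fit(cpu_needed, mem_needed_gib, hours_per_month, catalog=CATALOG):
    """Return (instance name, node count, monthly cost) for the cheapest option."""
    best = None
    for t in catalog:
        # Whichever resource is exhausted first decides how many nodes are needed.
        nodes = max(math.ceil(cpu_needed / t.cpus),
                    math.ceil(mem_needed_gib / t.memory_gib))
        cost = nodes * t.hourly_price * hours_per_month
        if best is None or cost < best[2]:
            best = (t.name, nodes, cost)
    name, nodes, cost = best
    return name, nodes, round(cost, 2)


print(cheapest_fit(16, 96, 730))  # ('mem-4', 4, 759.2) for a full month
print(cheapest_fit(16, 96, 264))  # ('mem-4', 4, 274.56) for 12 h x 22 weekdays
```

For this memory-heavy workload the memory-optimized type wins even though its hourly price is higher, and running the same nodes only during working hours cuts the bill in proportion to the hours.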
Autoscaling is how we tackle that particular problem. Most cloud providers, AWS among them, don’t charge for autoscaling itself, but without proper configuration you’re right back where you started, except now with a huge bill for the resources it provisioned. To configure autoscaling properly, you need to make informed decisions about the ideal node size and scaling parameters for your cluster. If your minimum parameters are too high, you risk a big bill for idle capacity; if they’re too low, you risk downtime while the cluster scales up. This is why visibility is key.
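The minimum/maximum trade-off is easiest to see in the replica formula documented for the Kubernetes Horizontal Pod Autoscaler (HPA), sketched below in Python with its default tolerance of 0.1. Note that the HPA scales pods rather than nodes; the Cluster Autoscaler, which adds and removes nodes, works differently (it reacts to unschedulable pods and underused nodes), but its minimum and maximum node counts bound your bill in the same way. The function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count following the HPA formula
    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count (still clamped below).
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # min_replicas sets your cost floor; max_replicas caps both cost and capacity.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 50, min_replicas=2, max_replicas=10))  # 8
print(desired_replicas(4, 200, 50, min_replicas=2, max_replicas=6))  # 6 (capped)
print(desired_replicas(4, 10, 50, min_replicas=3, max_replicas=10))  # 3 (floor)
```

The second call shows a maximum set too low: demand calls for 16 replicas but only 6 are allowed. The third shows a floor that keeps paying for 3 replicas when 1 would do.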
Another critical factor in Kubernetes cost efficiency is the number of nodes running in a cluster. While this number is typically set by the NUM_NODES value in the config.sh file of whatever platform you’re using, you can’t simply change it to a very large number: public clouds limit how many resources you can allocate, so you first need to increase your “quota.” Packing more onto each node brings a second problem: by default, containers compete with each other for a node’s resources and can leave nothing for the system processes that run Kubernetes itself.
To avoid these difficulties when managing node density, a best practice is to first reserve resources for all system daemons and then set resource limits for all add-on containers. It’s also a good idea to raise your cloud quota for key parameters such as CPUs, VM instances, in-use IP addresses, and firewall rules.
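Reserving resources for system daemons changes how much of each node your workloads can actually use. Kubernetes documents this as Node Allocatable: capacity minus kube-reserved, minus system-reserved, minus the hard eviction threshold (set on the kubelet through --kube-reserved, --system-reserved, and --eviction-hard). The Python sketch below applies that formula; the reservation values and pod sizes are hypothetical examples, CPU is in millicores, and memory is in MiB.

```python
def allocatable(capacity, kube_reserved, system_reserved, eviction_threshold=0):
    """Node Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold."""
    result = capacity - kube_reserved - system_reserved - eviction_threshold
    if result < 0:
        raise ValueError("reservations exceed node capacity")
    return result


def pods_that_fit(alloc_cpu_m, alloc_mem_mib, pod_cpu_m, pod_mem_mib):
    """How many identical pods fit on the node, judged by their requests alone."""
    return min(alloc_cpu_m // pod_cpu_m, alloc_mem_mib // pod_mem_mib)


# Hypothetical 4-core / 16 GiB node; reservation values are examples only.
cpu_m = allocatable(4000, kube_reserved=100, system_reserved=100)  # no eviction threshold for CPU
mem_mib = allocatable(16384, kube_reserved=512, system_reserved=512, eviction_threshold=100)
print(cpu_m, mem_mib, pods_that_fit(cpu_m, mem_mib, 250, 512))  # 3800 15260 15
```

Here CPU, not memory, limits density to 15 pods per node, which is the kind of detail that determines how many nodes (and how much quota) you actually need.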
A number of tools now not only give you visibility into your cluster and how your applications consume resources, but also help you manage those applications effectively. Densify, for example, uses machine learning to make applications “self-aware” of their resource needs so that the corresponding cloud resources can be adjusted accordingly. Similarly, Yotascale uses AI to scan cloud workloads for billing spikes and identify their causes.
Another example is Turbonomic, whose AI-based decision engine continually monitors instances and recommends reserved instances where appropriate. CloudSqueeze is a cloud “management” vendor that uses deep learning to predict resource demand and help users cut costs. At the time of writing, the big cloud providers had yet to offer similar services, which is unsurprising since such services would work against their own revenue, but customer demand remains high.
Unlike cloud resources, which are “elastic” by nature, on-prem nodes are physical, finite, and far less forgiving. This is why Kubernetes cost optimization on-prem is typically about ensuring there is enough capacity to meet peak requirements without overprovisioning. For high availability on-premise, running redundant instances of all major components is generally considered best practice. These components include the API server, etcd, the controller manager, and the scheduler.
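Sizing on-prem capacity for peak demand comes down to simple arithmetic. The Python sketch below assumes a utilization ceiling per node (to keep headroom) and one spare node (N+1) so that losing a node doesn’t overload the rest; the 0.8 ceiling, the single spare, and the function name are illustrative assumptions rather than figures from this article.

```python
import math


def nodes_for_peak(peak_cores, cores_per_node, target_utilization=0.8, spare_nodes=1):
    """Estimate physical worker nodes needed to serve peak CPU demand.

    target_utilization keeps headroom on every node; spare_nodes holds
    capacity in reserve so that losing one node (N+1) doesn't overload the rest.
    """
    usable_per_node = cores_per_node * target_utilization
    return math.ceil(peak_cores / usable_per_node) + spare_nodes


# Hypothetical: 100 cores at peak on 32-core servers.
print(nodes_for_peak(100, 32))  # 5
```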
The recoverability of the etcd cluster is a priority for on-premise clusters, and a separate five-node etcd cluster is recommended for production environments. A hypervisor is also recommended over the traditional approach of configuring Linux scheduler priorities, because it allows planned governance of resource consumption in single- or dual-host deployments and provides better workload isolation. In on-premise deployments in particular, hypervisors isolate system components and reserve resources, which is critical to protecting nodes from resource exhaustion.
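The five-node recommendation follows from how etcd reaches consensus: it uses the Raft protocol, which needs a majority (quorum) of members to commit writes. The small Python sketch below computes the quorum size and how many member failures a cluster can survive; the function names are illustrative.

```python
def etcd_quorum(members: int) -> int:
    """Majority of members needed for etcd (Raft) to commit writes."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Members that can fail while a quorum remains."""
    return members - etcd_quorum(members)


for n in (1, 3, 4, 5, 7):
    print(n, etcd_quorum(n), failures_tolerated(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

A five-member cluster survives two failures, while growing from three to four members adds cost but no extra fault tolerance, which is why odd member counts are preferred.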
Disk performance also strongly affects how quickly a failed node recovers, so SSDs are recommended, along with redundant storage and error-correcting (ECC) memory. With Kubernetes on-premise, remember that you’re not in the cloud: special care has to be taken to avoid overload.
Calculating the cost
In conclusion, with complex services operating at different scales, in different regions, each with its own pricing implications, cloud pricing is anything but straightforward.
The good news is that Kubernetes is a scheduler by nature, and optimizing Kubernetes costs, in the cloud or on-premise, comes down to how well you know your own requirements. While Grafana and Prometheus are a great way to tackle the problem yourself, open-source cost-calculation tools like Kubecost make the process even simpler.
A calculator is only as good as the input it’s fed, however, so again, it all comes down to the quality of your information.
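To show why input quality matters, here is a hedged Python sketch of one common cost-allocation rule: charge each workload for the greater of the CPU it requested and the CPU it used, and report the remainder of the cluster’s cost as idle. The cluster size, hourly price, and workload figures are hypothetical, and real cost tools typically also account for memory, storage, and network.

```python
def allocate_costs(workloads, node_hourly_price, node_cores, node_count, hours):
    """Split a cluster's CPU cost across workloads.

    workloads maps a name to (requested_cores, used_cores). Each workload is
    charged for the larger of the two; whatever remains is idle cost.
    """
    price_per_core_hour = node_hourly_price / node_cores
    total = node_hourly_price * node_count * hours
    allocated = {
        name: max(requested, used) * price_per_core_hour * hours
        for name, (requested, used) in workloads.items()
    }
    idle = total - sum(allocated.values())
    return {name: round(cost, 2) for name, cost in allocated.items()}, round(idle, 2)


# Hypothetical cluster: three 4-core nodes at $0.40/hour, running 730 hours.
allocated, idle = allocate_costs({"web": (2, 3), "batch": (4, 1)}, 0.40, 4, 3, 730)
print(allocated, idle)  # {'web': 219.0, 'batch': 292.0} 365.0
```

The batch workload is charged for all four requested cores even though it used only one, so an inflated request shows up directly in its share of the bill.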
The only thing growing faster than Kubernetes is AI, which is definitely going to be a big part of how we manage our Kubernetes infrastructure in the future.