Kubernetes Cluster Sizing – How Large Should a Kubernetes Cluster Be?

When it comes to Kubernetes clusters, size matters. The number of nodes in your cluster plays an important role in determining the overall availability and performance of your workloads. So, to a lesser extent, does the number of namespaces.

This does not mean, however, that bigger is always better. A Kubernetes cluster sizing strategy that aims to maximize node count will not always deliver the best results – certainly not from a cost perspective, and perhaps not from an overall availability or performance perspective, either. And maximizing namespaces is hardly ever a smart strategy.

Instead, calculating the number of nodes to include in a cluster requires careful consideration of a variety of factors. Keep reading for an overview of those factors. You won’t find a precise recommendation on how large your cluster should be, because only you can decide that for yourself.

Why Kubernetes Cluster Size Matters

The size of your Kubernetes cluster (in terms of the number of nodes) shapes both performance and availability in critical ways.

With regard to performance, more nodes generally mean better performance. This isn’t because node count itself promotes better performance, but because having more nodes usually means there are more resources available for the cluster to consume. So, node count in this sense is a proxy for performance.

As for availability, node count plays a more direct role in shaping this characteristic. The more nodes you have, the smaller the share of capacity that any single node failure takes out, and the lower the chance that a failure disrupts your cluster’s availability.

Of course, a variety of other factors beyond node count shape performance and availability. Resource allocations among pods and namespaces, network quality, the reliability of your underlying infrastructure and the proximity of nodes to each other on the network (to name just a few factors) also affect performance and availability in significant ways.

Why More Nodes are Not Always Better

You may be tempted to assume that the more nodes you can add to your cluster, the better it will be. That’s not at all the case, for several reasons.

Not All Nodes are Created Equal

First and foremost is the fact that there is a tremendous amount of variation in what constitutes a node.

Some nodes contribute many more hardware resources to the cluster than others, and therefore do more to improve performance. In this respect, the overall node count is a very weak representation of your cluster’s performance. A cluster that has 5,000 nodes (the maximum that Kubernetes can currently support), each with minimal resource allocation, may perform worse than a cluster composed of 100 high-end nodes.
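To make that comparison concrete, here is a minimal Python sketch that totals the resources of two hypothetical clusters. The per-node CPU and memory figures are assumptions chosen purely for illustration, not measurements or recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """A group of identical nodes (all figures are illustrative assumptions)."""
    count: int
    cpu_cores: int
    memory_gib: int

    def total_cpu(self) -> int:
        # Aggregate CPU cores contributed by the whole pool.
        return self.count * self.cpu_cores

    def total_memory_gib(self) -> int:
        # Aggregate memory contributed by the whole pool.
        return self.count * self.memory_gib


# Hypothetical pools: many minimal nodes versus fewer high-end ones.
small_nodes = NodePool(count=5000, cpu_cores=1, memory_gib=2)
large_nodes = NodePool(count=100, cpu_cores=64, memory_gib=256)

for name, pool in (("small", small_nodes), ("large", large_nodes)):
    print(f"{name}: {pool.count} nodes, {pool.total_cpu()} cores, "
          f"{pool.total_memory_gib()} GiB")
```

With these assumed figures, the 100-node cluster offers more total CPU and memory than the 5,000-node one, which is why node count alone says little about capacity.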

In some cases, some nodes are also more likely to remain available than others. A physical server sitting in your local data center with no power backup is a less reliable node than a virtual machine hosted in the cloud (which tends to be a lot more reliable than on-premises infrastructure). Thus, node count is not an exact measure of cluster availability.

Physical vs. Virtual Machine Nodes

Along similar lines, the mix of physical and virtual machines within a Kubernetes cluster impacts its performance and availability in key ways.

In Kubernetes, both physical servers and virtual machines can serve as nodes. Neither is inherently more reliable or higher-performing than the other. However, a cluster that consists of many virtual machine nodes that are running on just a handful of physical servers is unlikely to be as reliable as one where there are more physical servers in the mix. Whether the physical servers are serving directly as nodes, or as hosts for virtual machine nodes, having more physical servers reduces the impact of the failure of any one server.

To put this another way: If you have 100 virtual machine nodes that are hosted on just five physical servers, the failure of a single physical server would reduce your node count by 20 percent. That’s a huge hit, so it’s better to have more physical servers in the mix.
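The arithmetic behind that 20 percent figure can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that virtual machine nodes are spread evenly across the physical servers.

```python
def node_loss_fraction(vm_nodes: int, physical_servers: int) -> float:
    """Fraction of nodes lost when one physical server fails.

    Assumes VM nodes are distributed evenly across the physical servers.
    """
    if vm_nodes <= 0 or physical_servers <= 0:
        raise ValueError("vm_nodes and physical_servers must be positive")
    nodes_per_server = vm_nodes / physical_servers
    return nodes_per_server / vm_nodes


# Same 100 VM nodes, hosted on progressively more physical servers.
for servers in (5, 10, 25):
    lost = node_loss_fraction(vm_nodes=100, physical_servers=servers)
    print(f"{servers} physical servers: one failure removes {lost:.0%} of nodes")
```

Under the even-spread assumption, the fraction lost depends only on the number of physical servers, so adding hosts is what shrinks the impact of a single failure.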

That said, taking things to the opposite extreme is not ideal, either. If you were to make every physical server its own node, any failure of that node (including a software fault on otherwise healthy hardware) would deprive your cluster of the total resources that the server contributed. For availability and performance purposes, it would be better to run at least a few virtual machines on each physical server and have those virtual machines connect to the cluster as nodes. That way, if one of the nodes fails, or simply takes too long to start, only part of the resources of the underlying physical server are lost.

The bottom line: The ratio of physical machines to virtual machines within your cluster affects performance and availability in complex ways. There is no simple formula for finding the right ratio, but you should seek a healthy middle ground.

More Nodes Means More Complexity

It’s worth noting, too, that the more nodes you have, the harder it is to manage and keep track of them all.

Given that so much in Kubernetes is automated, having a large node count is not a huge obstacle in this respect. But it’s still a factor to consider. You will have to provision, monitor and secure every node. If your ability to do these things is limited, that is a consideration in favor of keeping your cluster smaller.

Performance and Availability are Relative

A final fact to remember is that performance and availability are always relative. You’re never going to maximize either, no matter how many nodes you have (or don’t have), or how perfectly your cluster is configured.

I mention this to emphasize that you can end up shooting yourself in the foot if you obsess over maximizing node count. You should aim to achieve levels of performance and availability that are acceptable for your needs, then move on. Beyond this, you end up with diminishing returns on your investment in nodes (not to mention unnecessary complexity to manage).

Rightsizing your Kubernetes Cluster

So, how do you find the sweet spot? How do you make sure you have enough nodes but not too many, and that your mix of physical and virtual machines is just right?

Obviously, there’s no simple or universal answer to that question. You need to consider a variety of factors and their impact on your particular needs.

How Reliable is your Physical Infrastructure?

If the physical infrastructure that forms the foundation for your nodes is ultra reliable, then you can have fewer nodes. Generally speaking, a cloud-based Kubernetes deployment can have fewer nodes than an on-premises one for this reason. (No matter how reliable you think your on-premises data center is, it’s probably not as reliable as the modern cloud.)

How Many Resources Does Each Node Have?

The hardware profiles of your nodes (whether they are physical or virtual hardware) are a key factor, too. From a performance perspective, you don’t need as many nodes if each node offers a relatively high amount of hardware resources.

How Many Master Nodes Do You Have?

When it comes to overall cluster availability and performance, master nodes matter much more than worker nodes. You could have several worker nodes fail and see no major impact. But the failure of a master node could be catastrophic if it is your only master node. Even if it’s not, it will still have a higher impact than the failure of a single worker.

So, consider how many master nodes your cluster contains and perhaps focus on increasing the number of masters before you worry about adding more workers. Extra masters don’t add capacity for your workloads, but they make the loss of any single master far less damaging.
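One way to reason about master count is majority quorum. The sketch below assumes the control plane keeps its state in etcd with one member per master, which is a common setup but not something this post specifies. etcd needs a majority of members to stay available, so the number of master failures a cluster can absorb follows directly from the member count. The function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a group of the given size."""
    if members <= 0:
        raise ValueError("members must be positive")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum_size(members)


# Assumption: one etcd member per master node.
for masters in (1, 2, 3, 4, 5):
    print(f"{masters} masters: quorum {quorum_size(masters)}, "
          f"tolerates {failures_tolerated(masters)} failure(s)")
```

Under this assumption, going from one master to two buys no extra fault tolerance, and even counts tolerate no more failures than the odd count below them, which is why odd numbers such as three or five are the usual choice.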

How Many Workloads Does Your Cluster Host?

The total number of workloads is a key consideration for determining how large to make your cluster. Although Kubernetes namespaces make it easy to divide clusters into isolated zones for individual workloads (or groups of workloads), there is a point where you are better off simply breaking your cluster into smaller clusters than trying to add more namespaces.

Each namespace adds management overhead. It also increases the challenge of the “noisy neighbor” issue (which can be mitigated with resource quotas, but those have to be defined for each namespace, so the work grows with every namespace you add).
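For reference, a quota is an object that lives in a single namespace, so each namespace needs its own. The Python sketch below builds the same ResourceQuota manifest for several namespaces as plain dictionaries and prints them as JSON. The namespace names, the quota name and the limits are all hypothetical, and applying the output to a cluster is deliberately left out.

```python
import json


def resource_quota(namespace: str, cpu: str, memory: str, pods: int) -> dict:
    """Build a ResourceQuota manifest for one namespace (values are examples)."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,        # total CPU that pods may request
                "requests.memory": memory,  # total memory that pods may request
                "pods": str(pods),          # maximum number of pods
            }
        },
    }


# Hypothetical namespaces sharing one cluster.
namespaces = ["team-a", "team-b", "team-c"]
manifests = [resource_quota(ns, cpu="8", memory="16Gi", pods=50)
             for ns in namespaces]

for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Templating takes some of the repetition out, but someone still has to choose sensible limits for each namespace and revisit them as workloads change.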

Some Extremely Basic Rules of Thumb for Cluster Sizing

In case you’ve been reading this post looking for specific guidance on how large to make your cluster, let me reiterate that there is nothing close to a one-size-fits-all answer.

Still, I’m willing to make some very basic, exceptionally general, sweepingly oversimplified recommendations:

For a production namespace or cluster, you should have at least one node per container. (This doesn’t mean you should run each container on its own node – au contraire – but that the minimum total number of nodes available to host container instances should be equal to the total number of containers.)
For each pod in a production namespace or cluster, you should have a physical machine. Whether it runs as its own node or hosts virtual machines that serve as nodes doesn’t matter. The point is to increase the availability of your cluster by having sufficient underlying physical machines.
If a single cluster holds more than six namespaces, it’s time to think about breaking the cluster out into smaller clusters.

Again, these are very basic rules. (Also, keep in mind, I’d cut my numbers in half if dealing with a dev/test environment, where performance and availability concerns are generally not as great.) Your mileage will certainly vary, but if you want some specific numbers, there you have them.
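The three rules above can be expressed as a small Python helper. This is only a sketch: the function and field names are hypothetical, and it rests on one interpretation of the dev/test caveat, namely that the node and physical machine minimums are halved (rounding up) while the six-namespace threshold stays the same.

```python
import math
from dataclasses import dataclass

MAX_NAMESPACES = 6  # threshold from the third rule of thumb


@dataclass(frozen=True)
class SizingAdvice:
    min_nodes: int
    min_physical_machines: int
    consider_splitting: bool


def rule_of_thumb(containers: int, pods: int, namespaces: int,
                  production: bool = True) -> SizingAdvice:
    """Apply the rough sizing rules; dev/test halves the minimums (assumption)."""
    if min(containers, pods, namespaces) < 0:
        raise ValueError("counts must not be negative")
    factor = 1.0 if production else 0.5
    return SizingAdvice(
        min_nodes=math.ceil(containers * factor),         # one node per container
        min_physical_machines=math.ceil(pods * factor),   # one machine per pod
        consider_splitting=namespaces > MAX_NAMESPACES,   # more than six namespaces
    )


print(rule_of_thumb(containers=40, pods=20, namespaces=8))
print(rule_of_thumb(containers=41, pods=20, namespaces=6, production=False))
```

Treat the output the same way as the rules themselves: as a starting point to adjust for your own infrastructure and workloads.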

Conclusion

Sizing Kubernetes clusters is more of an art than a science. There is a complex mix of factors at play, ranging from what type of infrastructure is hosting your nodes, to how many master nodes you have set up, to the ratio of physical to virtual machines. Treat the pointers above as very general guidelines, and be prepared to size your cluster based on your specific needs.

Author

Platform9
