In this post you will learn:
- Why Run Kubernetes On-premises
- Challenges Running Kubernetes On-premises
- Best Practices for Kubernetes On-premises
- Kubernetes in Production Needs
- Additional Services
The best Kubernetes architecture for your organization depends on your needs and goals. Kubernetes is often described as a cloud-native technology, and it certainly qualifies as one. However, the cloud-native concept does not exclude the use of on-premises infrastructure in cases where it makes sense. Depending on your organization’s needs regarding compliance, locality, current architecture, and cost for running your workloads, there may be significant advantages to running Kubernetes deployments on-premises.
Kubernetes has achieved an unprecedented adoption rate, due in part to the fact that it substantially simplifies the deployment and management of microservices. Almost equally important is that it allows users who are unable to utilize the public cloud to operate in a “cloud-like” environment. It does this by decoupling dependencies and abstracting infrastructure away from your application stack, giving you the portability and the scalability that are associated with cloud-native applications.
Why Run Kubernetes On-premises
Why do organizations choose to run Kubernetes in their own data centers rather than take the comparatively easy path of a public cloud provider? There are typically a few important reasons why an enterprise may choose to invest in a Kubernetes on-premises strategy:
1. Compliance & Data Privacy
Some organizations simply can’t use the public cloud, as they are bound by stringent regulations related to compliance and data privacy issues. For example, the GDPR compliance rules may prevent enterprises from serving customers in the European region using services hosted in certain public clouds.
2. Business Policy Reasons
Business policy needs, such as having to run your workloads at specific geographical locations, may make it difficult to use public clouds. Additionally, some enterprises may not be able to utilize public cloud offerings from a specific cloud provider due to their business policies related to competition.
3. Being Cloud Agnostic to Avoid Lock-in
Many enterprises may not wish to be tied to a single cloud provider and hence may want to deploy their applications across multiple clouds, including an on-premises private cloud. This could potentially reduce business continuity risk due to issues with a specific cloud provider. It also gives you leverage around price negotiation with your cloud providers.
4. Cost
Cost is probably the most important reason to run Kubernetes on-premises. Running all of your applications in the public cloud can get expensive at scale. Specifically, if your applications rely on ingesting and processing large amounts of data, such as with an AI/ML application, a public cloud can get extremely expensive. If you have existing data centers on-premises or in a co-location-hosted facility, running Kubernetes on-premises can be an effective way to reduce your operational costs.
According to a 2021 report from a16z, “It’s becoming evident that while cloud clearly delivers on its promise early on in a company’s journey, the pressure it puts on margins can start to outweigh the benefits, as a company scales and growth slows. Because this shift happens later in a company’s life, it is difficult to reverse as it’s a result of years of development focused on new features and not infrastructure optimization.”
An effective Kubernetes strategy running on-premises in your own data centers can be used to transform your business and modernize your applications for cloud-native – while improving infrastructure utilization and saving costs at the same time.
Challenges Running Kubernetes On-premises
There is a downside to running Kubernetes on-premises, however. Do-It-Yourself (DIY), or self-managed, Kubernetes is known for its steep learning curve and operational complexity. When using Kubernetes on AWS or Azure, your public cloud provider essentially abstracts the complexities from you. Running Kubernetes on-premises means you’re on your own. Here are specific areas where this challenge can be most apparent:
- Etcd – You need to manage a highly available etcd cluster and take frequent backups to ensure business continuity in case the cluster goes down and the etcd data is lost.
- Load balancing – Load balancing may be needed both for your cluster master nodes and for your application services running on Kubernetes. Depending on your existing networking setup, you may want to use a load balancer such as F5 or a software load balancer such as MetalLB.
- Availability – It’s critical to ensure that your Kubernetes infrastructure is highly available and can withstand data center and infrastructure downtimes. This would mean having multiple master nodes per cluster, and, when relevant, having multiple Kubernetes clusters across different availability zones.
- Auto-scaling – Auto-scaling based on workload needs can help save resources. This is difficult to achieve for bare metal Kubernetes clusters unless you are using a bare metal automation platform such as open-source Ironic or Platform9’s Managed Bare Metal.
- Networking – Networking is very specific to your data center configuration.
- Persistent storage – The majority of your production workloads running on Kubernetes will require persistent storage – block or file storage. The good news is that most of the popular enterprise storage vendors have CSI plugins and supported integrations with Kubernetes. You will need to work with your storage vendor to identify the right plugin and install any needed components before you can integrate your existing storage solution with Kubernetes on-premises.
- Upgrades – You will need to upgrade your clusters regularly. Upstream Kubernetes has shipped a new minor version roughly every four months since 2021 (about every three months before that). A version upgrade may cause issues if the newer version introduces API incompatibilities. A staged upgrade strategy, where your development/test clusters are upgraded before your production clusters, is recommended.
- Monitoring – You will need to invest in tooling to monitor the health of your on-premises Kubernetes clusters. Most monitoring and log management tools have specific capabilities for Kubernetes monitoring. If you are already using Datadog, Splunk, or similar tools, you can use them to monitor your on-prem implementation. Alternatively, consider an open-source monitoring stack designed for Kubernetes clusters, such as Prometheus and Grafana.
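To make the staged upgrade idea from the Upgrades item concrete, here is a minimal Python sketch. It groups clusters into upgrade waves so that less critical stages go first, and it lists the minor versions to step through one at a time (kubeadm documents that skipping minor versions is unsupported). The stage names, cluster names, and function names are hypothetical and only serve as an illustration.

```python
from typing import Dict, List

# Hypothetical stage names; a lower rank is upgraded earlier.
STAGE_ORDER = {"dev": 0, "test": 1, "staging": 2, "production": 3}


def upgrade_waves(clusters: Dict[str, str]) -> List[List[str]]:
    """Group cluster names into upgrade waves, least critical stage first.

    `clusters` maps a cluster name to its stage.
    """
    waves: Dict[int, List[str]] = {}
    for name, stage in clusters.items():
        if stage not in STAGE_ORDER:
            raise ValueError(f"unknown stage for {name}: {stage}")
        waves.setdefault(STAGE_ORDER[stage], []).append(name)
    # Sort by stage rank, and sort names inside a wave for stable output.
    return [sorted(waves[rank]) for rank in sorted(waves)]


def minor_upgrade_path(current: str, target: str) -> List[str]:
    """List each minor version to step through, given "MAJOR.MINOR" strings."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within one major version are handled")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]
```

Each wave would be upgraded and verified before the next one starts, so an API incompatibility shows up on a development cluster rather than in production.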
Best Practices for Kubernetes On-premises
Below you will find a set of best practices to run Kubernetes on-premises. Depending on your environment configuration, some or all of these may apply to you.
Integrating with Existing Environment
Kubernetes enables users to run clusters on diverse types of on-premises infrastructure. You can repurpose your existing environment to integrate with Kubernetes, using virtual machines or creating your own cluster from scratch on bare metal. But to do this, you need a deep understanding of the specifics of deploying Kubernetes in your existing environment, including your servers, storage systems, and networking infrastructure, to get a well-configured production Kubernetes environment.
The three most popular ways to deploy Kubernetes on-premises are:
- Virtual machines on your existing VMware vSphere environment
- Linux physical servers running Ubuntu, CentOS, or RHEL Linux
- Virtual machines on other types of IaaS environments on-premises, such as OpenStack.
Running Kubernetes on physical servers can give you native hardware performance, which may be critical for certain types of workloads. However, it may limit your ability to quickly scale your infrastructure. If getting bare metal performance is important to you, and if you need to run Kubernetes clusters at scale, then consider investing in a bare metal automation platform such as Ironic, Metal3, or a managed bare metal stack such as Platform9 Managed Bare Metal.
Running Kubernetes on virtual machines in your private cloud on VMware or KVM can give you the elasticity of the cloud, as you can dynamically scale your Kubernetes clusters up or down based on workload demand. Clusters created on virtual machines are also easy to set up and tear down, making it easy to create ephemeral test environments for developers.
Staffing Your Team
The Cloud Native Computing Foundation (CNCF) has introduced certifications like Certified Kubernetes Administrator (CKA) and Certified Kubernetes Application Developer (CKAD). The certifications are a good way to assess one’s Kubernetes skills. A great way to ensure that you have the right skills for your on-premise Kubernetes implementations is to train or hire team members with these certifications.
You should also plan for a DIY enterprise Kubernetes project to stretch to months, or even years, as your team works to tame and effectively manage the open-source components at scale. If not appropriately planned for, this can accumulate costs and delay time to market.
For a test deployment, Kubernetes can run on one server that acts as both a master and a worker node for the cluster. But to run a meaningful application in practice, you will need at least three servers: one for the master (control plane) components, such as kube-apiserver, etcd, kube-scheduler, and kube-controller-manager, and two worker nodes, where the kubelet runs your application containers.
- While master components can run on any machine, best practice dictates using a separate set of servers for the master nodes and not running any of your application containers on these machines.
- One key feature of Kubernetes is the ability to recover from failures without losing data. It does this with a ‘political’ system of leaders, elections, and terms – referred to as quorum – which requires “good” hardware to properly fulfill this capability. To be both available and recoverable, it’s recommended that you allocate three nodes as master nodes with 4GB RAM and 16GB SSD each to this task, with three being the bare minimum and seven being the maximum for master nodes.
- An SSD is recommended here since etcd writes to disk, and the smallest delay can adversely affect performance. Lastly, always have an odd number of cluster members so a majority can be reached.
- For production environments, you need a dedicated HAProxy load balancer node, as well as a client machine, to run automation.
- It’s also a good idea to get substantially more power than what Kubernetes’ minimum requirements call for. Modern Kubernetes servers typically feature two CPUs with 32 cores each, 2TB of error-correcting RAM, and at least four SSDs, eight SATA SSDs, and a couple of 10G network cards.
- It is best practice to run your clusters in a multi-master fashion in production to ensure high availability and resiliency of the master components themselves. This means you’ll need at least three master nodes (an odd number, to ensure quorum). You’ll further need to monitor the masters and fix any issues in case one of the replicas is down.
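The quorum rule behind the "odd number of members" advice above comes down to simple arithmetic: a majority of members must agree. The following Python sketch (function names are illustrative, not part of any Kubernetes or etcd API) computes the majority size and the number of member failures a cluster can survive.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print the trade-off for the cluster sizes discussed above.
    for n in range(1, 8):
        print(f"members={n} quorum={quorum_size(n)} tolerated={failures_tolerated(n)}")
```

A four-member cluster tolerates only one failure, the same as a three-member cluster, so the extra even-numbered member adds cost without adding resilience. Three, five, and seven members tolerate one, two, and three failures respectively.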
etcd is an open-source distributed key-value store and the persistent storage for Kubernetes. Kubernetes uses etcd to store all cluster-related data. This includes all the information that exists on your pods, nodes, and cluster. Accounting for this store is mission-critical, to say the least, since it’s the last line of defense in case of cluster failure. Managing highly available, secured etcd clusters for large-scale production deployments is one of the key operational complexities you need to handle when managing Kubernetes on your own infrastructure.
For production use, where availability and redundancy are important factors, running etcd as a cluster is critical. Bringing up a secure etcd cluster, particularly on-premises, involves downloading the right binaries, writing the initial cluster configuration on each etcd node, and setting up and starting etcd. This is in addition to configuring the certificate authority and certificates for secure connections. For an easier way to run an etcd cluster on-prem, check out the open-source etcdadm tool.
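As a rough illustration of what "writing the initial cluster configuration on each etcd node" involves, the Python sketch below builds the static-bootstrap command-line flags for one etcd member. The member names, IP addresses, and cluster token are hypothetical, the ports are etcd's defaults (2379 for clients, 2380 for peers), and the TLS certificate flags are deliberately left out, so treat this as a starting point rather than a complete secure configuration.

```python
from typing import Dict, List


def etcd_flags(members: Dict[str, str], name: str, token: str = "etcd-cluster-1") -> List[str]:
    """Build static-bootstrap flags for one etcd member.

    `members` maps a member name to its IP address (hypothetical values).
    TLS flags (certificates and keys) are intentionally omitted here.
    """
    if name not in members:
        raise KeyError(f"unknown member: {name}")
    ip = members[name]
    # Every member is started with the same full list of peers.
    initial_cluster = ",".join(
        f"{member}=https://{addr}:2380" for member, addr in sorted(members.items())
    )
    return [
        f"--name={name}",
        f"--initial-advertise-peer-urls=https://{ip}:2380",
        f"--listen-peer-urls=https://{ip}:2380",
        f"--listen-client-urls=https://{ip}:2379,https://127.0.0.1:2379",
        f"--advertise-client-urls=https://{ip}:2379",
        f"--initial-cluster={initial_cluster}",
        f"--initial-cluster-token={token}",
        "--initial-cluster-state=new",
    ]


if __name__ == "__main__":
    nodes = {"etcd-a": "10.0.0.11", "etcd-b": "10.0.0.12", "etcd-c": "10.0.0.13"}
    print(" \\\n  ".join(["etcd"] + etcd_flags(nodes, "etcd-a")))
```

The point to notice is that the `--initial-cluster` value is identical on every node, while the name and URLs differ per node. Keeping those in sync by hand is part of the operational burden that tools like etcdadm aim to reduce.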
If you are deploying offline or in an air-gapped environment, you’ll need to have your own repositories in place for Docker, Kubernetes, and any other open-source tools you may be using. This includes Helm chart repositories for Kubernetes manifests, as well as binary repositories.
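One practical consequence of an air-gapped setup is that image references in your manifests must point at your internal registry instead of a public one. The Python sketch below shows one possible way to rewrite such references. The mirror hostname `registry.internal.example` is a placeholder, and the host detection follows the common Docker convention that the first path component is a registry only if it contains a dot or colon or is `localhost`.

```python
def mirror_image(image: str, mirror: str = "registry.internal.example") -> str:
    """Rewrite an image reference so it is pulled from a private mirror.

    The registry host is replaced; the repository path, tag, and digest
    are kept unchanged.
    """
    first, sep, rest = image.partition("/")
    # The first component names a registry only if a "/" follows it and it
    # looks like a hostname (contains "." or ":") or is "localhost".
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        path = rest
    elif sep:
        path = image  # e.g. "bitnami/redis:7.2" on Docker Hub
    else:
        path = f"library/{image}"  # e.g. "nginx:1.25" on Docker Hub
    return f"{mirror}/{path}"


if __name__ == "__main__":
    for ref in ("nginx:1.25", "bitnami/redis:7.2", "quay.io/coreos/etcd:v3.5.9"):
        print(ref, "->", mirror_image(ref))
```

This sketch drops the original registry host, which keeps paths short but could cause collisions if two upstream registries host the same repository path. Keeping the upstream host as a path prefix is an alternative design.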
Storage and Networking
Keep in mind that when running Kubernetes in your own data center on-premises, you will need to manage all of the storage integrations, load balancers, and DNS.
In addition, each one of these components – from storage to networking – needs its own monitoring and alerting systems, and you will need to set up your internal processes to monitor, troubleshoot and fix any common issues that might arise in these related services to ensure the health of your environments.
A container registry enables you to store container images for your applications in a secure and highly available manner. Even when deploying Kubernetes clusters on-premises, you could use hosted registry options such as Amazon ECR or Docker Hub. If your container registry must be hosted on-premises, open-source Harbor is a good option, although you must assess the complexity involved in deploying your own registry.
You also definitely want to install the Kubernetes dashboard, which is one of the most useful and popular add-ons. The dashboard is not installed by default and must be configured separately. Once installed, the dashboard can provide great visibility into all your containerized workloads deployed on your cluster. It will also let you access container logs that can help with debugging.
When something goes wrong, always check the logs, starting with your syslog files.
This stage can be a lot of fun, since you get to experiment with all the tools in the industry, or a major pain, depending on the complexity of your infrastructure and processes.
Weave Net and Flannel are both great networking tools, while Istio and Linkerd are popular service mesh options. Grafana and Prometheus help with monitoring, and there are a number of tools to automate CI/CD, like Jenkins, Bamboo, and Jenkins X.
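As an example of what checking syslog can look like in practice, the Python sketch below filters syslog lines for error messages from a few Kubernetes-related daemons. It assumes the traditional syslog line layout (`timestamp host program[pid]: message`) and the klog convention of prefixing error and fatal lines with `E` or `F`. The component list and function name are illustrative, and log formats vary between distributions.

```python
import re
from typing import Iterable, List, Tuple

# Traditional syslog layout: "<Mon> <day> <hh:mm:ss> <host> <program>[<pid>]: <message>"
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s[\d:]{8}\s(\S+)\s([\w.\-/]+)(?:\[\d+\])?:\s(.*)$")
# klog-style messages start with a severity letter followed by the date, e.g. "E1114 ".
KLOG_ERROR_RE = re.compile(r"^[EF]\d{4}\s")
KEYWORD_RE = re.compile(r"\b(error|failed)\b", re.IGNORECASE)
# Illustrative set of daemons worth watching on a cluster node.
COMPONENTS = {"kubelet", "containerd", "etcd", "kube-apiserver"}


def component_errors(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (host, program, message) for error lines from watched components."""
    hits = []
    for line in lines:
        match = SYSLOG_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not in the assumed syslog layout
        host, program, message = match.groups()
        if program not in COMPONENTS:
            continue
        if KLOG_ERROR_RE.match(message) or KEYWORD_RE.search(message):
            hits.append((host, program, message))
    return hits
```

You would typically feed this the lines of `/var/log/syslog` (Ubuntu) or `/var/log/messages` (RHEL and CentOS). On systems that log only to the journal, the input would have to come from `journalctl` output instead.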
Security is a major concern. Every open source component needs to be scanned for threats and vulnerabilities. Additionally, keeping track of version updates and patches and then managing their introduction can be labor-intensive, especially if you have a lot of additional services running.
Note that bare-bones Kubernetes is never enough for real-world production applications. A complete on-prem Kubernetes infrastructure needs proper DNS, load balancing, Ingress, and Kubernetes role-based access control (RBAC), alongside a slew of additional components that make the deployment process quite daunting for IT.
Once Kubernetes is deployed, you then need to add monitoring, tracing, logging, and all the associated troubleshooting operations, such as handling capacity shortages, ensuring HA, and taking backups.
In conclusion, Kubernetes helps on-premise data centers benefit from cloud-native applications and infrastructure, irrespective of hosting or public cloud providers. They could be on OpenStack, KVM, VMware vSphere, or even bare metal and still reap the cloud-native benefits that come from integrating with Kubernetes.
Kubernetes On-Premises With Platform9
Platform9 Managed Kubernetes (PMK) addresses a number of the above best practices in a single, easy-to-use container management and orchestration platform that lets you manage Kubernetes clusters on any infrastructure, anywhere. Check out our PMK page for more details on PMK features. You can also find more information about PMK, including useful product demo videos. Getting started is easy.
Kubernetes on Bare Metal: Why and How
This post dives deeper into the benefits of running Kubernetes on bare metal, compares running Kubernetes on bare metal with running it on virtual machines, and covers additional details.
7 Key Considerations for Kubernetes in Production
A complete Kubernetes infrastructure needs proper DNS, load balancing, Ingress, and Kubernetes role-based access control (RBAC), alongside a slew of additional components that make the deployment process quite daunting for IT. Once Kubernetes is deployed, you need to add monitoring and all the associated operations playbooks to fix problems as they occur, such as running out of capacity, ensuring HA, backups, and more. Finally, the cycle repeats whenever the community releases a new version of Kubernetes and your production clusters need to be upgraded without risking any application downtime.
Bare-bone Kubernetes is never enough for real-world production applications. In this blog post you’ll learn 7 Key Considerations for Kubernetes in Production.
Kubernetes Upgrade: The Definitive Guide to Do-It-Yourself
Often you are required to upgrade your Kubernetes cluster to keep up with the latest security features and bug fixes, as well as to benefit from new features released on an ongoing basis. This is especially important when you have installed a really outdated version or if you want to automate the process and always be on top of the latest supported version.
In general, when operating an HA Kubernetes cluster, the upgrade process involves two separate tasks that must not overlap or be performed simultaneously: upgrading the Kubernetes cluster and, if needed, upgrading the etcd cluster, which is the distributed key-value backing store of Kubernetes. In this blog post you’ll see how to perform those tasks with minimal disruption.
Top Considerations for Migrating Kubernetes Across Platforms
Migrating Kubernetes may include moving from one public cloud vendor to another; from a private data center to the cloud or vice versa; from a data center or cloud to a colocation facility; or across private data centers. It could be a wholesale, one-time migration of your application to a new environment or a dynamic and ongoing migration between environments. Regardless of target, strategy, or reason, migration requires careful consideration, and you’ll benefit from the use of third-party tools and managed platforms. There are many considerations in terms of data, differences in connectivity, cloud vendors, platform or bare-metal services, and so on.