Enterprise Kubernetes Guide: Everything you need to run Kubernetes in production

This is an excerpt from The Gorilla Guide to Kubernetes in the Enterprise, written by Joep Piscaer.

You can download the full guide here.

Software delivery has evolved

The way we build and run applications has changed dramatically over the years. Traditionally, apps ran on top of physical machines. Those machines eventually became virtual. In both cases, the application and all its dependencies were installed on top of an OS.

This relationship between OS and applications created a tightly-coupled bundle of everything needed to run that application. Each virtual machine (VM) ran a complete OS, no matter how big or small the VM was, or how demanding the application on top.

Each OS provided a complete execution environment for applications: this included binaries, libraries and services, as well as compute, storage, and networking resources.

Drawbacks of this approach are the inherent size and volume of VMs. Each OS is many gigabytes in size, which not only requires storage space, but also increases the memory footprint.

This size and tight coupling result in a number of complexities in the VM lifecycle and the applications running on top. Without a good way of separating different layers in a VM (OS, libraries, services, application binaries, configuration, and data), swapping out different parts in this layer cake is nearly impossible. For this reason, once a VM is built, configured and running, it usually lives on for months or years. This leads to pollution and irreversible entangling of the VM in terms of OS, data, and configuration.

New versions of the OS, its components, and other software inside the VM are layered on top of the older versions. Because of this, each in-place upgrade creates potential version conflicts, stability problems, and a growing pile of leftover files from previous versions on disk. Maintaining this ever-increasing complexity is a major operational pain point, and often leads to downtime.

This places an unbalanced operational focus on the OS and underlying layers, instead of the place it should be: the application.

Operational friction, an unnecessarily large and perennial operating environment, and lack of decoupling between layers are all in sharp contrast with how lean and agile software development works. It’s no surprise, then, that the traditional approach doesn’t work for modern software development.

In the new paradigm, developers actively break down work into smaller chunks, create (single-piece) flow, and take control and ownership over the pipeline that brings code from local testing all the way to production. Containers, microservices and cloud-native application design are facilitating this.

The Benefits of Creating Cloud Native Applications

Let’s break down how these technologies enable modern software development methodologies.

Containers

First and foremost, containers package up only the parts of the application unique to that container, like the business logic. Containers share the underlying OS and often common libraries, frameworks, or other pieces of middleware. This results in much lighter packages (containers are usually megabytes, instead of the gigabytes that are typical with VMs) that are clearly decoupled from the layer cake underneath.

Because of this decoupling, a new one-to-one relationship between the container image and the application unlocks the full benefits of containers.

A container can be spun up on different hosts, clusters or clouds without any change to the container or its definition. Decoupling from the OS underneath makes it simpler to maintain those underlying layers. The OS becomes a commodity to developers: a black box layer that just works. Developers no longer have to think about that layer.

This allows easier and automated updating and changing of the layers underneath. Because the layers are decoupled, production systems are rarely patched or updated in place. Instead, the new version of the OS is deployed fresh, and the old system with the old version is discarded.

The same goes for a new version of the application inside the container: instead of updating the container, a new container with the new version is deployed, and traffic is diverted to that new container. The old one is killed as soon as the new container is operating correctly.
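The replace-rather-than-update flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular orchestrator implements it; the `Container` class, the `rollout` function, the `is_healthy` callback, and the image name are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)  # frozen: a container is never modified after creation
class Container:
    image: str
    version: str


def rollout(current: Container, new_version: str,
            is_healthy: Callable[[Container], bool]) -> Container:
    """Return the container that should receive traffic after the rollout."""
    # Create a brand-new container instead of updating the running one.
    candidate = Container(image=current.image, version=new_version)
    if is_healthy(candidate):
        # Traffic is diverted to the candidate; the old container can be killed.
        return candidate
    # The old container was never touched, so rolling back means keeping it.
    return current


live = Container(image="shop-frontend", version="1.4.0")
after_good_release = rollout(live, "1.5.0", is_healthy=lambda c: True)
after_bad_release = rollout(live, "1.5.0", is_healthy=lambda c: False)
print(after_good_release.version, after_bad_release.version)
```

Because the old container is left untouched until the new one is confirmed healthy, a failed release in this sketch simply leaves the previous version serving traffic.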

This approach is called ‘immutable infrastructure’: running components are replaced rather than modified in place. It depends on a clearer separation between the application, operating system, and the underlying infrastructure, which allows easier and more independent changes in each layer. Operationally, this makes a world of difference as different teams can take more ownership and responsibility of each layer.

With this decoupling comes a new interface between the OS and container, giving developers freedom to deploy new versions of their applications without intervention from the teams managing the layers underneath. This gives developers more control over when to deploy what to production. Rolling back a bad release or redirecting more traffic to a new version is a simple task, without friction or dependency on the infrastructure or operations teams.

In turn, the infrastructure and operations teams can take more control over their parts of the layer cake, enabling paradigms like Infrastructure-as-Code that allow treating infrastructure as a software development problem. This enables solutions like creating declarative code that instructs a pipeline of infrastructure automation software how to create and configure infrastructure.
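To make the declarative idea concrete, here is a minimal Python sketch of the comparison at the heart of most Infrastructure-as-Code tools: the desired state is written down as data, and the tooling works out which actions close the gap with the actual state. The `plan` function and the resource names are hypothetical; real tools track far more than instance counts.

```python
def plan(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compare desired and actual instance counts and list the actions needed."""
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(f"create {want - have} x {name}")
        elif want < have:
            actions.append(f"delete {have - want} x {name}")
    return actions


# Desired state is declared as data; nobody scripts the individual steps.
desired_state = {"web-server": 3, "database": 1}
actual_state = {"web-server": 1, "legacy-cache": 2}

actions = plan(desired_state, actual_state)
for action in actions:
    print(action)
```

Running the same plan again once the actual state matches the desired state produces no actions, which is what makes this style safe to automate in a pipeline.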

Cloud-native Services

While containers are a great fit for custom business logic and code, many of the moving parts of an application stack are standard and common components. Instead of re-inventing the wheel, using commercially available and/or open source software for those components makes sense. Other than a few niche and extreme use cases, why build your own database engine, caching layer or web server?

That’s why many public cloud providers offer those components and middleware as a service; the goal is to make consumption as frictionless as possible. Developers can simply configure the entire software stack with a few clicks, using databases, proxies, web servers, message queues and much more.

But cloud-native means more than simply consuming existing technology as a service. The Cloud Native Computing Foundation, or CNCF for short, defines “cloud native” as follows:

Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

This definition puts the focus on more than just a set of technological tools. It encompasses business outcomes like scalability, dynamic behavior, and resiliency; standards regarding certain patterns of methodology and design like immutability and frequent changes; and a focus on operational excellence with abilities like decoupling, observability, and automation.

It’s this comprehensive approach that makes cloud-native so appealing: it’s not just about technology, but about how tech is used within organizations, and what outcomes are achieved.

This creates an integrated ecosystem of products that checks all the boxes of CNCF’s definition, and which organizations can use to hit the ground running. As such, it eliminates much of the groundwork, such as design, integration, and implementation, that otherwise takes a lot of time.

CNCF’s biggest and highest-velocity projects are integrated and broad, including:

Kubernetes is a container orchestration platform that helps users build, scale and manage modern applications and their dynamic lifecycles. The cluster scheduler capability lets developers focus on code rather than ops. Kubernetes future-proofs infrastructure management on-premises or in the cloud, without vendor or cloud provider lock-in.

Prometheus delivers real-time monitoring, alerting, and time series database capabilities (including powerful queries and visualizations) for cloud-native applications. It’s the de facto standard for monitoring container-based infrastructure. Prometheus provides needed visibility into, and troubleshooting for, cloud-native architectures.

Envoy is a distributed proxy designed for single services and applications, as well as a universal data plane designed for large microservice service mesh architectures. Envoy runs alongside every application, and abstracts the network by providing common features in a platform-agnostic manner. It’s easy to visualize problem areas via consistent observability, tune overall performance, and add substrate features in a single place.

CoreDNS is a DNS server, written in Go. It can be used in a multitude of environments because of its flexibility.

Besides these four, there are many additional projects that are relevant to Kubernetes in 2019. The most notable include:

Fluentd. This is a unified logging tool that helps users better understand what’s happening in their environments by providing a unified layer for collecting, filtering, and routing log data.
NATS. This is a simple, high-performance open source message queueing and publish/subscribe system for cloud-native applications.
gRPC. This is a high-performance, open source universal RPC framework.
Containerd. This is an industry-standard container runtime with an emphasis on simplicity, robustness and portability.
Linkerd. An ultralight service mesh for Kubernetes and beyond, Linkerd provides observability, reliability, and security for microservices, with no code change required.
CNI. The Container Network Interface provides networking for Linux containers.
CSI. This stands for Container Storage Interface. It provides storage for Linux containers. See more on Kubernetes Storage and CSI.
Helm. This is the package manager for Kubernetes. Helm is the best way to find, share, and use software built for Kubernetes.

Of course, there are numerous software projects not part of the CNCF that fit into the ecosystem very well. Examples include Istio, the popular service mesh, and Terraform, the composable infrastructure automation tool.

Why Kubernetes

Let’s look at the CNCF’s most popular project, Kubernetes.

The Kubernetes layer cake of infrastructure, containers, and applications.

Kubernetes is the orchestration layer that manages containers across a group of physical servers or VMs. Kubernetes is specifically designed to manage the ephemeral nature of thousands of containers spinning up, scaling up, and winding down.

Kubernetes manages versioning of containers, figures out how containers can talk to each other over the network, exposes services running inside containers, and handles storage considerations. It also deals with failed hardware, and maintaining container availability.

Kubernetes makes it easy to quickly ramp up container instances to match spikes in demand. New versions can be put into production in small increments (these are known as canary deployments).
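The idea behind a canary deployment can be illustrated with a small Python sketch that sends a fixed percentage of requests to the new version. This is not a Kubernetes API; the `pick_version` function and the request IDs are hypothetical, and in practice the split is usually handled by the platform, for example through replica counts or a traffic-routing layer.

```python
import zlib


def pick_version(request_id: str, canary_percent: int) -> str:
    """Route a request to 'canary' or 'stable' based on a stable hash bucket."""
    # crc32 is deterministic, so the same request ID always lands in the
    # same bucket (0-99), which keeps routing consistent between calls.
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return "canary" if bucket < canary_percent else "stable"


request_ids = [f"request-{n}" for n in range(1000)]

# Raise the canary share in small increments as confidence grows.
for percent in (0, 10, 50, 100):
    canary_count = sum(pick_version(r, percent) == "canary" for r in request_ids)
    print(f"{percent}% canary setting -> {canary_count} of {len(request_ids)} requests")
```

Because each request keeps its bucket, raising the percentage only ever moves more requests to the new version; nothing that was already on the canary is sent back to the old one.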

Kubernetes can be thought of as a container-centric computing platform. It has much of the flexibility of Infrastructure-as-a-Service (in terms of managing compute, storage and networking resources), with the developer-friendly workflows and constructs found in Platform-as-a-Service on top. These include deployment, scaling, load balancing, logging, monitoring, and composition of application containers across clusters of container hosts.

Kubernetes is more than just a container orchestrator or resource scheduler. On the infrastructure side, it aims to remove the toil of orchestrating compute, network, and storage resources. It also abstracts those constructs so application developers and operators can focus entirely on container-centric workflows and self-service operation.

On the container side, Kubernetes provides a platform for building customized workflows and higher-level automation. It integrates into the continuous integration/continuous delivery (CI/CD) pipelines developers use to bring code into production in a controlled, tested and automated fashion.

The platform brings together infrastructure operations and software development by design. It uses declarative, infrastructure-agnostic constructs to describe applications and how they interact, without the traditional close ties into the underlying infrastructure.

Kubernetes runs just as well on traditional on-premises infrastructure stacks as it does for third-party service providers and public cloud environments.

Developer Agility

We’ve seen that containers unlock the full benefits of agile software development and operations. Creating smaller, portable container images that contain only the application increases developer velocity and the speed through the pipeline into production, which massively reduces the inertia of each release.

Creating “flow” is one of the core principles of agile software development, and reducing the size of the piece of code moving through the developer’s delivery pipeline without being blocked is critical.

Containers are a major reduction in size compared to VMs, and help developers push code to production in smaller increments, and more often. This limits the impact of mistakes, as any changes causing the mistake will be small; this makes them quick and easy to roll back, due to image immutability. Developers can simply roll back to a previous version, without having to worry very much about data consistency or data loss.

A major cause of mistakes in production is the lack of environmental consistency across development and production environments. With containers, the image is identical and immutable, no matter where it runs; this is true even if the underlying resources differ massively. So, if it runs on the developer’s laptop, it will run in production.

A common blocker of the pipeline is the separation of concerns between development and operations. This typically leads to a dependency of the developer on the Ops team to install the new application version during deployment, often by using configuration management tooling like Chef or a package manager.

With containers, images are built automatically at build/release time and deployed as an atomic unit. This allows Ops to influence how the images are built, independently of the deployment, while developers have full control during deployment. In a container configuration, dependencies are added as lines of code and either specify a specific version of that dependency, or depend on the latest version at build time. This helps in managing security vulnerabilities and keeping code secure (and lean), as dependencies are updated automatically and often.

While Kubernetes and the common underlying container runtime themselves don’t deploy source code or build your application, they’re easily integrated into CI/CD workflows and pipelines.

Cost Management

Similar to the move from physical to virtual servers, moving to containers optimizes resource usage. This lowers the cost of each application, as it runs more efficiently. As discussed before, a major difference between a VM and a container is its relative size: a container is orders of magnitude smaller than a VM. This makes it nimbler and more flexible, especially from a cost perspective. It allows the container to run wherever it’s cheapest, such as on ephemeral compute instances, which is an important option when the application is non-production or is itself resilient to interruption.

Secondly, many small containers are more easily scheduled across multiple hosts than a few large VMs. Fitting workloads onto hosts with as little wasted capacity as possible is known as the “bin-packing problem.”
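A small Python sketch shows why size matters for packing. It uses the classic first-fit-decreasing heuristic, which is a textbook approach and not the algorithm the Kubernetes scheduler uses; the host capacity and workload sizes are made-up numbers chosen so that both workloads need the same total amount of memory.

```python
def first_fit_decreasing(sizes: list[int], capacity: int) -> list[list[int]]:
    """Pack workloads onto hosts, placing the largest workloads first."""
    hosts: list[list[int]] = []
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"workload of size {size} does not fit on any host")
        for host in hosts:
            # Place the workload on the first host with enough room left.
            if sum(host) + size <= capacity:
                host.append(size)
                break
        else:
            # No existing host had room, so start a new one.
            hosts.append([size])
    return hosts


HOST_CAPACITY_GB = 16
vm_sizes_gb = [10, 10, 10]      # three large VMs, 30 GB in total
container_sizes_gb = [2] * 15   # fifteen small containers, also 30 GB in total

vm_hosts = first_fit_decreasing(vm_sizes_gb, HOST_CAPACITY_GB)
container_hosts = first_fit_decreasing(container_sizes_gb, HOST_CAPACITY_GB)
print(len(vm_hosts), "hosts for VMs,", len(container_hosts), "hosts for containers")
```

With these numbers, each 10 GB VM strands 6 GB on its host, while the 2 GB containers fill the gaps and the same total demand fits on fewer hosts.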

The dynamic nature of containers in a Kubernetes cluster, utilizing the Horizontal Pod Autoscaler, means that application cost goes hand-in-hand with application demand. While this is fantastic for scalability, it can sometimes have unintended consequences on the budget. The plethora of options muddies the waters pretty quickly. Even with the relatively simple cost model of physical servers, assigning a fraction of cost to a certain team, department, or application is difficult. Add in the complex offering of public cloud instance types, and it becomes near impossible to assign cost.
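The link between demand and cost can be made concrete with the scaling rule that the Kubernetes documentation gives for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below applies that rule and ignores the autoscaler's tolerance and stabilization behavior; the per-replica cost and the replica limits are hypothetical figures used only for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Apply the documented HPA ratio rule, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


COST_CENTS_PER_REPLICA_HOUR = 5  # hypothetical price, for illustration only

# Four replicas with a 60% CPU target, observed at different load levels.
for observed_cpu in (30, 60, 90, 300):
    replicas = desired_replicas(4, observed_cpu, 60)
    cost = replicas * COST_CENTS_PER_REPLICA_HOUR
    print(f"CPU {observed_cpu}% -> {replicas} replicas, {cost} cents per hour")
```

The `max_replicas` bound doubles as a budget guard in this sketch: however high demand climbs, the hourly cost cannot exceed the cost of the maximum replica count.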

There are some solutions for cost control, like CloudHealth, CoreOS Operator Framework, and Platform9’s Arbitrage that help assign cost across the multitude of layers in Kubernetes and the underlying public cloud or on-premises platform. These solutions figure out the charges for consumed infrastructure cost and assign them to clusters, namespaces, and pods inside Kubernetes. Besides the pods that run the actual applications, these solutions also split pods into administrative, monitoring, logging, and idle resources.
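As a rough illustration of the allocation idea, the following Python sketch splits a cluster's cost across namespaces in proportion to the CPU each one consumes and reports unused capacity as a separate idle line. This is a deliberately simple model and does not describe how any of the products named above work; the `allocate_cost` function, the namespace names, and all figures are assumptions made for the example.

```python
def allocate_cost(cluster_cost: float, capacity_cpu: float,
                  usage_by_namespace: dict[str, float]) -> dict[str, float]:
    """Split cluster cost by CPU usage, with unused capacity billed as 'idle'."""
    cost_per_cpu = cluster_cost / capacity_cpu
    allocation = {
        namespace: cpu * cost_per_cpu
        for namespace, cpu in usage_by_namespace.items()
    }
    used_cpu = sum(usage_by_namespace.values())
    # Capacity nobody consumed still costs money, so it is made visible.
    allocation["idle"] = (capacity_cpu - used_cpu) * cost_per_cpu
    return allocation


monthly_cost = 1000.0  # hypothetical monthly bill for the whole cluster
cluster_cpus = 100.0   # hypothetical total CPU capacity
usage = {"shop": 40.0, "monitoring": 10.0, "logging": 5.0}

allocation = allocate_cost(monthly_cost, cluster_cpus, usage)
for name, cost in allocation.items():
    print(f"{name}: {cost:.2f}")
```

Even this toy model shows why the idle share matters: in the example, nearly half of the bill is not attributable to any application at all.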

But in reality, many people apply the ‘guesstimate’ method, especially in the early phases of containerization projects. And however unscientific it is, this method does fit in with the reasoning behind the move toward containers and developer agility: create flow, increase velocity, and remove hurdles in their pipeline to production.

Only after implementation does cost control start to matter. Once the tangible benefits of the system have started to manifest in day-to-day operations, the downsides, including cost sprawl, need to be reined in, but only after the platform has proven successful.

And here lies the true cost/benefit analysis: it’s not just about controlling infrastructure costs, but developer costs, too: how much quicker can they move to production or roll back a faulty release, for instance, and what financial consequences, good or bad, does that have?

This brings us to the fundamental value of agility: smaller iterations of work. This means going through the “discover-plan-build-review” cycle much, much more often. Optimizing the developer’s flow makes them more efficient, which in turn makes them less expensive. As more and more companies invest in software development, the cost balance is shifting from infrastructure to developer; given this, it makes more sense to optimize the higher-cost items.

Accelerate Project Timelines

For many developers, Kubernetes means ‘less friction.’ A production-grade Kubernetes platform usually includes monitoring, logging, tracing, release management for blue/green or canary deployments, automated testing in the pipeline, and automated deployment.

All of these reduce friction, making it cheap and easy to deploy software to production. This means less management overhead and associated processes, including approvals, change advisory boards, and release/deployment managers.

This is especially true for development in microservices environments, where boundaries between teams are carried forward in the services and products they deliver. These microservices are loosely-coupled, small and independent pieces of a larger network of services that make up an application. All these services can be deployed and managed independently and dynamically, making it easier for a team to put a new piece of code into production without dependence on another team.

This gives teams the freedom to decide if they want to bring in an existing (paid-for) solution, or if they’ll build it themselves. While existing solutions may be more expensive up front, the delivery timeframe is usually compressed significantly.

A Note on Kubernetes for Stateful Applications:

Yes, Containers Can Be Stateful, too.

In the earlier stages of Kubernetes and container maturity, it was often believed that containers were only suitable for stateless workloads, and that storing any data or state in a container was impossible. This belief is wrong; both the underlying container runtime (which is often Docker) and Kubernetes fully support a diverse variety of workloads, including stateful applications. Containers themselves are ephemeral and immutable, meaning that any file system changes are lost after the container shuts down. But there are plenty of options for adding stateful storage to a container, ranging from NFS network shares to S3 object stores and full-fledged data center storage options like a SAN. Many organizations deploying Kubernetes actually use existing storage assets for stateful storage. Another popular storage option is a hyperconverged storage deployment pattern like the open source Ceph or VMware’s vSAN.

Learn more about the key concepts and components in the Kubernetes storage architecture, and also check out this webinar on Kubernetes for stateful applications.

Learn More About Enterprise Kubernetes

Today, enterprise IT no longer questions the value of containerized applications. Given the move to adopting DevOps and cloud-native architectures, it is critical to leverage container capabilities in order to enable digital transformation. Kubernetes (K8s), the open source container orchestration system that originated at Google, has become the de facto standard, and the key enabler, for cloud-native applications and the way they are architected, composed, deployed, and managed. Enterprises are using Kubernetes to create modern architectures composed of microservices and serverless functions which scale seamlessly.

Our additional articles below can help you learn more about how to evaluate, implement, and optimize your multicloud storage investment.

Best Practices for Production-Grade Kubernetes

Kubernetes is a complex platform that provides for highly scalable, efficient use of containers. But it can also be highly problematic for companies that just try to “wing it” and figure out what to do as they go along.

Don’t let that be you. Here are some best practices to employ when running Kubernetes in production.

Key Features to Consider When Evaluating an Enterprise Kubernetes Solution

Kubernetes is notoriously difficult to deploy and operate at scale, particularly for enterprises managing both on-premises and public cloud infrastructure. Numerous Kubernetes solutions and products have emerged in the industry (from both startups and established traditional vendors), aimed at solving some of the challenges around Kubernetes. The space has become crowded, and it is difficult for organizations to navigate and compare the various offerings.

In this blog you’ll learn about 18 technical and operational capabilities to consider when evaluating various solutions for enabling Kubernetes at scale in the enterprise.

7 Key Considerations for Kubernetes in Production

A complete Kubernetes infrastructure needs proper DNS, load balancing, Ingress and Kubernetes role-based access control (RBAC), alongside a slew of additional components that make the deployment process quite daunting for IT. Once Kubernetes is deployed, monitoring has to be added, along with all the associated operations playbooks to fix problems as they occur, such as running out of capacity, ensuring HA, handling backups, and more. Finally, the cycle repeats whenever the community releases a new version of Kubernetes and your production clusters need to be upgraded without risking any application downtime.

Bare-bone Kubernetes is never enough for real-world production applications. Learn about 7 key services you need around bare-bone Kubernetes to enable mission-critical production use.

Production considerations for Multi-Master Kubernetes

Kubernetes has been around for five years and, at this point, has become a stable platform that is commonly used throughout development and production environments to run applications. Over that time, the base project has become more capable, making it easier to build and deploy a reliable cluster on your own, and then add in the components you want, so it meets your individual requirements.

In this blog post you’ll learn about:

Clustering High-Availability etcd
Creating a new etcd database cluster
Adding HA to Existing etcd Database
Going to a Multi-Master Configuration

Kubernetes Resource Limits: Kubernetes Capacity Planning

Capacity planning is a critical step in successfully building and deploying a stable and cost-effective infrastructure. The need for proper resource planning is amplified within a Kubernetes cluster, as it does hard checks and will kill and move workloads around without hesitation and based on nothing but current resource usage.

This article will highlight areas that are important to consider, such as: how many DaemonSets are deployed, if a service mesh is involved, and if quotas are being actively used. Focusing on these areas when capacity planning makes it much easier to calculate the minimum requirements for a cluster that will allow everything to run.
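As a back-of-the-envelope illustration of why DaemonSets matter for sizing, the Python sketch below estimates a minimum node count from CPU requests alone. It is a simplification under stated assumptions: it ignores memory, quotas, and scheduling fragmentation, and the `min_nodes` function and all figures are hypothetical. CPU is expressed in millicores (1000m equals one core).

```python
import math


def min_nodes(workload_millicores: int, node_allocatable_millicores: int,
              daemonset_millicores_per_node: int) -> int:
    """Estimate the fewest nodes whose usable CPU covers the workload requests."""
    # DaemonSets run one pod on every node, so their requests reduce the
    # capacity left for application workloads on each node.
    usable = node_allocatable_millicores - daemonset_millicores_per_node
    if usable <= 0:
        raise ValueError("DaemonSets consume the entire node; no room for workloads")
    return math.ceil(workload_millicores / usable)


workload = 20_000        # total CPU requested by application pods (20 cores)
node_allocatable = 4_000 # CPU available to pods on each node (4 cores)

nodes_without_daemonsets = min_nodes(workload, node_allocatable, 0)
nodes_with_daemonsets = min_nodes(workload, node_allocatable, 500)
print(nodes_without_daemonsets, nodes_with_daemonsets)
```

In this example, half a core of per-node agents is enough to push the estimate up by a whole node, which is why per-node overhead is worth counting before sizing the cluster.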

Kubernetes Cluster Sizing – How Large Should a Kubernetes Cluster Be?

When it comes to Kubernetes clusters, size matters. The number of nodes in your cluster plays an important role in determining the overall availability and performance of your workloads. So does the number of namespaces, in a way.

This does not mean, however, that bigger is always better. A Kubernetes cluster sizing strategy that aims to maximize node count will not always deliver the best results – certainly not from a cost perspective, and perhaps not from an overall availability or performance perspective, either. And maximizing namespaces is hardly ever a smart strategy.

Instead, calculating the number of nodes to include in a cluster requires careful consideration of a variety of factors. Keep reading for an overview – if not a precise recommendation on how large your cluster should be, because only you can decide that for yourself.

Kubernetes for CI/CD at Scale

One of the main use cases of Kubernetes is to run Continuous Integration or Continuous Delivery (CI/CD) pipelines. That is, we deploy a unique instance of a CI/CD container that will monitor a code version control system, so whenever we push to that repository, the container will run pipeline steps. The end goal is to achieve a ‘true or false’ status. True, if the commit passes the various tests in the Integration phase; false, if it does not.

In this blog post you’ll learn about:

CI/CD platforms for Kubernetes
How to Install Jenkins on Kubernetes
Scaling CI/CD Jenkins Pipelines with Kubernetes
Best Practices to use Kubernetes for CI/CD at scale


Kubernetes Security: Architecture & Best Practices

When it comes to security, there is a lot that Kubernetes does. There is also a lot that it doesn’t do.

To secure Kubernetes effectively for real-world deployment, you must understand which built-in security features Kubernetes offers and which it doesn’t, and how to leverage Kubernetes’s security capabilities at scale.

In this blog post you’ll learn Kubernetes’s security architecture and best practices for securing production Kubernetes deployments.

Kubernetes Upgrade: The Definitive Guide to Do-It-Yourself

Often you are required to upgrade the Kubernetes cluster to keep up with the latest security features and bug fixes, as well as to benefit from new features being released on an ongoing basis. This is especially important when you have installed a really outdated version or if you want to automate the process and always be on top of the latest supported version.

In general, when operating an HA Kubernetes Cluster, the upgrade process involves two separate tasks which may not overlap or be performed simultaneously: upgrading the Kubernetes Cluster; and, if needed, upgrading the etcd cluster which is the distributed key-value backing store of Kubernetes. In this blog post you’ll see how to perform those tasks with minimal disruptions.

Kubernetes infrastructure cost optimization

Kubernetes costs can vary considerably in the enterprise. Whether you decide to host your clusters on public cloud services, such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), or Amazon Elastic Kubernetes Service (EKS), or on-premises, there are a number of ways to ensure you are spending your money efficiently.

In this post, you’ll learn about the major factors that drive up Kubernetes infrastructural costs – like AI, Node type, size, and density – as well as optimizing on-prem resources to meet modern requirements.

There’s More:

In the next posts we’ll dive deeper into the Kubernetes architecture, how to deploy Kubernetes on different types of infrastructure, Kubernetes use cases, and best practices for operating Kubernetes in Production, at scale.

Can’t wait?

To learn more about Kubernetes in the Enterprise, download the complete guide now.

Author

Platform9

Platform9 is a leader in simplifying enterprise private clouds. Our flagship product, Private Cloud Director, turns existing infrastructure into a full-featured private cloud. Enterprise IT teams can manage VMs and containers with familiar GUI tools and automated APIs in a private, secure environment.
