Kubernetes Networking Challenges at Scale

Kubernetes networking can be noisy, tedious, and complex. This article discusses some of the challenges of managing and troubleshooting Kubernetes networking in large-scale production deployments. In a recent survey, 42% of Kubernetes users listed networking as their largest Kubernetes challenge, and that number often goes up as deployment size increases. Additionally, according to Gartner, more than 75% of global organizations will be running containerized applications in production by 2022.

Even if you don’t think you have a large-scale deployment now, odds are that your deployment will grow quickly. If you don’t plan for that growth early on, you may build up technical debt that gets harder to pay off over time. Let’s explore some of the network challenges to consider now.



Container Networking Differences

The first challenge of Kubernetes networking is that it differs significantly from traditional networking, yet too many developers try to apply traditional networking fundamentals to containers anyway. For starters, network addressing is far more dynamic in Kubernetes environments. Nothing is static, and because containers migrate, classic DHCP can slow things down. You cannot rely on hard-coded addresses or even ports, which also makes network security more challenging. Additionally, dynamic changes to the networking layout of your Kubernetes deployments aren’t persisted, since containers are immutable and can migrate across environments over time. Finally, for large-scale deployments in particular, you will likely have many more addresses to assign and manage than with traditional deployments. Many enterprises run up against the IPv4 limit on hosts within a subnet, which forces them to deploy more subnets or move to IPv6.
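To make the subnet limit concrete, here is a minimal sketch of the arithmetic using Python's standard `ipaddress` module. The helper names `usable_hosts` and `subnets_needed`, the example CIDR blocks, and the figure of 1,000 addresses are illustrative assumptions, not values from any particular deployment.

```python
import ipaddress
import math


def usable_hosts(cidr: str) -> int:
    """Return the number of assignable host addresses in an IPv4 subnet."""
    net = ipaddress.ip_network(cidr)
    # /31 and /32 networks have no separate network/broadcast addresses.
    if net.prefixlen >= 31:
        return net.num_addresses
    # Otherwise the network and broadcast addresses are not assignable.
    return net.num_addresses - 2


def subnets_needed(address_count: int, cidr: str) -> int:
    """Return how many subnets of the given size hold address_count hosts."""
    return math.ceil(address_count / usable_hosts(cidr))


if __name__ == "__main__":
    # Hypothetical example: 1,000 addresses spread across /24 subnets.
    print(usable_hosts("10.0.0.0/24"))
    print(subnets_needed(1000, "10.0.0.0/24"))
```

A /24 offers only 254 usable addresses, so a deployment that needs a thousand of them already spans four such subnets, which is why larger blocks or IPv6 become attractive at scale.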

Kubernetes Network Security

Nearly every technology-related list of challenges, regardless of the topic, begins with security (or at least it should). Network security has long been an area of focus, and with Kubernetes it remains so. In many ways, the ease of deploying container-based applications makes it easy to overlook network security, or to deploy applications with flaws in the design of their Kubernetes-managed containers. Fortunately, you can define Kubernetes network policies to help here, although these introduce challenges of their own (discussed below). There are also managed solutions, platforms, and cloud-native services available to help with Kubernetes deployment.

Networking Misconceptions

Containers and Kubernetes power today’s cloud-native application deployments so well that it is easy to go from development to production with little friction. This helps power an efficient DevOps practice, getting features to users quickly. However, because of misconceptions or a simple lack of understanding of key areas such as networking, you risk deploying on Kubernetes without considering proper network design. Sometimes this is a matter of convenience: developers favor open communication in order to get their job done. Often-overlooked considerations include defining specific Kubernetes network constraints around container-to-container communication, pod-to-pod networking, service communication, and Internet connectivity.

Defining and Changing Network Policies

Kubernetes network policies define how groups of pods communicate with one another. Defining and then changing the network policy for a pod or container requires you to create NetworkPolicy resources, which are essentially configuration files. For large numbers of containers and pods, this can be a daunting task. Further, changing all of these policies can be tedious and error-prone. Tooling can help here. Apstra AOS, for example, is an intent-based networking system that creates a software-defined layer so that you can change policy based on application requirements, all via a REST API. It also helps to abstract away network vendor-specific implementations and APIs.
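As an illustration of what a NetworkPolicy resource looks like, the following sketch builds one as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name `allow_ingress_policy`, the `shop` namespace, the `app` labels, and port 8080 are hypothetical; only the resource structure (`networking.k8s.io/v1`, `podSelector`, `policyTypes`, `ingress`) comes from the Kubernetes API.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy that admits TCP traffic on one port.

    Pods matching target_labels accept ingress only from pods matching
    source_labels in the same namespace.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # The pods allowed to connect.
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical example: only frontend pods may reach api pods on 8080.
    policy = allow_ingress_policy(
        "api-allow-frontend", "shop", {"app": "api"}, {"app": "frontend"}, 8080
    )
    print(json.dumps(policy, indent=2))
```

Generating policies from a function like this is one way to keep many similar policies consistent, though at scale the volume of such resources is exactly the maintenance burden described above.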

Project Calico is designed to help manage security with Kubernetes networking while delivering high performance. Its usage is growing in the enterprise due to its Layer 3-based scalability, its fine-grained segmentation for role-based network security and policies, and its integration with the Linux kernel for maximum performance.

Lack of Abstraction

According to Network Computing, one of the challenges holding enterprises back from adopting Kubernetes as quickly as they should is its direct effect on data center design and implementation. For example, one strategy for adopting Kubernetes involves restructuring an existing data center (or changing the deployment plans for new ones) specifically to suit Kubernetes and container-based application architectures, or buying into other interconnectivity solutions altogether.

As an alternative, consider using a software-defined framework to turn your data center into something similar to a cloud-native environment that supports both container and non-container based applications. These solutions help you define load balancing, security policies, container traffic management, and other networking concerns for your Kubernetes deployment through REST APIs. They also help make your Kubernetes-based applications more portable across data centers and cloud providers, and provide an automation layer that integrates with your DevOps practice.

As the usage of Linux containers grows rapidly, enterprise container networking is still not well defined. Because of this, container applications, runtimes, and orchestrators each attempt to address the problem in their own way. The goal of the Container Network Interface (CNI) specification is to avoid that duplicated effort by defining a common interface between network plugins and the containers themselves. CNI describes how to write plugins that configure network interfaces for Linux containers. It focuses only on the network connectivity of containers and on removing allocated resources when a container is deleted. As a result, CNI is straightforward to implement and has a wide range of support. In fact, CNI is supported by Project Calico, discussed above.
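The shape of a CNI plugin can be sketched in a few lines. Under the CNI specification, a plugin is an executable that the runtime invokes with the operation in the `CNI_COMMAND` environment variable and the network configuration as JSON on standard input, and it reports results as JSON on standard output. The sketch below only mimics that contract: it creates no interfaces, the IP address is a hard-coded placeholder, and the `handle` function and its structure are illustrative assumptions rather than a reference implementation.

```python
import json
import os
import sys

# CNI spec versions this sketch claims to understand (illustrative).
SUPPORTED_VERSIONS = ["0.4.0", "1.0.0"]


def handle(command, env, config):
    """Dispatch one CNI operation and return the result to print, if any."""
    if command == "VERSION":
        return {"cniVersion": "1.0.0", "supportedVersions": SUPPORTED_VERSIONS}
    if command == "ADD":
        # A real plugin would create and configure the interface inside the
        # network namespace named by CNI_NETNS. This sketch only reports one.
        return {
            "cniVersion": config.get("cniVersion", "1.0.0"),
            "interfaces": [
                {"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}
            ],
            # Placeholder address; real plugins delegate this to IPAM.
            "ips": [{"address": "10.1.0.5/16", "interface": 0}],
        }
    if command == "DEL":
        # A real plugin would release the resources allocated during ADD.
        # Success is signalled by exit code alone, with no result output.
        return None
    raise ValueError(f"unsupported CNI_COMMAND: {command}")


# Only act when a runtime has actually invoked this file as a plugin.
if __name__ == "__main__" and "CNI_COMMAND" in os.environ:
    result = handle(os.environ["CNI_COMMAND"], dict(os.environ), json.load(sys.stdin))
    if result is not None:
        json.dump(result, sys.stdout)
```

The narrow contract (a command, a JSON config, a JSON result) is a large part of why CNI is straightforward to implement: ADD sets up connectivity, and DEL cleans up whatever ADD allocated.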

Battling Complexity

Complexity is one area that containers and Kubernetes were created to help eliminate. However, container-based networking complexity can still be an issue, especially at scale.

A service mesh is a newer technology, inspired by microservice-driven application architecture, that helps with network routing and resiliency between services, security, and observability. Tools such as linkerd offer Kubernetes service mesh solutions for container-based and cloud-native applications, with agents that provide features such as Kubernetes service discovery, routing, failure handling, and visibility to applications transparently, without requiring code changes.

Network Communication Reliability

The advantages of microservice and container-based application design are well known. This design simplifies development, testing, and the management of requirements and development team responsibilities, and it eases deployment headaches. However, service-to-service communication becomes more critical in this type of architecture. As the number of services (and containers and Kubernetes pods) increases, the complexity of service communication increases, and so does the importance of reliable communications. You need to harden your Kubernetes network policies (and other configurations) to ensure reliable communications, and once you’ve done that, you need to monitor and manage them properly as well. This last point alone is vital to ensuring optimal service communication performance and reliability.

Combining Virtual Machine Networking

Even after you rise to the challenges of Kubernetes networking, you’re often faced with the need to combine the management and monitoring of VM-based deployments with container and Kubernetes-based deployments. The industry is only now beginning to offer solutions that manage both types of deployments together, and there are enough differences between VMs and containers that it’s not quite transparent yet.

KubeVirt, however, offers an API to address the needs of development teams that have adopted Kubernetes for containers but also run a mix of virtual machines. It provides a unified development platform where developers can build and deploy applications in both application containers and virtual machines in a common, shared environment. KubeVirt helps organizations with existing virtual machine-based workloads to rapidly containerize applications.

Debugging Network Connectivity Issues

When I judge development talent, I use what I call the “golf rule”. Every golfer wants to hit the ball straight and far every time, but the golfers who know how to get out of trouble (sand traps, the rough, long putts) are the most successful. This applies to technology as well: being able to debug issues effectively defines success. The same goes for Kubernetes network issues, especially across managed containers and containers deployed on geographically disparate clouds. Often, having the right tools can help, and leveraging the right Kubernetes monitoring and management platforms will make you a more effective problem solver.
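As a small example of the kind of tool that helps here, the sketch below checks whether a TCP connection to a service can be opened at all, which is often the first question when debugging connectivity. It uses only the Python standard library. The helper names are hypothetical, and `service_dns_name` assumes the default `cluster.local` cluster domain, which your cluster may override.

```python
import socket


def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Kubernetes Service.

    Assumes the conventional <service>.<namespace>.svc.<cluster-domain> form.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts alike.
        return False


if __name__ == "__main__":
    # Hypothetical service "api" in namespace "shop", listening on 8080.
    host = service_dns_name("api", "shop")
    print(host, "reachable" if can_connect(host, 8080) else "unreachable")
```

Run from inside a pod, a check like this helps separate name resolution and reachability problems from application-level failures, although it cannot say which network policy or route is responsible for a blocked connection.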

Conclusion: Avoid Network Hell

Containers come and go, they’re immutable, and you just can’t nail them down, especially when you’re running Kubernetes. The networking challenges with Kubernetes at scale can seem overwhelming, but they’re manageable with the right planning and tools. With service meshes, new network infrastructure products from Cisco, frameworks and APIs that help drive a software-defined approach, and forthcoming support from heavyweights such as VMware, the future is bright for Kubernetes.
