The Six Most Popular Kubernetes Networking Troubleshooting Issues
Kubernetes follows certain rules and policies when it comes to networking, and it’s not uncommon to encounter issues when trying to connect applications running in Kubernetes. Even the most trivial deployment needs to have the correct configuration so that K8s can assign the right IP address or ingress controller to the service. Furthermore, if you are operating the cluster on a public cloud provider like Google Cloud or AWS, you may have to follow their recommended configurations when deploying custom Kubernetes networking tools and certificate managers.
From an operator’s point of view, your job is to choose the right CNI (Flannel, Calico, or Weave), install a certificate manager (cert-manager), and route a domain to the cluster so that everything will work efficiently.
If you are a developer, on the other hand, you are probably more worried about ingress, paths, and certificates. How would you figure out why your application is not being routed successfully in a specific cluster?
In this extended tutorial, we will introduce you to the six most popular Kubernetes networking troubleshooting issues and show you how to solve them.
1. How to Debug DNS Resolution in K8s
2. Cannot Access Application from the Outside Using Ingress NGINX Controller
3. Flannel vs. Calico vs. Weave – Which One Is Better?
4. How Can I Redirect HTTP to HTTPS Using K8s Ingress?
5. Cert-Manager: How to See if the Client TLS Certificate Was Renewed
6. HELP! My Worker Node Is Not Ready and Returns the “CNI Plugin Not Initialized” Error
1. How to Debug DNS Resolution in Kubernetes Networking
If you have trouble resolving DNS in K8s (when issuing certificates, for example), you might want to start with debugging the DNS resolution flow within the cluster. Here is what you can do:
Make sure that the DNS service is up and running:
$ kubectl get svc kube-dns --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.36.0.10 <none> 53/UDP,53/TCP 3m51s
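You can also confirm that the DNS pods behind the service are up. On recent clusters these are typically CoreDNS pods, but they still carry the k8s-app=kube-dns label:
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns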
Inspect the logs of the DNS pods to check for trouble signals:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
Verify that the kube-dns service exposes endpoints:
$ kubectl get endpoints kube-dns --namespace=kube-system -o wide
NAME ENDPOINTS AGE
kube-dns 10.32.0.3:53,10.32.0.3:53 14m
If everything looks healthy so far, spin up a container with basic network tools like dnsutils:
$ kubectl run -i -t dnsbox --image=tutum/dnsutils --restart=Never
Start an interactive terminal within the dnsbox container and check /etc/resolv.conf to ensure that the pod is configured to use the kube-dns nameserver:
$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local europe-west2-a.c.stoikman-198318.internal c.stoikman-198318.internal google.internal
nameserver 10.36.0.10
options ndots:5
Use the nslookup tool to resolve the kubernetes.default service:
$ kubectl exec -i -t dnsbox -- nslookup kubernetes.default
Server: 10.36.0.10
Address: 10.36.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.36.0.1
Check to see if you can ping public DNS servers like 1.1.1.1 or 8.8.8.8:
$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=3.10 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=3.44 ms
Then, confirm that external names resolve correctly with dig:
$ dig google.com
; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27765
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 300 IN A 142.250.187.206
Depending on the output of each command, you may have to perform follow-up investigation to find the root cause of your problem.
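For example, if your cluster runs CoreDNS behind the kube-dns service (the default on recent clusters), a useful follow-up is to inspect its Corefile for misconfigured forwarders or stub domains:
$ kubectl get configmap coredns --namespace=kube-system -o yaml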
2. Cannot Access Application from the Outside Using Ingress NGINX Controller
Incorrect ingress configuration is a common source of routing and connectivity problems for applications.
For example, you need to make sure that you assign the correct ingress class in your YAML config. On older clusters this is an annotation; on newer ones it is a spec field:
With cluster version < 1.19:
annotations:
  kubernetes.io/ingress.class: "nginx"
With cluster version >= 1.19:
spec:
  ingressClassName: nginx
If you don’t set the ingress class, K8s will not know which ingress controller should handle the Ingress resource, which matters when you run multiple ingress classes within the same cluster.
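On clusters at version 1.19 or later, you can list the available ingress classes to confirm the name you should reference. The output should look something like this:
$ kubectl get ingressclass
NAME CONTROLLER PARAMETERS AGE
nginx k8s.io/ingress-nginx <none> 2d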
Another source of problems comes from overriding the default backend annotation:
nginx.ingress.kubernetes.io/default-backend: example
This is supposed to be a backend that handles requests that the ingress controller cannot match to any rule, so it shouldn’t point to one of your regular application services. Instead, it should point to a service that serves 404 responses. For example, the following ingress might create issues because the default backend is the same service that serves the main path:
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/default-backend: example
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80
Most of the time, you should just keep the default-backend at its default value rather than overriding it.
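If you do need a custom default backend, here is a minimal sketch of the annotation in use; it assumes you have deployed a hypothetical custom-404 service in the same namespace that returns 404 responses for unmatched requests:
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/default-backend: custom-404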
3. Flannel vs. Calico vs. Weave – Which One Is Better at Kubernetes Networking?
CNI plugins like Flannel, Calico, and Weave are designed to provide an unambiguous and painless way to configure container networking using a common interface. Using these plugins, you can rest assured that the minimum Kubernetes networking requirements are satisfied so that K8s can run efficiently. We will explain each provider below.
Flannel
Flannel is focused on networking at Layer 3 of the OSI networking model. It is considered to be a simple configuration tool for basic requirements. It runs a simple overlay network across all nodes of the Kubernetes cluster. For more advanced requirements (like the ability to configure Kubernetes network policies and firewalls), we recommend that you use a more feature-complete plugin like Calico.
To get started with Flannel, you can install the required services and daemonsets by applying the following manifest in a test cluster:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
For more details about how Flannel works behind the scenes, you can read this guide.
One common issue with Flannel is that pods sometimes fail to communicate with other pods in the same cluster, especially after restarting nodes or when upgrading the cluster. If you have this issue, you may need to upgrade Flannel to the latest version as follows:
- Delete the Flannel daemonset:
$ kubectl delete daemonset kube-flannel-ds --namespace=kube-system
- Upgrade Flannel to the latest version:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
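After the new daemonset rolls out, verify that one Flannel pod is running per node. The exact label depends on the manifest version; the manifest above deploys pods labeled app=flannel into kube-system:
$ kubectl get pods -n kube-system -l app=flannel -o wide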
Calico
Calico is a full-featured CNI plugin that is maintained by Tigera. It’s currently very well maintained and has wide community support. If you choose to migrate to Calico from Flannel, you may find that the integration process is smooth. Calico utilizes the BGP protocol to move network packets between nodes.
To get started with Calico, you can install the required services and daemonsets by applying the following manifest in a self-managed Kubernetes cluster:
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
This will deploy several resources in the cluster. You can use the calicoctl CLI to inspect the node status:
$ calicoctl node status
And you can inspect the status of the calico-kube-controllers pod like this:
$ kubectl get deployment.apps/calico-kube-controllers -n kube-system -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
calico-kube-controllers 1/1 1 1 10m calico-kube-controllers docker.io/calico/kube-controllers:v3.21.1 k8s-app=calico-kube-controllers
In some cases, you may encounter issues when disabling or removing a Kubernetes network policy with Calico. If you have problems that you can’t seem to fix, you can follow these steps in the official documentation to troubleshoot and clean up any Calico-related iptables rules.
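Before cleaning anything up, it can help to list the policies Calico actually knows about, since calicoctl can show resources (such as global network policies) that kubectl does not:
$ calicoctl get networkpolicy --all-namespaces
$ calicoctl get globalnetworkpolicy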
Weave
Weave is a full-featured CNI plugin maintained by Weaveworks, and, like Calico, it allows you to create network policies (unlike Flannel). It uses a mesh overlay model between all nodes of a K8s cluster and employs a combination of strategies for routing packets between containers on different hosts.
To get started with Weave, you can install the required services and daemonsets by applying the following manifest in a test cluster:
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Next, check to see if it’s running:
$ kubectl get pods -n kube-system -l name=weave-net
NAME READY STATUS RESTARTS AGE
weave-net-7pnz9 1/2 CrashLoopBackOff 6 9m19s
weave-net-w7xnq 1/2 CrashLoopBackOff 6 9m19s
If you see a problem like the one above, you can inspect the logs to find the source:
$ kubectl logs -n kube-system weave-net-7pnz9 weave
Network 10.32.0.0/12 overlaps with existing route 10.32.1.10/32 on host
In this case, it looks like the default address range that Weave allocates (set via IPALLOC_RANGE) overlaps with a route that already exists on the host. You can try again with a different address range:
$ kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.0.0.0/16"
$ kubectl get pods -n kube-system -l name=weave-net
NAME READY STATUS RESTARTS AGE
weave-net-fhcgx 2/2 Running 1 5s
weave-net-v9w6l 2/2 Running 1 5s
Then, verify that the Weave pods are running:
$ kubectl get pods -n kube-system -l name=weave-net -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
weave-net-fhcgx 2/2 Running 1 5m12s 10.154.0.6 gke-hello-cluster-default-pool-3f24c17e-c80q <none> <none>
weave-net-v9w6l 2/2 Running 1 5m12s 10.154.0.5 gke-hello-cluster-default-pool-3f24c17e-v21p <none> <none>
Weave deploys one of these pods per node in order to interconnect all hosts. You can exec into any of them to run Weave status commands like this:
$ kubectl exec -n kube-system weave-net-fhcgx -c weave -- /home/weave/weave --local status
If you’ve used Calico or Flannel and aren’t satisfied with their features or the experience they provide, Weave is a good alternative.
For an even more detailed comparison, you can read this article.
4. How Can I Redirect HTTP to HTTPS Using K8s Ingress?
If your K8s ingress operator does not support HTTP to HTTPS redirects out-of-the-box, you might have to configure it to do so within the appropriate metadata.
For example, you might need to set up a redirect middleware using Traefik, as follows:
# Redirect to https
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: example-redirectscheme
spec:
  redirectScheme:
    scheme: https
    permanent: true
Then, add this to an ingress metadata annotation list:
# HTTPS ingress
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: default-example-redirectscheme@kubernetescrd
…
By default, you won’t need to specify an annotation with ingress-nginx, as it automatically redirects HTTP to HTTPS when TLS is configured for the ingress. If it doesn’t, you can use the following annotation to enforce it:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
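For reference, here is a minimal sketch of how that annotation fits into an ingress-nginx resource on a recent cluster; the host, secret, and service names are placeholders:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80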
You might also want to check that the following block appears in your NGINX site config, just to be sure:
if ($scheme = http) {
    return 301 https://$server_name$request_uri;
}
5. Cert-Manager: How to See if the Client TLS Certificate Was Renewed
When a certificate is renewed, some fields (like the revision number) will be updated. You can check the revision of a certificate using the following command:
$ kubectl get cert test-cert -o yaml
...
revision: 5
You can also list the certificate requests issued by cert-manager and see if the AGE column was updated recently:
$ kubectl get certificaterequests
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
example-cert-11xrr True True ca-issuer system:serviceaccount:cert-manager:cert-manager 4m
Finally, check the events log to see if you can find the related renewal message:
$ kubectl get events
110s Normal Issuing certificate/test-cert The certificate has been successfully issued
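If you want the full picture for a single certificate, describing it will show its status conditions, renewal time, and related events in one place:
$ kubectl describe certificate test-cert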
6. HELP! My Worker Node Is Not Ready and Returns the “CNI Plugin Not Initialized” Error
The most common configuration issues with CNI plugins are related to setting the correct pod-network-cidr parameter or failing to match it with the CNI plugin’s own configuration (IPALLOC_RANGE in Weave or the Network field in Flannel’s net-conf.json). For reference, you can use the following command when setting up the cluster:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
This CIDR block must be available for use within your network, and it should fall within the --cluster-cidr block if one is defined.
You can inspect the current value by dumping the cluster info and filtering for the flag, for example:
$ kubectl cluster-info dump | grep -m 1 cluster-cidr
...
--cluster-cidr=10.32.0.0/14
If the problem persists, you might need to consult your CNI plugin’s documentation. Make sure that you are following the correct deployment instructions. For example, you might need to remove and reinstall certain kube-system pods so that they will pick up the correct config.
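On a kubeadm-managed node, for example, you can check the kubelet logs directly for CNI initialization errors (assuming the kubelet runs under systemd):
$ journalctl -u kubelet | grep -i cni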
You can also search for solutions if you are dealing with a known issue with your cloud provider. If you are using AKS, for example, you can search here for known issues and filter by your CNI plugin name.
Expand Your Knowledge of Kubernetes Networking
Learning about K8s networking is a continuous process. You will need to spend considerable time evaluating the available options and troubleshooting issues in order to be productive.
To start, you should carefully read the official docs and keep them handy for reference.
You can expand your knowledge by reading Platform9’s Kubernetes networking blog series and exploring the “Further Readings” sections. By the end of this series, you will have a strong foundation for tackling challenges related to managing Kubernetes networking.
I also recommend that you read this three-part series on K8s networking fundamentals written by Mark Betz, since it’s considered to be one of the best introductions to the topic.