The Six Most Popular Kubernetes Networking Troubleshooting Issues
Kubernetes follows certain rules and policies when it comes to networking, and it’s not uncommon to encounter issues when trying to connect applications running in Kubernetes. Even the most trivial deployment needs to have the correct configuration so that K8s can assign the right IP address or ingress controller to the service. Furthermore, if you are operating the cluster on a public cloud provider like Google Cloud or AWS, you may have to follow their recommended configurations when deploying custom Kubernetes networking tools and certificate managers.
From an operator’s point of view, your job is to choose the right CNI (Flannel, Calico, or Weave), install a certificate manager (cert-manager), and route a domain to the cluster so that everything will work efficiently.
If you are a developer, on the other hand, you are probably more worried about ingress, paths, and certificates. How would you figure out why your application is not being routed successfully in a specific cluster?
In this extended tutorial, we will introduce you to the six most popular Kubernetes networking troubleshooting issues and show you how to solve them.
1. How to Debug DNS Resolution in K8s
2. Cannot Access Application from the Outside Using Ingress NGINX Controller
3. Flannel vs. Calico vs. Weave – Which One Is Better?
4. How Can I Redirect HTTP to HTTPS Using K8s Ingress?
5. Cert-Manager: How to See if the Client TLS Certificate Was Renewed
6. HELP! My Worker Node Is Not Ready and Returns the “CNI Plugin Not Initialized” Error
1. How to Debug DNS Resolution in Kubernetes Networking
If you have trouble resolving DNS in K8s (when issuing certificates, for example), you might want to start with debugging the DNS resolution flow within the cluster. Here is what you can do:
Make sure that the DNS service is up and running:
$ kubectl get svc kube-dns --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.36.0.10 <none> 53/UDP,53/TCP 3m51s
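You can also confirm that the DNS pods behind the service are up. On recent clusters these are typically CoreDNS pods, but they still carry the k8s-app=kube-dns label:
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns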
Inspect the logs of the DNS pods to check for trouble signals:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
Verify that the kube-dns service exposes endpoints:
$ kubectl get endpoints kube-dns --namespace=kube-system -o wide
NAME ENDPOINTS AGE
kube-dns 10.32.0.3:53,10.32.0.3:53 14m
If everything looks healthy so far, spin up a container with basic network tools like dnsutils:
$ kubectl run -i -t dnsbox --image=tutum/dnsutils --restart=Never
Start an interactive terminal within the dnsbox container and check /etc/resolv.conf to ensure that the pod is configured to use the kube-dns nameserver:
$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local europe-west2-a.c.stoikman-198318.internal c.stoikman-198318.internal google.internal
nameserver 10.36.0.10
options ndots:5
Use the nslookup tool to resolve the kubernetes.default service:
$ kubectl exec -i -t dnsbox -- nslookup kubernetes.default
Server: 10.36.0.10
Address: 10.36.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.36.0.1
Check to see if you can ping public DNS servers like 1.1.1.1 or 8.8.8.8:
$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=3.10 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=3.44 ms
Then, confirm that external names resolve correctly with dig:
$ dig google.com
; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27765
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 300 IN A 142.250.187.206
Depending on the output of each command, you may have to perform follow-up investigation to find the root cause of your problem.
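For example, if your cluster runs CoreDNS behind the kube-dns service (the default on recent clusters), a useful follow-up is to inspect its Corefile for misconfigured forwarders or stub domains:
$ kubectl get configmap coredns --namespace=kube-system -o yaml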
2. Cannot Access Application from the Outside Using Ingress NGINX Controller
Incorrect ingress configuration is a common source of routing and connectivity problems for applications.
For example, you need to make sure that you assign the correct ingress class in your YAML config. On older clusters this is an annotation; on newer ones it is a spec field:
With cluster version < 1.19:
annotations:
  kubernetes.io/ingress.class: "nginx"
With cluster version >= 1.19:
spec:
  ingressClassName: nginx
If you don’t set the ingress class, K8s will not know which ingress controller should handle the Ingress resource, which matters when you run multiple ingress classes within the same cluster.
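On clusters at version 1.19 or later, you can list the available ingress classes to confirm the name you should reference. The output should look something like this:
$ kubectl get ingressclass
NAME CONTROLLER PARAMETERS AGE
nginx k8s.io/ingress-nginx <none> 2d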
Another source of problems comes from overriding the default backend annotation:
nginx.ingress.kubernetes.io/default-backend: example
This is supposed to be a backend that handles requests that the ingress controller cannot match to any rule, so it shouldn’t point to one of your regular application services. Instead, it should point to a service that serves 404 responses. For example, the following ingress might create issues because the default backend is the same service that serves the main path:
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/default-backend: example
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80
Most of the time, you should just keep the default-backend at its default value rather than overriding it.
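If you do need a custom default backend, here is a minimal sketch of the annotation in use; it assumes you have deployed a hypothetical custom-404 service in the same namespace that returns 404 responses for unmatched requests:
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/default-backend: custom-404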
3. Flannel vs. Calico vs. Weave – Which One Is Better at Kubernetes Networking?
CNI plugins like Flannel, Calico, and Weave are designed to provide an unambiguous and painless way to configure container networking using a common interface. Using these plugins, you can rest assured that the minimum Kubernetes networking requirements are satisfied so that K8s can run efficiently. We will explain each provider below.
Flannel
Flannel is focused on networking at Layer 3 of the OSI networking model. It is considered to be a simple configuration tool for basic requirements. It runs a simple overlay network across all nodes of the Kubernetes cluster. For more advanced requirements (like the ability to configure Kubernetes network policies and firewalls), we recommend that you use a more feature-complete plugin like Calico.
To get started with Flannel, you can install the required services and daemonsets by applying the following manifest in a test cluster:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
For more details about how Flannel works behind the scenes, you can read this guide.
One common issue with Flannel is that pods sometimes fail to communicate with other pods in the same cluster, especially after restarting nodes or when upgrading the cluster. If you have this issue, you may need to upgrade Flannel to the latest version as follows:
- Delete the Flannel daemonset:
$ kubectl delete daemonset kube-flannel-ds --namespace=kube-system
- Upgrade Flannel to the latest version:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
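After the new daemonset rolls out, verify that one Flannel pod is running per node. The exact label depends on the manifest version; the manifest above deploys pods labeled app=flannel into kube-system:
$ kubectl get pods -n kube-system -l app=flannel -o wide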
Calico
Calico is a full-featured CNI plugin that is maintained by Tigera. It’s currently very well maintained and has wide community support. If you choose to migrate to Calico from Flannel, you may find that the integration process is smooth. Calico utilizes the BGP protocol to move network packets between nodes.
To get started with Calico, you can install the required services and daemonsets by applying the following manifest in a self-managed Kubernetes cluster:
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
This will deploy several resources in the cluster. You can use the calicoctl CLI to inspect the node status:
$ calicoctl node status
And you can inspect the status of the calico-kube-controllers pod like this:
$ kubectl get deployment.apps/calico-kube-controllers -n kube-system -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
calico-kube-controllers 1/1 1 1 10m calico-kube-controllers docker.io/calico/kube-controllers:v3.21.1 k8s-app=calico-kube-controllers
In some cases, you may encounter issues when disabling or removing a Kubernetes network policy with Calico. If you have problems that you can’t seem to fix, you can follow these steps in the official documentation to troubleshoot and clean up any Calico-related iptables rules.
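Before cleaning anything up, it can help to list the policies Calico actually knows about, since calicoctl can show resources (such as global network policies) that kubectl does not:
$ calicoctl get networkpolicy --all-namespaces
$ calicoctl get globalnetworkpolicy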
Weave
Weave is a full-featured CNI plugin maintained by Weaveworks, and, like Calico, it allows you to create network policies (unlike Flannel). It uses a mesh overlay model between all nodes of a K8s cluster and employs a combination of strategies for routing packets between containers on different hosts.
To get started with Weave, you can install the required services and daemonsets by applying the following manifest in a test cluster:
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Next, check to see if it’s running:
$ kubectl get pods -n kube-system -l name=weave-net
NAME READY STATUS RESTARTS AGE
weave-net-7pnz9 1/2 CrashLoopBackOff 6 9m19s
weave-net-w7xnq 1/2 CrashLoopBackOff 6 9m19s
If you see a problem like the one above, you can inspect the logs to find the source:
$ kubectl logs -n kube-system weave-net-7pnz9 weave
Network 10.32.0.0/12 overlaps with existing route 10.32.1.10/32 on host
In this case, it looks like the default address range that Weave allocates (set via IPALLOC_RANGE) overlaps with a route that already exists on the host. You can try again with a different address range:
$ kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.0.0.0/16"
$ kubectl get pods -n kube-system -l name=weave-net
NAME READY STATUS RESTARTS AGE
weave-net-fhcgx 2/2 Running 1 5s
weave-net-v9w6l 2/2 Running 1 5s
Then, verify that the Weave pods are running:
$ kubectl get pods -n kube-system -l name=weave-net -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
weave-net-fhcgx 2/2 Running 1 5m12s 10.154.0.6 gke-hello-cluster-default-pool-3f24c17e-c80q <none> <none>
weave-net-v9w6l 2/2 Running 1 5m12s 10.154.0.5 gke-hello-cluster-default-pool-3f24c17e-v21p <none> <none>
Weave deploys one of these pods per node in order to interconnect all hosts. You can exec into any of them to run Weave status commands like this:
$ kubectl exec -n kube-system weave-net-fhcgx -c weave -- /home/weave/weave --local status
If you’ve used Calico or Flannel and aren’t satisfied with their features or the experience they provide, Weave is a good alternative.
For an even more detailed comparison, you can read this article.
4. How Can I Redirect HTTP to HTTPS Using K8s Ingress?
If your K8s ingress operator does not support HTTP to HTTPS redirects out-of-the-box, you might have to configure it to do so within the appropriate metadata.
For example, you might need to set up a redirect middleware using Traefik, as follows:
# Redirect to https
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: example-redirectscheme
spec:
  redirectScheme:
    scheme: https
    permanent: true
Then, add this to an ingress metadata annotation list:
# HTTPS ingress
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: default-example-redirectscheme@kubernetescrd
…
By default, you won’t need to specify an annotation with ingress-nginx, as it automatically redirects HTTP to HTTPS when TLS is configured for the ingress. If it doesn’t, you can use the following annotation to enforce it:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
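For reference, here is a minimal sketch of how that annotation fits into an ingress-nginx resource on a recent cluster; the host, secret, and service names are placeholders:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80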
You might also want to check that the following block appears in your NGINX site config, just to be sure:
if ($scheme = http) {
    return 301 https://$server_name$request_uri;
}
5. Cert-Manager: How to See if the Client TLS Certificate Was Renewed
When a certificate is renewed, some fields (like the revision number) will be updated. You can check the revision of a certificate using the following command:
$ kubectl get cert test-cert -o yaml
...
revision: 5
You can also list the certificate requests issued by cert-manager and see if the AGE column was updated recently:
$ kubectl get certificaterequests
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
example-cert-11xrr True True ca-issuer system:serviceaccount:cert-manager:cert-manager 4m
Finally, check the events log to see if you can find the related renewal message:
$ kubectl get events
110s Normal Issuing certificate/test-cert The certificate has been successfully issued
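If you want the full picture for a single certificate, describing it will show its status conditions, renewal time, and related events in one place:
$ kubectl describe certificate test-cert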
6. HELP! My Worker Node Is Not Ready and Returns the “CNI Plugin Not Initialized” Error
The most common configuration issues with CNI plugins are related to setting the correct pod-network-cidr parameter or failing to match it with the CNI plugin’s own configuration (IPALLOC_RANGE in Weave or the Network field in Flannel’s net-conf.json). For reference, you can use the following command when setting up the cluster:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
This CIDR block must be available for use within your network, and it should fall within the --cluster-cidr block if one is defined.
You can inspect the current value by dumping the cluster info and filtering for the flag, for example:
$ kubectl cluster-info dump | grep -m 1 cluster-cidr
...
--cluster-cidr=10.32.0.0/14
If the problem persists, you might need to consult your CNI plugin’s documentation. Make sure that you are following the correct deployment instructions. For example, you might need to remove and reinstall certain kube-system pods so that they will pick up the correct config.
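On a kubeadm-managed node, for example, you can check the kubelet logs directly for CNI initialization errors (assuming the kubelet runs under systemd):
$ journalctl -u kubelet | grep -i cni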
You can also search for solutions if you are dealing with a known issue with your cloud provider. If you are using AKS, for example, you can search here for known issues and filter by your CNI plugin name.
Expand Your Knowledge of Kubernetes Networking
Learning about K8s networking is a continuous process. You will need to spend considerable time evaluating the available options and troubleshooting issues in order to be productive.
To start, you should carefully read the official docs and keep them handy for reference.
You can expand your knowledge by reading Platform9’s Kubernetes networking blog series and exploring the “Further Readings” sections. By the end of this series, you will have a strong foundation for tackling challenges related to managing Kubernetes networking.
I also recommend that you read this three-part series on K8s networking fundamentals written by Mark Betz, since it’s considered to be one of the best introductions to the topic.