BIRD is not ready: Error executing command: read unix @->/var/run/calico/bird.ctl: i/o timeout

Problem

Pod networking is misbehaving.
Describing the calico-node pod(s) shows that calico/node is not ready: BIRD is not ready.

Bash

I1205 19:21:04.404697   18222 prober.go:117] Readiness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: BIRD is not ready: Error executing command: read unix @->/var/run/calico/bird.ctl: i/o timeout
I1205 19:21:42.416027   18222 prober.go:117] Readiness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: resource temporarily unavailable
I1205 19:22:28.134765   18222 prober.go:117] Liveness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: Felix is not live: liveness probe reporting 503

Environment

Platform9 Managed Kubernetes - v5.3 and Higher
Calico - v3.18
IPVS
Felix

Cause

BIRD is consuming an excessive amount of CPU, which can cause queries to its control socket (/var/run/calico/bird.ctl) to time out (see: https://github.com/projectcalico/bird/issues/95).
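
One way to check for this cause on an affected node is to look at the CPU usage of the BIRD processes. The following is a minimal sketch, assuming it is run on the node itself, that procps `ps` is available, and that the processes are named `bird` and `bird6`; the helper name `bird_cpu` is hypothetical.

```shell
# Print PID, CPU percentage and command name for BIRD processes.
# Reads `ps -eo pid,pcpu,comm` output on stdin and skips the header line.
# Assumption: the BIRD processes are named "bird" and "bird6".
bird_cpu() {
  awk 'NR > 1 && $3 ~ /^bird6?$/ { print $1, $2, $3 }'
}

# Usage on the affected node:
#   ps -eo pid,pcpu,comm | bird_cpu
```

Note that the `pcpu` value reported by `ps` is averaged over the lifetime of the process, so a tool such as `top` may give a better picture of current load.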

Workaround

List the calico-node pods within the kube-system namespace.

Bash
    
 
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

Bash

$ kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
calico-node-6l5cd   0/1     Running   6          13d   10.128.146.142   master3   <none>           <none>
calico-node-9zzkn   1/1     Running   1          13d   10.128.147.139   worker1   <none>           <none>
calico-node-dbrd2   0/1     Running   5          13d   10.128.147.193   master1   <none>           <none>
calico-node-hx9rp   0/1     Running   7          13d   10.128.146.62    master2   <none>           <none>
calico-node-zbn6h   1/1     Running   0          13d   10.128.147.195   worker2   <none>           <none>

Identify which pod(s) are affected: these show 0/1 in the READY column.
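
On clusters with many nodes, the affected pods can be picked out of the listing with a small filter. This is a sketch only; the helper name `not_ready_pods` is hypothetical, and it assumes the default `kubectl get pods` column layout, with the pod name in the first column and READY in the second.

```shell
# Print the names of pods whose READY column shows fewer ready containers
# than total containers (for example 0/1).
# Reads `kubectl get pods` output (with its header line) on stdin.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   kubectl get pods -n kube-system -l k8s-app=calico-node -o wide | not_ready_pods
```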
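Copy the Calico BIRD configuration file out of the calico-node pod running on the affected node, replacing __POD__ with that pod's name. The copy is written to /var/run/calico, which calico-node normally mounts from the host, so the file can then be edited on the node itself.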

Bash
    
kubectl exec -i -n kube-system __POD__ -c calico-node -- cp -v /etc/calico/confd/config/bird.cfg /var/run/calico/bird.cfg

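Edit the copied configuration file locally, changing every scan time 2; entry to scan time 10;. The -i flag makes sed modify the file in place; without it, sed only prints the result and leaves the file unchanged.

Bash

sed -i 's/scan time 2;/scan time 10;/g' /var/run/calico/bird.cfg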
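
Before copying the file back, it may be worth confirming that the edit actually landed in the file. A minimal sketch, assuming the file path used in this article; the helper name `scan_time_summary` is hypothetical.

```shell
# Report how many lines still carry the old scan interval and how many
# carry the new one. "|| true" keeps a zero count from being treated as a failure.
scan_time_summary() {
  cfg="$1"
  old=$(grep -c 'scan time 2;' "$cfg" || true)
  new=$(grep -c 'scan time 10;' "$cfg" || true)
  echo "old=$old new=$new"
}

# Usage:
#   scan_time_summary /var/run/calico/bird.cfg
```

A result with `old=0` and a non-zero `new` count suggests the file is ready to be copied back.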

Copy the updated BIRD configuration back into the calico-node pod.

Bash
    
kubectl exec -i -n kube-system __POD__ -c calico-node -- cp -v /var/run/calico/bird.cfg /etc/calico/confd/config/bird.cfg

Reload the BIRD configuration (from within the calico-node pod).

Bash
    
 
kubectl exec -i -n kube-system __POD__ -c calico-node -- birdcl configure

Bash

# birdcl configure
BIRD v0.3.3+birdv1.6.8 ready.
Reading configuration from /etc/calico/confd/config/bird.cfg
Reconfigured
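
After the reload, the pod should pass its readiness probe again. One way to confirm this is `kubectl wait`, sketched below; the wrapper name `wait_calico_ready` and the 120s default timeout are illustrative choices, not values from Platform9 or Calico.

```shell
# Block until the given calico-node pod reports the Ready condition,
# or return non-zero once the timeout expires.
wait_calico_ready() {
  pod="$1"
  timeout="${2:-120s}"   # assumed default; adjust as needed
  kubectl wait --for=condition=Ready "pod/$pod" -n kube-system --timeout="$timeout"
}

# Usage:
#   wait_calico_ready __POD__
```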

Note: The above steps are a temporary workaround and will not be persisted beyond the lifetime of the pod (i.e. the updated configuration will be lost if the pod is killed).

Resolution

A fix is included in the LTS3 release, which ships Calico v3.24 and Kubernetes v1.25. The issue was tracked in Jira as AIR-1104.
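
To see which Calico version a cluster is running, the image tag of the calico-node container can be inspected. The sketch below assumes the DaemonSet is named calico-node, that the image is referenced by a version tag such as v3.18.1 rather than by digest, and that v3.24 (the version named above) is the threshold of interest; the helper names `image_tag` and `has_fix` are hypothetical.

```shell
# Extract the tag from an image reference such as "docker.io/calico/node:v3.18.1".
image_tag() {
  echo "${1##*:}"
}

# Succeed if a tag of the form vMAJOR.MINOR[.PATCH] is at or above v3.24.
has_fix() {
  ver="${1#v}"
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 24 ]; }
}

# Usage (DaemonSet name is an assumption):
#   image=$(kubectl get ds calico-node -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="calico-node")].image}')
#   has_fix "$(image_tag "$image")" && echo "at or above v3.24" || echo "below v3.24"
```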
