BIRD is not ready: Error executing command: read unix @->/var/run/calico/bird.ctl: i/o timeout
Problem
- Pod networking is misbehaving.
- Describing the calico-node pod(s) shows that calico/node is not ready: BIRD is not ready.
I1205 19:21:04.404697 18222 prober.go:117] Readiness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: BIRD is not ready: Error executing command: read unix @->/var/run/calico/bird.ctl: i/o timeout
I1205 19:21:42.416027 18222 prober.go:117] Readiness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: resource temporarily unavailable
I1205 19:22:28.134765 18222 prober.go:117] Liveness probe for "calico-node-pwbdk_kube-system(efbd1219-5082-4075-8457-d5dcf11420ee):calico-node" failed (failure): calico/node is not ready: Felix is not live: liveness probe reporting 503
Environment
- Platform9 Managed Kubernetes - v5.3 and higher
- Calico - v3.18
- IPVS
- Felix
Cause
BIRD is consuming an excessive amount of CPU, which can cause requests on its control socket to time out (see: https://github.com/projectcalico/bird/issues/95).
Workaround
- List the calico-node pods within the kube-system namespace.
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
$ kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-node-6l5cd 0/1 Running 6 13d 10.128.146.142 master3 <none> <none>
calico-node-9zzkn 1/1 Running 1 13d 10.128.147.139 worker1 <none> <none>
calico-node-dbrd2 0/1 Running 5 13d 10.128.147.193 master1 <none> <none>
calico-node-hx9rp 0/1 Running 7 13d 10.128.146.62 master2 <none> <none>
calico-node-zbn6h 1/1 Running 0 13d 10.128.147.195 worker2 <none> <none>
- Identify which pod(s) are affected.
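One way to pick out the affected pods is to filter the listing for rows whose READY column shows fewer ready containers than the total. The following is a minimal sketch: the function name not_ready_pods is hypothetical, and it assumes the default kubectl get pods column layout shown above.

```shell
# Hypothetical helper: print the names of pods whose READY column (e.g. 0/1)
# shows fewer ready containers than the total. Reads `kubectl get pods`
# table output on stdin and assumes the default column layout.
not_ready_pods() {
  awk 'NR > 1 { split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get pods -n kube-system -l k8s-app=calico-node -o wide | not_ready_pods
```

For the sample listing above, this would print calico-node-6l5cd, calico-node-dbrd2, and calico-node-hx9rp.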
- Retrieve the Calico BIRD configuration file from the calico-node pod running on the node that exhibits the problem. Replace __POD__ with the name of that pod.
kubectl exec -i -n kube-system __POD__ -c calico-node -- cp -v /etc/calico/confd/config/bird.cfg /var/run/calico/bird.cfg
- Edit the configuration file locally and change the scan time fields from scan time 2; to scan time 10;. The -i flag makes sed write the change to the file instead of only printing the result.
sed -i 's/scan time 2;/scan time 10;/g' /var/run/calico/bird.cfg
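Before copying the file back, it can help to confirm that the file now contains the new value. A minimal sketch, using a hypothetical helper count_scan_time that counts how many lines set a given scan interval:

```shell
# Hypothetical helper: count the lines in a BIRD config file that set
# "scan time <seconds>;". Arguments: <file> <seconds>.
count_scan_time() {
  grep -c "scan time $2;" "$1"
}

# Usage on the copied file:
#   count_scan_time /var/run/calico/bird.cfg 2    # should report 0 once the edit is in place
#   count_scan_time /var/run/calico/bird.cfg 10   # should report at least 1
```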
- Copy the updated BIRD configuration back into the calico-node pod.
kubectl exec -i -n kube-system __POD__ -c calico-node -- cp -v /var/run/calico/bird.cfg /etc/calico/confd/config/bird.cfg
- Reload the BIRD configuration (from within the calico-node pod).
kubectl exec -i -n kube-system calico-node-6l5cd -c calico-node -- birdcl configure
# birdcl configure
BIRD v0.3.3+birdv1.6.8 ready.
Reading configuration from /etc/calico/confd/config/bird.cfg
Reconfigured
Note: The above steps are a temporary workaround and will not be persisted beyond the lifetime of the pod (i.e. the updated configuration will be lost if the pod is killed).
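Because the change is lost when a pod is recreated, it may be useful to check whether a given pod still carries the modified value. The sketch below is an illustration only: the function name show_scan_time is hypothetical, and it assumes that cat is available in the calico-node container (grep runs on the machine where kubectl is invoked).

```shell
# Hypothetical helper: print the "scan time" lines from the live BIRD
# configuration of a calico-node pod. Argument: <pod name>.
# Assumption: `cat` exists in the calico-node container; grep runs locally.
show_scan_time() {
  kubectl exec -n kube-system "$1" -c calico-node -- \
    cat /etc/calico/confd/config/bird.cfg | grep 'scan time'
}

# Usage:
#   show_scan_time calico-node-6l5cd
```

If the output shows scan time 2; again, the pod has been restarted and the workaround needs to be reapplied.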
Resolution
A fix is included in the LTS3 release, which ships Calico v3.24 and Kubernetes v1.25. This issue was tracked as Jira issue AIR-1104.