Pod "etcd-backup-with-interval-" in "NotReady" State
Problem
One or more etcd-backup-with-interval- pods in the kube-system namespace are in a NotReady state, e.g.
$ kubectl get pods -n kube-system | grep etcd
etcd-backup-with-interval-28050660--1-kd7pg   1/2   NotReady   0   17m
The pod is associated with a job that shows 0/1 completions, e.g.
$ kubectl get jobs -n kube-system | grep 28050660
etcd-backup-with-interval-28050660   0/1   245d   245d
The pod log shows only that it has created a temporary DB file, with no further output, e.g.
{"level":"info","ts":1704221964.2963,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/backup/etcd-snapshot-2024-01-02_18:59:24_UTC.db.part"}
Running kubectl describe pod on the pod shows that the State of the etcd-backup container is Running, e.g.
  etcd-backup:
    ...
    Image:          gcr.io/etcd-development/etcd:v3.4.14
    ...
    Command:
      /bin/sh
    Args:
      -c
      etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
    State:          Running
      Started:      Tue, 02 Jan 2024 12:59:24 -0600
    Ready:          True
    Restart Count:  0
Environment
- Platform9 Managed Kubernetes v5.7 and higher
Cause
The etcdctl snapshot save command hangs and never completes because the container spec is missing the following Environment section. These variables supply the flags that the etcdctl command-line utility needs for TLS authentication: ETCDCTL_API=3 selects the v3 API, and ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY are equivalent to passing --cacert, --cert and --key.
    Environment:
      ETCDCTL_API:     3
      ETCDCTL_CACERT:  /certs/apiserver/etcd/ca.crt
      ETCDCTL_CERT:    /certs/apiserver/etcd/request.crt
      ETCDCTL_KEY:     /certs/apiserver/etcd/request.key
Thus, the job never transitions to Succeeded or Failed, and the pod will continue to be re-created.
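To check whether a backup spec actually carries these variables, the following is a minimal POSIX shell sketch. It reads a manifest in YAML on stdin and prints each of the four ETCDCTL_* names that does not appear as an env entry. The function name missing_etcdctl_env is hypothetical, and it assumes the entries appear in the usual "- name: VAR" form produced by kubectl get ... -o yaml; it is a plain text check, not a YAML parser.

```shell
#!/bin/sh
# Hypothetical helper (not Platform9 tooling): reads a Kubernetes manifest
# (YAML) on stdin and prints every ETCDCTL_* variable from the Cause section
# that is not present as an env entry of the form "name: VAR".
# Prints nothing when all four variables are present.
missing_etcdctl_env() {
  manifest=$(cat)
  for var in ETCDCTL_API ETCDCTL_CACERT ETCDCTL_CERT ETCDCTL_KEY; do
    # Anchor the name so that ETCDCTL_CERT does not match ETCDCTL_CACERT.
    if ! printf '%s\n' "$manifest" |
        grep -qE "name:[[:space:]]*${var}[[:space:]]*\$"; then
      echo "$var"
    fi
  done
}
```

For example, the stuck job from the Problem section can be checked directly; an output listing all four names indicates that its template lacks the Environment section:
$ kubectl get job etcd-backup-with-interval-28050660 -n kube-system -o yaml | missing_etcdctl_env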
Resolution
- Delete the job that is associated with the pod.
- The pod will be terminated, and a new pod will be re-created under a new job resource that uses the updated spec template.
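When several backup jobs may be affected, a small helper can list the stuck ones before deleting them. The sketch below uses hypothetical POSIX shell functions (stuck_backup_jobs and delete_stuck_backup_jobs are illustrative names, not Platform9 tooling). stuck_backup_jobs reads kubectl get jobs --no-headers output on stdin and prints the names of etcd-backup-with-interval-* jobs whose completions field still starts with 0/, as in the 0/1 example in the Problem section. It looks for the first N/M field instead of a fixed column, on the assumption that some kubectl versions print an extra column before COMPLETIONS.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get jobs --no-headers` output on stdin
# and prints each etcd-backup-with-interval-* job with 0 successful completions.
stuck_backup_jobs() {
  awk '$1 ~ /^etcd-backup-with-interval-/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^[0-9]+\/[0-9]+$/) {   # first "succeeded/completions" field
        if ($i ~ /^0\//) print $1
        break
      }
  }'
}

# Hypothetical helper: deletes every stuck backup job in kube-system.
# kubectl delete cascades by default, so the job's pods are removed too.
delete_stuck_backup_jobs() {
  kubectl get jobs -n kube-system --no-headers | stuck_backup_jobs |
    while read -r job; do
      kubectl delete job -n kube-system "$job"
    done
}
```

Reviewing the list first and then deleting keeps the destructive step explicit. With the job list shown in the Problem section, the first command below would print etcd-backup-with-interval-28050660:
$ kubectl get jobs -n kube-system --no-headers | stuck_backup_jobs
$ delete_stuck_backup_jobs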