Pod "etcd-backup-with-interval-" in "NotReady" State
Problem
One or more etcd-backup-with-interval- pods in the kube-system namespace are in a NotReady state, e.g.
$ kubectl get pods -n kube-system | grep etcd
etcd-backup-with-interval-28050660--1-kd7pg 1/2 NotReady 0 17m
The pod is associated with a job which is showing 0/1 completions, e.g.
$ kubectl get jobs -n kube-system | grep 28050660
etcd-backup-with-interval-28050660 0/1 245d 245d
The pod log shows only that it has created a temporary DB file with no further output, e.g.
{"level":"info","ts":1704221964.2963,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/backup/etcd-snapshot-2024-01-02_18:59:24_UTC.db.part"}
A kubectl describe of the pod shows that the container State is Running:
etcd-backup:
...
Image: gcr.io/etcd-development/etcd:v3.4.14
...
Command:
/bin/sh
Args:
-c
etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
State: Running
Started: Tue, 02 Jan 2024 12:59:24 -0600
Ready: True
Restart Count: 0
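The container's environment can also be inspected directly from the job spec. The command below is a sketch using the job name from the example output above (substitute the name from your cluster); an empty result indicates that no environment variables are set on the container:

```shell
# Print the env section of the stuck job's container spec.
# Job name is taken from the example output above; substitute your own.
kubectl get job etcd-backup-with-interval-28050660 -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].env}'
```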
Environment
- Platform9 Managed Kubernetes – v5.7 and Higher
Cause
The etcdctl snapshot save
command is "hanging" or failing to complete as it is missing the following Environment
section/variables which control the flags to be passed to the etcdctl
command-line utility which are necessary for TLS authentication.
Environment:
ETCDCTL_API: 3
ETCDCTL_CACERT: /certs/apiserver/etcd/ca.crt
ETCDCTL_CERT: /certs/apiserver/etcd/request.crt
ETCDCTL_KEY: /certs/apiserver/etcd/request.key
Thus, the job never transitions to Succeeded or Failed, and the pod will continue to be re-created.
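With these variables set, the backup command is equivalent to invoking etcdctl with the corresponding TLS flags passed explicitly. The following sketch uses the certificate paths shown in the Environment section above:

```shell
# Equivalent invocation with explicit TLS flags instead of ETCDCTL_* variables;
# certificate paths are those listed in the Environment section above.
ETCDCTL_API=3 etcdctl \
  --cacert=/certs/apiserver/etcd/ca.crt \
  --cert=/certs/apiserver/etcd/request.crt \
  --key=/certs/apiserver/etcd/request.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
```

Without the CA certificate and client key pair, the client cannot complete the TLS handshake with etcd, which is why the snapshot never progresses past creating the temporary DB file.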
Resolution
- Delete the job which is associated with the pod.
- The pod will be terminated, and a new pod will be re-created, associated with a new job resource which is using an updated spec template.
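For example, using the job name from the example output above (substitute the job name from your cluster):

```shell
# Delete the stuck job; the associated NotReady pod is cleaned up with it.
kubectl delete job etcd-backup-with-interval-28050660 -n kube-system

# Verify that a new pod has been created and completes successfully.
kubectl get pods -n kube-system | grep etcd-backup
```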