# Pod "etcd-backup-with-interval-" in "NotReady" State

## Problem

One or more `etcd-backup-with-interval-` pods in the `kube-system` namespace are in a `NotReady` state, e.g.

{% tabs %}
{% tab title="Bash" %}

```bash
$ kubectl get pods -n kube-system | grep etcd
etcd-backup-with-interval-28050660--1-kd7pg 1/2 NotReady 0 17m
```

{% endtab %}
{% endtabs %}

The pod is associated with a job which is showing 0/1 completions, e.g.

{% tabs %}
{% tab title="Bash" %}

```bash
$ kubectl get jobs -n kube-system | grep 28050660
etcd-backup-with-interval-28050660   0/1           245d       245d
```

{% endtab %}
{% endtabs %}
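If it is unclear which job owns a given pod, the job name can be read from the pod's `ownerReferences`. This is a sketch using the example pod name from the output above:

```shell
# Print the name of the Job that owns the NotReady pod
# (pod name taken from the example output above)
kubectl get pod etcd-backup-with-interval-28050660--1-kd7pg \
  -n kube-system \
  -o jsonpath='{.metadata.ownerReferences[0].name}'
```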

The pod log shows only that it has created a temporary DB file with no further output, e.g.

{% tabs %}
{% tab title="Bash" %}

```bash
{"level":"info","ts":1704221964.2963,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/backup/etcd-snapshot-2024-01-02_18:59:24_UTC.db.part"}
```

{% endtab %}
{% endtabs %}

A `kubectl describe` of the pod shows that the `etcd-backup` container's `State` is `Running`:

{% tabs %}
{% tab title="Bash" %}

```bash
etcd-backup:
...
    Image:         gcr.io/etcd-development/etcd:v3.4.14
...
    Command:
      /bin/sh
    Args:
      -c
      etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
    State:          Running
      Started:      Tue, 02 Jan 2024 12:59:24 -0600
    Ready:          True
    Restart Count:  0
```

{% endtab %}
{% endtabs %}

## Environment

* Platform9 Managed Kubernetes – v5.7 and higher

## Cause

The `etcdctl snapshot save` command hangs and never completes because the container spec is missing the following `Environment` section. These variables supply the flags that the `etcdctl` command-line utility requires for TLS authentication.

{% tabs %}
{% tab title="Bash" %}

```bash
Environment:
      ETCDCTL_API:     3
      ETCDCTL_CACERT:  /certs/apiserver/etcd/ca.crt
      ETCDCTL_CERT:    /certs/apiserver/etcd/request.crt
      ETCDCTL_KEY:     /certs/apiserver/etcd/request.key
```

{% endtab %}
{% endtabs %}
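For reference, each of those environment variables maps onto an `etcdctl` command-line flag, so the backup command in the job spec is effectively equivalent to the following sketch (same certificate paths and snapshot filename pattern as shown above):

```shell
# Equivalent invocation with the TLS material passed as explicit flags
# instead of ETCDCTL_* environment variables
ETCDCTL_API=3 etcdctl \
  --cacert /certs/apiserver/etcd/ca.crt \
  --cert   /certs/apiserver/etcd/request.crt \
  --key    /certs/apiserver/etcd/request.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
```

Without the CA certificate and client key pair, the TLS handshake with the etcd server cannot complete, which is why the command stalls after creating the temporary `.part` file.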

As a result, the `job` never transitions to `Succeeded` or `Failed`, and the pod remains `NotReady` and continues to be re-created.

## Resolution

1. Delete the `job` associated with the `pod`.
2. The `pod` will be terminated, and a new pod will be created under a new `job` resource that uses an updated spec template.
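The steps above can be sketched as follows, using the job name from the example output earlier in this article:

```shell
# Delete the stuck job; its NotReady pod is garbage-collected along with it
kubectl delete job etcd-backup-with-interval-28050660 -n kube-system

# Verify that a fresh pod has been created for the replacement job
kubectl get pods -n kube-system | grep etcd-backup
```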
