# ETCD Backup Cronjob Fails and Job Pods Report the Status as 'NotReady'

## Problem

* The `etcd-backup-with-interval` cronjob in the `kube-system` namespace fails, and the job pod created during the cron execution reports its status as `NotReady`.
* ETCD debug logs show the following message:

`transport: loopyWriter.run returning. connection error: desc = "transport is closing"`

## Environment

* Platform9 Managed Kubernetes - v-5.6.8 and Higher.

## Cause

* ETCD uses gRPC, and this error message means that the connection the RPC was using was closed.
* This can happen for any of the following reasons:
  1. Misconfigured transport credentials, so the connection failed during the handshake.
  2. Bytes disrupted in transit, possibly by a proxy in between.
  3. Server shutdown.
  4. Keepalive parameters caused the connection to shut down, for example if the server is configured to terminate connections regularly to trigger DNS lookups. In that case, consider increasing `MaxConnectionAgeGrace` to allow longer RPC calls to finish.
  5. ETCD leader elections, which can cause transient failures.
  6. ETCD took too long to process the request, which eventually hit a timeout.

## Resolution

* List the `jobs` (not cronjobs) in the `kube-system` namespace

{% tabs %}
{% tab title="List the jobs" %}

```bash
$ kubectl get jobs -n kube-system
```

{% endtab %}
{% endtabs %}

* Delete every job that reports `0/1` (shown in the `COMPLETIONS` column) and is **not** `Completed`:

```bash
$ kubectl delete job <job-name> -n kube-system
```
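
When many jobs have accumulated, picking out the `0/1` entries by eye can be tedious. The following is a minimal sketch of one way to filter them. `incomplete_jobs` is a hypothetical helper, not part of `kubectl`. It assumes the default table output of `kubectl get jobs`, with the job name in the first column and `COMPLETIONS` in the second. The optional prefix argument is also an assumption, added so the list can be narrowed to jobs created by one cronjob.

```bash
# Hypothetical helper: reads `kubectl get jobs` table output on stdin and
# prints the names of jobs whose COMPLETIONS column (column 2) is 0/1.
# Assumes the default column layout: NAME first, COMPLETIONS second.
incomplete_jobs() {
  local prefix="${1:-}"   # optional job-name prefix; empty matches every job
  awk -v prefix="$prefix" '
    NR > 1 && $2 == "0/1" && (prefix == "" || index($1, prefix) == 1) {
      print $1
    }'
}

# Usage (requires cluster access):
#   kubectl get jobs -n kube-system | incomplete_jobs etcd-backup-with-interval
```

Review the printed names before deleting anything, since a job that has only just started also shows `0/1`. Each name can then be passed to the `kubectl delete job <job-name> -n kube-system` command above.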

## Additional Information

* Currently, Catapult monitoring triggers an alert when the job fails, but not in this case: the job is never marked as failed and instead stays running while repeatedly failing.
* A bug has been reported internally for Catapult monitoring to send an alert in this case as well: **PMK-6340**.

