# Tuning Kubelet Garbage Collection & Eviction Thresholds for Devicemapper

## Problem

Kubelet does not perform image garbage collection when Docker is the underlying container runtime and uses the [Device Mapper](https://docs.docker.com/storage/storagedriver/device-mapper-driver/) storage driver.

## Environment

* Platform9 Managed Kubernetes – All Versions
* Kubelet
* Docker
* Devicemapper

## Cause

Due to an [alleged discrepancy](https://github.com/kubernetes/kubernetes/issues/16563#issuecomment-1058696699) in the Kubernetes code, and based on observations made when querying the Kubelet [resource metrics](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline), it appears that Kubelet does not record image filesystem usage against the Device Mapper (DM) thin-pool; instead, it bases the disk capacity on the root disk.
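
One way to observe this is to compare the image filesystem figures Kubelet reports with the size of the thin-pool. The following is a minimal sketch, not a supported tool: it assumes `kubectl` can reach the node proxy endpoint and that `jq` is installed, and the function names `show_imagefs` and `pct_available` are hypothetical.

{% tabs %}
{% tab title="Bash" %}

```bash
#!/usr/bin/env bash
# Illustrative helpers only; the function names are hypothetical.

# Print the image filesystem stats Kubelet reports for a node.
# Assumes kubectl can reach the node proxy and that jq is installed.
show_imagefs() {
  kubectl get --raw "/api/v1/nodes/$1/proxy/stats/summary" | jq '.node.runtime.imageFs'
}

# Convert availableBytes and capacityBytes into a whole-number percentage
# (truncated), comparable to an imagefs.available threshold.
pct_available() {
  awk -v a="$1" -v c="$2" 'BEGIN { printf "%d\n", a / c * 100 }'
}
```

{% endtab %}
{% endtabs %}

Run `show_imagefs <node-name>` and pass the reported `availableBytes` and `capacityBytes` to `pct_available`. If `capacityBytes` matches the root disk rather than the thin-pool, the node is likely affected.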

## Resolution

### Option A (Recommended): Switch to Supported Storage Driver (Overlay2)

1. Stop the Hostagent and Nodelet daemon services on each worker node.

{% tabs %}
{% tab title="Bash" %}

```bash
systemctl stop pf9-{hostagent,nodeletd}
```

{% endtab %}
{% endtabs %}

{% hint style="warning" %}
**Warning**

The node will now show as offline in the Platform9 UI, and you may receive a host-down notification.
{% endhint %}

2. Issue a `stop` for the Nodelet phases.

{% tabs %}
{% tab title="Bash" %}

```bash
sudo /opt/pf9/nodelet/nodeletd phases stop
```

{% endtab %}
{% endtabs %}

{% hint style="warning" %}
**Warning**

All running pods will be drained and all running containers destroyed. Kubelet will no longer report its status, and the Docker daemon will also be brought down.
{% endhint %}

3. Follow Steps #2-#4 from [Configuring Docker with the overlay2 Storage Driver](https://docs.docker.com/storage/storagedriver/overlayfs-driver/#configure-docker-with-the-overlay-or-overlay2-storage-driver).
4. Start the Hostagent service.

{% tabs %}
{% tab title="Bash" %}

```bash
systemctl start pf9-hostagent
```

{% endtab %}
{% endtabs %}
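
Once the node is back online, it may be worth confirming that Docker picked up the new storage driver. The sketch below assumes the `docker` CLI is available on the worker node; `check_storage_driver` is a hypothetical helper name, and passing a driver name as an argument only exercises the comparison without calling Docker.

{% tabs %}
{% tab title="Bash" %}

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds only when the storage driver is overlay2.
# With no argument, the driver is read from `docker info`.
check_storage_driver() {
  local driver="${1:-$(docker info --format '{{.Driver}}')}"
  if [ "$driver" = "overlay2" ]; then
    echo "OK: storage driver is overlay2"
  else
    echo "Unexpected storage driver: ${driver:-unknown}" >&2
    return 1
  fi
}
```

{% endtab %}
{% endtabs %}

Calling `check_storage_driver` with no arguments on the worker node returns a non-zero exit status if Docker still reports `devicemapper`.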

### Option B: Tune Kubelet Parameters for Garbage Collection & Eviction Thresholds

1. Run the `docker info` command on the worker node and identify the `Data loop file`.

{% tabs %}
{% tab title="Bash" %}

```bash
docker info | grep /var/lib
  Data loop file: /var/lib/docker/devicemapper/devicemapper/data
  Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Docker Root Dir: /var/lib/docker
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
```

{% endtab %}
{% endtabs %}

2. Check the size of the disk/partition on which the data loop file exists and note it down.

{% tabs %}
{% tab title="Bash" %}

```bash
df -h /
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/centos00-root  1.4T   86G  1.3T   7% /
```

{% endtab %}
{% endtabs %}

3. Check the size of the data loop file itself and note it down as well.

{% tabs %}
{% tab title="Bash" %}

```bash
ls -lh /var/lib/docker/devicemapper/devicemapper/data
-rw-------. 1 root root 100G Jul 13 11:52 /var/lib/docker/devicemapper/devicemapper/data
```

{% endtab %}
{% endtabs %}

4. Back up the current worker ConfigMap, `worker-default-kubelet-config`.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl get configmap -n kube-system worker-default-kubelet-config -o yaml > worker-default-kubelet-config.yaml
```

{% endtab %}
{% endtabs %}

5. Edit the `worker-default-kubelet-config` ConfigMap, and set the following parameters for Garbage Collection (GC) and Eviction Thresholds.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl edit configmap -n kube-system worker-default-kubelet-config
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Bash" %}

```yaml
evictionHard:
  imagefs.available: "89%" # evictionSoft - 5
evictionSoft:
  imagefs.available: "94%" # 100 - ((imagefs * 0.85) / rootdiskfs * 100)
evictionSoftGracePeriod:
  imagefs.available: "5m30s"
imageGCHighThresholdPercent: 4 # (100 - evictionSoft) - X
imageGCLowThresholdPercent: 1 # < imageGCHighThresholdPercent
```

{% endtab %}
{% endtabs %}

6. If Kubelet does not pick up the updated configuration automatically, restart the Kubelet service on the worker(s).

{% tabs %}
{% tab title="Bash" %}

```bash
systemctl restart pf9-kubelet
```

{% endtab %}
{% endtabs %}
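
The threshold values in step 5 follow from the two sizes noted in steps 2 and 3. The sketch below works through that arithmetic with the example figures from this article. It is an illustration under stated assumptions: the function names are hypothetical, the margin `X` defaults to 2 (the value that reproduces the example), sizes are passed in the same unit (GiB here), and `loop_file_bytes` and `fs_size_bytes` rely on GNU `stat` and `df`.

{% tabs %}
{% tab title="Bash" %}

```bash
#!/usr/bin/env bash
# Illustrative only: function names and the default margin X are assumptions.

# Apparent size in bytes of the data loop file (step 3); uses GNU stat.
loop_file_bytes() { stat -c %s "$1"; }

# Size in bytes of the filesystem holding a path (step 2); uses GNU df.
fs_size_bytes() { df -B1 --output=size "$1" | tail -n 1 | tr -d ' '; }

# evictionSoft = 100 - ((imagefs * 0.85) / rootdiskfs * 100), rounded to nearest.
eviction_soft() {
  awk -v i="$1" -v r="$2" 'BEGIN { printf "%d\n", 100 - (i * 0.85 / r * 100) + 0.5 }'
}

# evictionHard = evictionSoft - 5
eviction_hard() { echo $(( $1 - 5 )); }

# imageGCHighThresholdPercent = (100 - evictionSoft) - X; X defaults to 2 here.
gc_high() { echo $(( 100 - $1 - ${2:-2} )); }

# Example: 100G loop file on a 1.4T root disk (about 1434 GiB).
soft=$(eviction_soft 100 1434)
echo "evictionSoft=${soft}% evictionHard=$(eviction_hard "$soft")% imageGCHigh=$(gc_high "$soft")"
```

{% endtab %}
{% endtabs %}

With the example sizes, this prints `evictionSoft=94% evictionHard=89% imageGCHigh=4`, matching the values shown in step 5. `imageGCLowThresholdPercent` is left to the operator; it only needs to be lower than the high threshold.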

### **Troubleshooting**

#### **Scenario: Kubelet Crashed**

If Kubelet has crashed with an unexplained stack trace or error, the likely cause is an error in the configuration. Take the following steps to restore the worker(s).

1. Back up the Kubelet dynamic configuration directory.

{% tabs %}
{% tab title="Bash" %}

```bash
tar -czvf dynamic-config-$(date +%s).tgz /var/opt/pf9/kube/kubelet-config/dynamic-config
```

{% endtab %}
{% endtabs %}

2. Recursively remove the directory.

{% tabs %}
{% tab title="Bash" %}

```bash
rm -rf /var/opt/pf9/kube/kubelet-config/dynamic-config
```

{% endtab %}
{% endtabs %}

3. Restart the Kubelet service.

{% tabs %}
{% tab title="Bash" %}

```bash
systemctl restart pf9-kubelet
```

{% endtab %}
{% endtabs %}
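
Because step 2 is destructive, the backup and removal can be combined so that the directory is deleted only after the archive has been verified. This is a hedged sketch, not part of any Platform9 tooling: `backup_and_remove` is a hypothetical helper, and the destination directory is whatever location you choose.

{% tabs %}
{% tab title="Bash" %}

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive a directory, then remove it only if the
# archive can be listed. Prints the archive path on success.
backup_and_remove() {
  local src="$1" dest_dir="${2:-.}"
  [ -d "$src" ] || return 1                       # refuse empty or missing paths
  local archive
  archive="${dest_dir}/$(basename "$src")-$(date +%s).tgz"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  tar -tzf "$archive" > /dev/null || return 1     # verify the archive is readable
  rm -rf "$src"
  echo "$archive"
}
```

{% endtab %}
{% endtabs %}

For example, `backup_and_remove /var/opt/pf9/kube/kubelet-config/dynamic-config /tmp` would cover steps 1 and 2, after which Kubelet can be restarted as in step 3.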

## Additional Information

* [Dynamic Kubelet Configuration](https://platform9.com/docs/kubernetes/dynamic-kubelet-configuration)
* [Device Mapper Storage Driver – Docker](https://docs.docker.com/storage/storagedriver/device-mapper-driver/)
* [Kubelet does not garbage collect docker images based on devicemapper disk usage](https://github.com/kubernetes/kubernetes/issues/60662#)

