# How to Proactively Monitor PLEG Health with Prometheus and Alertmanager Rules

## Problem

The [**Pod Lifecycle Event Generator (PLEG)**](https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/proposals/pod-lifecycle-event-generator.md) – introduced in Kubernetes 1.2 – is the Kubelet component that detects container state changes locally. PLEG must remain in a "Healthy" state: if the Kubelet deems it "Unhealthy", the node transitions to "NotReady" and the scheduler stops placing new Pods on it. It is therefore important to proactively monitor the metrics related to PLEG health.

## Environment

* Platform9 Managed Kubernetes – v5.4 and Higher
* Prometheus Monitoring
* Kubelet

## Procedure

1. If **Prometheus Monitoring** is not already enabled (it is enabled by default, but this may not apply to older clusters), follow the instructions in [**Enable In-Cluster Monitoring**](https://platform9.com/docs/kubernetes/enabling-in-cluster-monitoring#enable-monitoring-post-cluster-creation).
2. [Download Kubeconfig](https://platform9.com/docs/kubernetes/kubeconfig-and-clients-download-kubeconfig-from-ui).
3. Export Kubeconfig.

{% tabs %}
{% tab title="Bash" %}

```bash
export KUBECONFIG=~/<cluster-name>.yaml
```

{% endtab %}
{% endtabs %}

4. List the `PrometheusRules` in the `pf9-monitoring` namespace.

{% tabs %}
{% tab title="Bash" %}

```bash
$ kubectl -n pf9-monitoring get prometheusrules.monitoring.coreos.com
NAME                      AGE
system-prometheus-rules   22m
```

{% endtab %}
{% endtabs %}

5. Edit the `system-prometheus-rules` object and add the following rules.

{% tabs %}
{% tab title="Bash" %}

```bash
$ kubectl edit -n pf9-monitoring prometheusrules.monitoring.coreos.com system-prometheus-rules
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="YAML" %}

```yaml
spec:
  groups:
  - name: kube-events
    rules:
    - alert: KubeletPlegDurationHigh
      annotations:
        message: 'The Kubelet Pod Lifecycle Event Generator has a 99th percentile
          duration of {{ $value }} seconds on node {{ $labels.node }}.'
      expr: |
        node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile="0.99"} >= 10
      for: 5m
      labels:
        severity: warning

[...]

  - name: kubelet.rules
    rules:
    - expr: |
        histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (cluster, instance, le) * on(cluster, instance) group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"})
      labels:
        quantile: "0.99"
      record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
```

{% endtab %}
{% endtabs %}
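
Before the alert has had time to evaluate, the data feeding it can be spot-checked by pasting the recording rule's underlying expression into the **Graph** tab of the Prometheus UI. A minimal sketch of the query (a simplified form of the rule above, without the `kubelet_node_name` join):

{% tabs %}
{% tab title="PromQL" %}

```
histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le))
```

{% endtab %}
{% endtabs %}

A result at or above 10 seconds sustained for 5 minutes will fire `KubeletPlegDurationHigh`, matching the `expr` and `for` fields in the alerting rule.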

6. Access the Prometheus UI either via the Qbert API proxy (as with Grafana) or via `kubectl port-forward` against the `prometheus-operated` service, which is exposed on TCP/9090.
7. Navigate to the Alerts tab.
8. Verify that the `KubeletPlegDurationHigh` alert appears in the list of alerts and is showing green (inactive).

<figure><img src="https://978681485-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzyQCMA9ICq40g4F8hWJi%2Fuploads%2Fgit-blob-e54a2c441e4085956ca84865e4acbcc442aa7ce0%2Ful2i74h7zgtss3hl2d4h75icuseqv3cu262h3hq8zfsp0xeba4ilptdt2bophv7q.png?alt=media" alt=""><figcaption></figcaption></figure>
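
For the `kubectl port-forward` route mentioned in step 6, a minimal sketch (the service name and namespace are as referenced above; the local port choice is arbitrary):

{% tabs %}
{% tab title="Bash" %}

```bash
# Forward local port 9090 to the prometheus-operated service, then
# browse to http://localhost:9090/alerts to view the Alerts tab.
kubectl -n pf9-monitoring port-forward svc/prometheus-operated 9090:9090
```

{% endtab %}
{% endtabs %}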
