How to Proactively Monitor PLEG Health with Prometheus and Alertmanager Rules

Problem

The Pod Lifecycle Event Generator (PLEG), introduced in Kubernetes 1.2, is the kubelet component that detects changes in container state locally on a node. PLEG needs to remain in a "Healthy" state: if it is deemed "Unhealthy", the node transitions into a "NotReady" state and scheduling onto that node ceases. It is therefore important to proactively monitor the metrics related to PLEG health.

Environment

  • Platform9 Managed Kubernetes – v5.4 and higher

  • Prometheus Monitoring

  • Kubelet

Procedure

  1. If Prometheus Monitoring is not already enabled (it is enabled by default, although this may not apply to older clusters), follow the instructions in Enable In-Cluster Monitoring.

  2. Export the kubeconfig.

export KUBECONFIG=~/<cluster-name>.yaml

  3. List the PrometheusRules in the pf9-monitoring namespace.

$ kubectl -n pf9-monitoring get prometheusrules.monitoring.coreos.com
NAME                      AGE
system-prometheus-rules   22m

  4. Edit the system-prometheus-rules object and add the PLEG alerting rules, including the KubeletPlegDurationHigh alert that is verified in the final step.
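
A minimal sketch of what such a rule group could look like, modeled on the kubernetes-mixin alert of the same name. The group name, the 10-second threshold, the 5m windows, the severity label, and the annotation text are assumptions and may differ from the values intended for a given cluster; kubelet_pleg_relist_duration_seconds is the kubelet's histogram of PLEG relist durations. The snippet only writes the group to a local file (pleg-rules.yaml, a hypothetical name) so that it can be reviewed before being pasted under spec.groups.

```shell
# Write a candidate rule group to a local file for review.
# The quoted 'EOF' keeps the shell from expanding {{ $value }} and {{ $labels.instance }}.
cat > pleg-rules.yaml <<'EOF'
- name: kubelet-pleg.rules            # assumed group name
  rules:
  - alert: KubeletPlegDurationHigh
    # 99th percentile PLEG relist duration per kubelet instance over 5 minutes.
    expr: |
      histogram_quantile(0.99, sum by (instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))) >= 10
    for: 5m                           # assumed hold time before firing
    labels:
      severity: warning               # assumed severity
    annotations:
      summary: Kubelet PLEG relist is taking too long.
      description: 'PLEG relist p99 duration is {{ $value }} seconds on {{ $labels.instance }}.'
EOF

# Afterwards, open the object and paste the group under spec.groups,
# indenting it to match the existing list entries:
#   kubectl -n pf9-monitoring edit prometheusrules.monitoring.coreos.com system-prometheus-rules
```

Tune the threshold to the cluster before relying on the alert; a lower value warns earlier but is noisier on busy nodes.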

  5. Access the Prometheus UI through the Qbert API proxy (as with Grafana) or through kubectl port-forward against the prometheus-operated service, which is exposed on TCP/9090.

  6. Navigate to the Alerts tab.

  7. Search the list of alerts and verify that the KubeletPlegDurationHigh alert has been added and is showing green (inactive).
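
The same check can be sketched from the command line using the port-forward route and the Prometheus HTTP API endpoint /api/v1/rules. This assumes the prometheus-operated service lives in the pf9-monitoring namespace, that local port 9090 is free, and that curl is installed; the function names has_pleg_alert and check_pleg_alert are hypothetical helpers, not part of any tool.

```shell
# Succeeds when the Prometheus rules JSON on stdin mentions the alert.
has_pleg_alert() {
  grep -q 'KubeletPlegDurationHigh'
}

# Port-forward to Prometheus, query the loaded rules, and report the result.
check_pleg_alert() {
  # Assumption: the service is in the pf9-monitoring namespace.
  kubectl -n pf9-monitoring port-forward svc/prometheus-operated 9090:9090 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 3   # give the tunnel a moment to come up

  if curl -s http://localhost:9090/api/v1/rules | has_pleg_alert; then
    echo "KubeletPlegDurationHigh is loaded"
    status=0
  else
    echo "KubeletPlegDurationHigh not found"
    status=1
  fi

  kill "$pf_pid" 2>/dev/null   # stop the background port-forward
  return "$status"
}
```

After defining the functions in a shell session, run check_pleg_alert. It only confirms that the rule was loaded; whether the alert is inactive, pending, or firing is still easiest to read from the Alerts tab.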
