How to Proactively Monitor PLEG Health with Prometheus and Alertmanager Rules

Problem

The Pod Lifecycle Event Generator (PLEG), introduced in Kubernetes 1.2, is the kubelet component that detects changes in container state on the local node. PLEG must remain "Healthy": if it is deemed "Unhealthy", the node transitions to "NotReady" and new pods are no longer scheduled onto it. It is therefore important to proactively monitor the metrics that reflect PLEG health.

Environment

  • Platform9 Managed Kubernetes – v5.4 and higher
  • Prometheus Monitoring
  • Kubelet

Procedure

  1. If Prometheus Monitoring is not already enabled (it is enabled by default, but this may not be the case on older clusters), follow the instructions in Enable In-Cluster Monitoring.
  2. Download the kubeconfig for the cluster.
  3. Export the kubeconfig by setting the KUBECONFIG environment variable.
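A minimal example of exporting the kubeconfig, assuming the downloaded file was saved as ~/Downloads/my-cluster.yaml (a hypothetical path; substitute the actual location of your file):

```bash
# Point kubectl at the kubeconfig downloaded in the previous step.
# Hypothetical path: replace it with wherever your kubeconfig was saved.
export KUBECONFIG="$HOME/Downloads/my-cluster.yaml"

# Optional: confirm kubectl can reach the cluster.
# kubectl get nodes
```

The variable only applies to the current shell session; open a new terminal and it must be exported again.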
  4. List the PrometheusRules in the pf9-monitoring namespace.
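PrometheusRule is a custom resource defined by the Prometheus Operator, so it can be listed like any other Kubernetes object. A minimal example; the system-prometheus-rules object should appear in the output:

```bash
NAMESPACE="pf9-monitoring"

# List the PrometheusRule custom resources in the monitoring namespace.
kubectl get prometheusrules -n "$NAMESPACE"
```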
  5. Edit the system-prometheus-rules object and add the KubeletPlegDurationHigh alerting rule.
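The edit can be made with kubectl edit, sketched below for the namespace and object named above. kubectl opens the object in the editor set by KUBE_EDITOR or EDITOR and applies the change when the file is saved and closed; the final command, which searches the stored object for the new alert name, is an optional check.

```bash
NAMESPACE="pf9-monitoring"
RULES_OBJECT="system-prometheus-rules"

# Open the PrometheusRule in an editor; saving the file applies the change.
kubectl -n "$NAMESPACE" edit prometheusrule "$RULES_OBJECT"

# Optional: confirm the new alert is present in the stored object.
kubectl -n "$NAMESPACE" get prometheusrule "$RULES_OBJECT" -o yaml \
  | grep -n "KubeletPlegDurationHigh"
```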
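As a sketch only, the following saves a KubeletPlegDurationHigh rule group to a scratch file that can be pasted under spec.groups while editing system-prometheus-rules. The group name kubelet-pleg, the file name, the 10-second threshold on the 99th-percentile relist duration, and the 5-minute "for" window are assumptions modeled on the upstream kubernetes-mixin alert of the same name, not confirmed Platform9 values. kubelet_pleg_relist_duration_seconds is the kubelet's histogram of PLEG relist latency. The kubelet only reports PLEG as unhealthy after relisting has stalled for about three minutes, so a threshold of a few seconds is meant to warn well before the node goes NotReady.

```bash
# Write the rule group to a scratch file (hypothetical name) for pasting into the editor.
# The quoted 'EOF' stops the shell from expanding {{ $value }} and {{ $labels.instance }}.
cat > pleg-rule-group.yaml <<'EOF'
# Paste under spec.groups of system-prometheus-rules.
- name: kubelet-pleg
  rules:
  - alert: KubeletPlegDurationHigh
    # 99th-percentile PLEG relist latency per kubelet over the last 5 minutes.
    expr: |
      histogram_quantile(0.99,
        sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le)
      ) >= 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Kubelet Pod Lifecycle Event Generator is taking too long to relist."
      description: "99th percentile PLEG relist duration is {{ $value }} seconds on {{ $labels.instance }}."
EOF
```

When pasting, shift the fragment so that its leading "- name:" lines up with the existing group entries under spec.groups. The instance label used in the aggregation depends on how the kubelet is scraped in your cluster; adjust it if your kubelet series carry a different identifying label.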
  6. Access the Prometheus UI either through the Qbert API proxy (as with Grafana) or by running kubectl port-forward against the prometheus-operated service, which listens on TCP port 9090.
  7. Navigate to the Alerts tab.
  8. Search the list of alerts to verify that the KubeletPlegDurationHigh alert has been added and is shown in green (inactive).
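For the port-forward route, a minimal sketch follows. It assumes the prometheus-operated service lives in the pf9-monitoring namespace and that local port 9090 is free; rule_loaded is a hypothetical helper. Instead of the browser, it queries the Prometheus HTTP API endpoint /api/v1/rules and checks that the KubeletPlegDurationHigh rule was loaded. The Alerts tab at http://localhost:9090/alerts still shows whether the alert is inactive (green), pending, or firing.

```bash
NAMESPACE="pf9-monitoring"   # assumed namespace of the prometheus-operated service
PROM_URL="http://localhost:9090"

# Succeeds if the Prometheus rules API response read from stdin contains the named rule.
rule_loaded() {
  grep -q "\"name\":\"$1\""
}

# Forward local port 9090 to the prometheus-operated service in the background.
kubectl -n "$NAMESPACE" port-forward svc/prometheus-operated 9090:9090 >/dev/null 2>&1 &
PF_PID=$!
sleep 5  # give the tunnel a moment to come up

# Query the rules endpoint and look for the new alert by name.
if curl -s --max-time 10 "$PROM_URL/api/v1/rules" | rule_loaded KubeletPlegDurationHigh; then
  echo "KubeletPlegDurationHigh is loaded; check its state at $PROM_URL/alerts"
else
  echo "KubeletPlegDurationHigh not found; re-check the PrometheusRule edit"
fi

# Stop the background port-forward.
kill "$PF_PID" 2>/dev/null
```

Prometheus reloads rules only after the Prometheus Operator propagates the edited PrometheusRule, so the alert may take a minute or two to appear.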