Kubernetes API Monitoring
Kubernetes API Monitoring Rules
The Prometheus agent is configured to report numerous K8s metrics. Below is the YAML-based rule set that Catapult uses, including alert names, expression, timeframe, labels, and annotations which contain the description and summaries.
x
bash-5.0# cat k8s.ymlgroupsnamekube-apiserver rules#----------------------------------- API server -------------------------------------alertKubeAPIServerDown exprupjob="kube-apiserver" offset 5m == 0 for1m labels severitycritical typek8s annotations descriptionKubernetes API server down on cluster $labels.cluster summary"Cluster {{ $labels.cluster }}: Kube api server down on host {{ $labels.host }}"alertKubernetesApiServerErrors exprsum(rate(apiserver_request_totalcode=~"(4|5).."1m offset 5m )) by (cluster, host) / sum(rate(apiserver_request_total1m offset 5m )) by (cluster, host) * 100 > 3 for2m labels severitycritical typek8s annotations summaryKubernetes API server errors on cluster $labels.cluster description"Cluster {{ $labels.cluster }}: Kubernetes API server is experiencing high error rate on host {{ $labels.host }}"alertKubernetesApiClientErrors expr(sum(rate(rest_client_requests_totalcode=~"(4|5).."1m offset 5m )) by (cluster, host) / sum(rate(rest_client_requests_total1m offset 5m )) by (cluster, host)) * 100 > 1 for2m labels severitycritical typek8s annotations summaryKubernetes API client errors on cluster $labels.cluster description"Cluster {{ $labels.cluster }}: Kubernetes API client is experiencing high error rate on host {{ $labels.host }}"#----------------------------------- Scheduler -------------------------------------alertKubeSchedulerDown exprupjob="kube-scheduler" offset 5m == 0 for1m labels severitycritical typek8s annotations summaryKubernetes Scheduler down on cluster $labels.cluster description"Cluster {{ $labels.cluster }}: Kube scheduler down on host {{ $labels.host }}"#----------------------------------- Controller -------------------------------------alertKubeControllerManagerDown exprupjob="kube-controller" offset 5m == 0 for1m labels severitycritical typek8s annotations summaryKubernetes Controller down on cluster $labels.cluster description"Cluster {{ $labels.cluster }}: Kube controller down on host {{ $labels.host }}"#----------------------------------- KubeProxy -------------------------------------alertKubeProxyDown exprupjob="kube-proxy" offset 5m == 0 for1m labels severitycritical typek8s annotations summaryKubeProxy down on cluster $labels.cluster description"Cluster {{ $labels.cluster }}: kube proxy down on host {{ $labels.host }}"alertKubeProxyRuleSyncLatency exprhistogram_quantile(0.99, sum by(le, cluster, host) (rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket5m offset 5m ))) > 60 for2m labels severitywarning typek8s annotations summaryCluster $labels.cluster is taking too long, on average, to apply kubernetes service rules to iptables. description"Cluster {{ $labels.cluster }}: network rules synchronization slowing down, VALUE = {{ $value }} on host {{ $labels.host }}"bash-5.0#Was this page helpful?