Addon Monitoring
Addon rules
The Prometheus agent is configured to report a number of addon metrics. Below is the YAML rule set that Catapult uses. Each rule defines an alert name, the expression that triggers it, the time the condition must hold before the alert fires (the for field), labels carrying the alert type and severity, and annotations containing the summary and description of the notification.
bash-5.0# cat sunpike.yml
groups:
- name: hosts
  rules:
  - alert: HostNotConverging
    expr: sum without (status_start_attempts) (sunpike_hosts_health{clusterrole!="none", hoststate!="Ok"}) == 1
    for: 20m
    labels:
      type: pf9
      severity: high
    annotations:
      summary: Host {{ $labels.hostname }} not converging
      description: Host {{ $labels.hostname }} is not converging, status {{ $labels.hoststate }}, role {{ $labels.clusterrole }}, retries {{ $labels.status_start_attempts }}
- name: clusteraddons
  rules:
  - alert: AddonNotHealthy
    expr: sunpike_clusteraddons_health{healthy!="true"} == 1
    for: 10m
    labels:
      type: pf9
      severity: high
    annotations:
      summary: Addon {{ $labels.type }} not healthy!!
      description: Addon {{ $labels.type }} of cluster {{ $labels.cluster }} not healthy
  - alert: AddonNotConverging
    expr: sunpike_clusteraddons_health{phase=""} == 1
    for: 20m
    labels:
      type: pf9
      severity: high
    annotations:
      summary: Addon {{ $labels.type }} not converging!!
      description: Addon {{ $labels.type }} of cluster {{ $labels.cluster }} not converging
  - alert: AddonInstallError
    expr: sunpike_clusteraddons_health{phase="InstallAddonError"} == 1
    for: 5m
    labels:
      type: pf9
      severity: high
    annotations:
      summary: Error installing {{ $labels.type }} addon!!
      description: Error installing Addon {{ $labels.type }} on cluster {{ $labels.cluster }} failed to install
  - alert: AddonUninstallError
    expr: sunpike_clusteraddons_health{phase="UnInstallAddonError"} == 1
    for: 5m
    labels:
      type: pf9
      severity: high
    annotations:
      summary: Error uninstalling {{ $labels.type }} addon!!
      description: Error uninstalling Addon {{ $labels.type }} on cluster {{ $labels.cluster }} failed to uninstall
bash-5.0#
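To see how one of these rules behaves, it can help to exercise it with a Prometheus rule unit test. The following is a minimal sketch, not part of the shipped configuration. It assumes the rule file above is available as valid YAML in sunpike.yml next to the test file. The test file name (sunpike_test.yml), the addon name (coredns), and the cluster name (demo) are invented for illustration, and the real metric may carry more labels than the ones shown.

```yaml
# sunpike_test.yml (hypothetical file name)
rule_files:
  - sunpike.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Illustrative series: one addon stuck in the install-error phase.
      # The label values are assumptions, not taken from a real cluster.
      - series: 'sunpike_clusteraddons_health{type="coredns", cluster="demo", phase="InstallAddonError"}'
        values: '1+0x10'   # the value stays at 1 for ten minutes
    alert_rule_test:
      # The rule has "for: 5m", so the alert should be firing at 6m.
      - eval_time: 6m
        alertname: AddonInstallError
        exp_alerts:
          - exp_labels:
              severity: high
              # The rule's own "type: pf9" label replaces the series' type label.
              type: pf9
              cluster: demo
              phase: InstallAddonError
            exp_annotations:
              # Annotations are templated from the original series labels.
              summary: "Error installing coredns addon!!"
              description: "Error installing Addon coredns on cluster demo failed to install"
```

With promtool installed, a file like this would be run as promtool test rules sunpike_test.yml. The sketch also illustrates one property of the rule set: because every rule sets a type label of pf9, the fired alert no longer carries the addon type as a label, so the addon name is only visible in the summary and description annotations.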