Configuring Persistent Storage
Platform9 Monitoring deploys Prometheus, Alertmanager, and Grafana on a cluster in a single click, and by default this deployment uses ephemeral storage. To configure Platform9 Monitoring to use persistent storage, a storage class must be added to the cluster and the monitoring deployment updated with kubectl to consume that storage class.
To enable persistent storage you must have a Storage Class configured that is able to provision persistent volume claims.
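A quick way to confirm that a usable Storage Class exists, and to see which provisioner backs it, is to list the storage classes on the cluster:
kubectl get storageclass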
Add a Storage Class to Prometheus
The first step is to set up a storage class. If your cluster is running without storage, follow the guide to set up the Portworx CSI driver.
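For reference, a minimal StorageClass backed by the Portworx CSI driver looks roughly like the sketch below. The provisioner name and parameters are assumptions that vary by Portworx installation, so follow the Portworx CSI guide for the exact values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-csi-sc
provisioner: pxd.portworx.com    # Portworx CSI driver name (assumed)
parameters:
  repl: "2"                      # Portworx replication factor (example value)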
Once you have a storage class configured, run the kubectl command below to edit the deployment:
kubectl -n pf9-monitoring edit prometheus system
Editing the running configuration uses the Linux command-line text editor Vi. For help with Vi, view this guide.
The default configuration is shown below; it needs to be updated with a valid storage specification.
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2021-01-15T18:09:32Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences: {}
      f:spec:
        .: {}
        f:additionalScrapeConfigs:
          .: {}
          f:key: {}
          f:name: {}
        f:alerting:
          .: {}
          f:alertmanagers: {}
        f:replicas: {}
        f:resources:
          .: {}
          f:requests:
            .: {}
            f:cpu: {}
            f:memory: {}
        f:retention: {}
        f:ruleSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
        f:rules:
          .: {}
          f:alert: {}
        f:scrapeInterval: {}
        f:serviceAccountName: {}
        f:serviceMonitorSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
    manager: promplus
    operation: Update
    time: "2021-01-15T18:09:32Z"
  name: system
  namespace: pf9-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: false
    kind: Deployment
    name: monhelper
    uid: cbc48a82-3c1f-4a2b-9b2a-ebbc32ae2e65
  resourceVersion: "2733"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/pf9-monitoring/prometheuses/system
  uid: c1722922-4973-4973-8e29-ba0269ad9a79
spec:
  additionalScrapeConfigs:
    key: additional-scrape-config.yaml
    name: scrapeconfig
  alerting:
    alertmanagers:
    - name: sys-alertmanager
      namespace: pf9-monitoring
      port: web
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
  retention: 7d
  ruleSelector:
    matchLabels:
      prometheus: system
      role: alert-rules
  rules:
    alert: {}
  scrapeInterval: 2m
  serviceAccountName: system-prometheus
  serviceMonitorSelector:
    matchLabels:
      prometheus: system
      role: service-monitor
The deployment needs to have the following storage section added under spec. The storage class name must be updated to match your cluster, and the amount of storage should also be specified.
The storage class in this example is backed by Portworx; to add Portworx, see the Portworx CSI guide.
storage:
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: portworx-csi-sc
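If you prefer not to edit the object interactively, the same change can be applied with a merge patch. This is an equivalent sketch, assuming the same storage class name and size as above:
kubectl -n pf9-monitoring patch prometheus system --type merge -p \
  '{"spec":{"storage":{"volumeClaimTemplate":{"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"storageClassName":"portworx-csi-sc"}}}}}'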
The final configuration should match the example below.
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2021-02-23T02:06:49Z"
  generation: 2
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences: {}
      f:spec:
        .: {}
        f:additionalScrapeConfigs:
          .: {}
          f:key: {}
          f:name: {}
        f:alerting:
          .: {}
          f:alertmanagers: {}
        f:replicas: {}
        f:resources:
          .: {}
          f:requests:
            .: {}
            f:cpu: {}
            f:memory: {}
        f:retention: {}
        f:ruleSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
        f:rules:
          .: {}
          f:alert: {}
        f:scrapeInterval: {}
        f:serviceAccountName: {}
        f:serviceMonitorSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
    manager: promplus
    operation: Update
    time: "2021-02-23T02:06:49Z"
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:storage:
          .: {}
          f:volumeClaimTemplate:
            .: {}
            f:spec:
              .: {}
              f:accessModes: {}
              f:resources:
                .: {}
                f:requests:
                  .: {}
                  f:storage: {}
              f:storageClassName: {}
    manager: kubectl
    operation: Update
    time: "2021-02-23T03:44:02Z"
  name: system
  namespace: pf9-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: false
    kind: Deployment
    name: monhelper
    uid: 9dd25109-bef1-4510-b261-16dd7a62d4bd
  resourceVersion: "169910"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/pf9-monitoring/prometheuses/system
  uid: ce60b009-cb28-4ba9-9db3-9745d14bf267
spec:
  additionalScrapeConfigs:
    key: additional-scrape-config.yaml
    name: scrapeconfig
  alerting:
    alertmanagers:
    - name: sys-alertmanager
      namespace: pf9-monitoring
      port: web
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
  retention: 7d
  ruleSelector:
    matchLabels:
      prometheus: system
      role: alert-rules
  rules:
    alert: {}
  scrapeInterval: 2m
  serviceAccountName: system-prometheus
  serviceMonitorSelector:
    matchLabels:
      prometheus: system
      role: service-monitor
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: portworx-csi-sc
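After saving, the Prometheus operator recreates the prometheus-system-0 Pod with a persistent volume. To confirm the claim was created and bound, list the PVCs in the monitoring namespace:
kubectl -n pf9-monitoring get pvc
You should see a claim named prometheus-system-db-prometheus-system-0 in a Bound state.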
Troubleshooting
To see if the deployment is healthy, run kubectl -n pf9-monitoring get all
The resulting output should show all services in a Running state. If any pods or services are still in a creating state, rerun the command.
If there is an issue, the prometheus-system-0 pod will fail to start or enter CrashLoopBackOff.
kubectl -n pf9-monitoring get all
NAME READY STATUS RESTARTS AGE
pod/alertmanager-sysalert-0 2/2 Running 0 9s
pod/grafana-695dccdd85-97gwb 0/2 ContainerCreating 0 4s
pod/kube-state-metrics-68dfc664dc-4hgt8 1/1 Running 0 2m12s
pod/node-exporter-857v7 1/1 Running 0 114s
pod/node-exporter-98zch 1/1 Running 0 114s
pod/node-exporter-jkv77 1/1 Running 0 114s
pod/node-exporter-qq9xf 1/1 Running 0 114s
pod/prometheus-system-0 3/3 Running 0 9s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 9s
service/grafana-ui ClusterIP 10.21.1.130 <none> 80/TCP 4s
service/kube-state-metrics ClusterIP None <none> 8443/TCP,8081/TCP 2m21s
service/node-exporter ClusterIP None <none> 9100/TCP 118s
service/prometheus-operated ClusterIP None <none> 9090/TCP 9s
service/sys-alertmanager ClusterIP 10.21.1.92 <none> 9093/TCP 9s
service/sys-prometheus ClusterIP 10.21.2.148 <none> 9090/TCP 9s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 4 4 4 4 4 kubernetes.io/os=linux 114s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/grafana 0/1 1 0 4s
deployment.apps/kube-state-metrics 1/1 1 1 2m12s
NAME DESIRED CURRENT READY AGE
replicaset.apps/grafana-695dccdd85 1 1 0 4s
replicaset.apps/kube-state-metrics-68dfc664dc 1 1 1 2m12s
NAME READY AGE
statefulset.apps/alertmanager-sysalert 1/1 9s
statefulset.apps/prometheus-system 1/1 9s
Get Monitoring Pod Status
Run kubectl -n pf9-monitoring describe pod prometheus-system-0 and review the Events output. The output will show any errors impacting the Pod state. In the example below, Prometheus is failing to start because its PVC cannot be found. To solve this issue, the PVC must be manually recreated with kubectl, as shown in the sketch after the events output.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler persistentvolumeclaim "prometheus-system-db-prometheus-system-0" is being deleted
Warning FailedScheduling <unknown> default-scheduler persistentvolumeclaim "prometheus-system-db-prometheus-system-0" not found
Warning FailedScheduling <unknown> default-scheduler persistentvolumeclaim "prometheus-system-db-prometheus-system-0" not found
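A minimal PVC manifest to recreate the missing claim is sketched below; the size, access mode, and storage class are assumptions taken from the storage section added earlier and must match your configuration. Save it to a file and apply it with kubectl apply -f <file>.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-system-db-prometheus-system-0   # claim name expected by the StatefulSet
  namespace: pf9-monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                     # match the size in the Prometheus spec
  storageClassName: portworx-csi-sc    # match your cluster's storage class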
View Prometheus Container Logs
If the Pod events do not indicate that the issue is within Kubernetes itself, it can be useful to look at the Prometheus container logs. To do this from the Platform9 SaaS Management Plane, navigate to the Workloads dashboard and select the Pods tab. Filter the table to your cluster and set the namespace to pf9-monitoring. Once the table updates, click the View Logs link for the prometheus-system-0 container. This opens the container logs in a new browser tab.
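The same logs can also be pulled from the command line. The container name prometheus is the usual name inside the prometheus-system-0 Pod, but verify it with kubectl describe if your deployment differs:
kubectl -n pf9-monitoring logs prometheus-system-0 -c prometheus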
Below is an example of a permissions error preventing the Pod from starting on each node.
level=info ts=2021-02-23T06:26:49.738Z caller=main.go:331 msg="Starting Prometheus" version="(version=2.16.0, branch=HEAD, revision=b90be6f32a33c03163d700e1452b54454ddce0ec)"
level=info ts=2021-02-23T06:26:49.738Z caller=main.go:332 build_context="(go=go1.13.8, user=root@7ea0ae865f12, date=20200213-23:50:02)"
level=info ts=2021-02-23T06:26:49.738Z caller=main.go:333 host_details="(Linux 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 prometheus-system-0 (none))"
level=info ts=2021-02-23T06:26:49.738Z caller=main.go:334 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-02-23T06:26:49.738Z caller=main.go:335 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-02-23T06:26:49.739Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffca7417a5a, 0xb, 0x14, 0x2c90040, 0xc0006a0510, 0x2c90040)
/app/promql/query_logger.go:117 +0x4cd
main.main()
/app/cmd/prometheus/main.go:362 +0x5243
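This error means the Prometheus process cannot write to the mounted volume. One common remedy, sketched below, is to set a Pod security context on the Prometheus resource (kubectl -n pf9-monitoring edit prometheus system) so the volume is group-writable by the Prometheus user; the exact user and fsGroup values are assumptions and depend on your environment and storage backend.
spec:
  securityContext:
    runAsUser: 1000      # non-root UID for the Prometheus container (example value)
    runAsNonRoot: true
    fsGroup: 2000        # group ownership applied to the mounted volume (example value)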
Incorrect Storage Class Name
If you specify an incorrect storage class name, you will need to first update the Prometheus configuration and then delete the persistent volume claim by running: kubectl delete pvc <pvc-name> -n pf9-monitoring
Once the PVC is deleted, the Pod will start up and claim a new PVC.
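For example, assuming the default claim name, the sequence looks like this:
kubectl -n pf9-monitoring get pvc
kubectl -n pf9-monitoring delete pvc prometheus-system-db-prometheus-system-0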