Metrics-server Pods Are Continuously Restarting With Probe Failures

Problem

Metrics-server pods are restarting with following errors:

metrics-server pod logs
Copy
api-server logs
Copy

Environment

  • Platform9 Edge Cloud - v5.3 and above
  • Metrics-server - v0.5.0

Cause

api-server logs shows the large context deadline exceeded which indicated the CPU resource isn't enough for the pods.

Resolution

Use the following steps to increase the requests and limits for the metrics-server container

  • Login to the DU VM and check the watch status:
Command
Copy
  • On Edit action, set watch: false
Command
Copy

When the watch is disabled you won't see the field watch under spec because it will only show if the watch is set to True.

  • Scaled down the metrics-server deployment to 0
Command
Copy
  • To increase CPU to 200M we need to tweak extra-cpu
Example
Copy

In the above example, we wanted to set 100m CPU for metrics server container we increased the extra-cpu to 10 so that CPU will become 200M. The calculation formula is [cpu+(extra-cpu*minClusterSize)]

  • Scale the metric-server pod replicas back to 1
Command
Copy
  • Verify the metrics-server pod CPU resource:
Example
Copy
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard