Metrics-server Pods Are Continuously Restarting With Probe Failures
Problem
Metrics-server pods are restarting with the following errors:
shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
timeout waiting for SETTINGS frames from 10.69.69.198:22639
writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
writers.go:130] apiserver was unable to write a fallback JSON response: http: Handler timeout
writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
wrap.go:54] timeout or abort while handling: GET "/apis/metrics.k8s.io/v1beta1"
status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.21.157.51:443/apis/metrics.k8s.io/v1beta1: Get "https://10.21.157.51:443/apis/metrics.k8s.io/v1beta1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
handler_proxy.go:102] no RequestInfo found in the context
controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
Environment
- Platform9 Edge Cloud - v5.3 and above
- Metrics-server - v0.5.0
Cause
The api-server logs show repeated "context deadline exceeded" errors, which indicate that the metrics-server pods do not have enough CPU.
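A quick way to confirm this before changing any resources is to check the restart counts and recent events of the metrics-server pods (standard kubectl commands shown as an example; substitute the pod name reported by the first command):
# kubectl get pods -n kube-system | grep metrics-server
# kubectl describe pod <metrics-server-pod-name> -n kube-system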
Resolution
Use the following steps to increase the requests and limits for the metrics-server container:
- Log in to the DU VM and check the watch status:
# /opt/pf9/qbert/bin/kubectl get clusteraddons <CLUSTERUUID>-metrics-server --kubeconfig='/etc/sunpike/kubeconfig' -o yaml | grep watch
        f:watch: {}
  watch: true
- On the Edit action, set watch: false so that the spec looks like the following (an example edit command is shown after this step):
spec:
  clusterID: 8dcfa1b7-366e-4a6a-aa07-6dcbb773bde9
  override:
    params:
    - name: metricsMemoryLimit
      value: 300Mi
    - name: metricsCpuLimit
      value: 100m
  type: metrics-server
  version: 0.5.0
  watch: false
Note: When watch is disabled, the watch field will not appear under spec; it is only shown when watch is set to true.
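If you are applying this change from the DU VM rather than through the UI, a kubectl edit against the same sunpike kubeconfig used above should open the object for editing (example command; replace <CLUSTERUUID> as before):
# /opt/pf9/qbert/bin/kubectl edit clusteraddons <CLUSTERUUID>-metrics-server --kubeconfig='/etc/sunpike/kubeconfig'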
- Scale down the metrics-server deployment to 0 replicas:
# kubectl scale deployment --replicas=0 metrics-server-v0.5.0 -n kube-system
- To increase the CPU to 200m, tweak the extra-cpu flag in the pod_nanny command (see the example after this step for one way to locate it):
Command:
  /pod_nanny
  --cpu=40m
  --extra-cpu=10m   <<------
  --minClusterSize=16
In the above example, 100m of CPU had been set for the metrics-server container; increasing extra-cpu to 10m raises the CPU to 200m. The calculation formula is cpu + (extra-cpu * minClusterSize), i.e. 40m + (10m * 16) = 200m.
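The pod_nanny flags are defined in the metrics-server deployment spec, so one way to change them is to edit the deployment directly and adjust --extra-cpu on the container whose command is /pod_nanny (deployment name taken from the scale commands above; this is a sketch, not the only way to apply the change):
# kubectl -n kube-system edit deployment metrics-server-v0.5.0
Disabling watch earlier appears to be what keeps a manual edit like this from being reconciled back to its original values.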
- Scale the metrics-server pod replicas back to 1:
# kubectl scale deployment --replicas=1 metrics-server-v0.5.0 -n kube-system
- Verify the metrics-server pod CPU resources (see the example command after the output below):
-----------------------
Limits:
  cpu:     200m
  memory:  104Mi
Requests:
  cpu:     200m
  memory:  104Mi
-----------------------
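One way to view these values is to describe the running metrics-server pod and check its Limits and Requests sections (substitute the actual pod name from kubectl get pods):
# kubectl describe pod <metrics-server-pod-name> -n kube-system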