Expedite Pod Scheduling on a Node That Has Recovered From Disk-Pressure Eviction

Problem

New Pods cannot be scheduled for about 5 minutes on a node that has recently recovered from a disk-pressure eviction.

Environment

Platform9 Managed Kubernetes - All Versions

Procedure

By default, the kubelet waits 5 minutes before transitioning a node out of a pressure condition such as DiskPressure. New Pods are not scheduled on the node while the condition is still reported.
This transition period can be reduced with the kubelet parameter evictionPressureTransitionPeriod.
The parameter is set through Dynamic Kubelet Configuration.
To configure it for the worker nodes, edit the ConfigMap worker-default-kubelet-config in the kube-system namespace and add the parameter with a smaller value.

Configure evictionPressureTransitionPeriod to 1 Min
    
    # kubectl get cm -n kube-system | grep -i worker
    NAME                                 DATA   AGE
    worker-default-kubelet-config        1      4d20h

    # kubectl edit cm worker-default-kubelet-config -n kube-system
    configmap/worker-default-kubelet-config edited

    # kubectl get cm worker-default-kubelet-config -n kube-system -o yaml | grep -i evictionPressureTransitionPeriod
          evictionPressureTransitionPeriod: 1m

It takes some time for the change to reach all worker nodes. During the rollout, pf9-kubelet is restarted on each node, and each node transitions through the states Ready --> NotReady,SchedulingDisabled --> NotReady --> Ready.
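
The rollout can be followed from any machine with kubectl access. The snippet below is a minimal sketch rather than a Platform9-provided tool: the function names nodes_not_ready and wait_for_nodes_ready and the 10-second polling interval are assumptions made for illustration. It relies only on the STATUS column printed by kubectl get nodes.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the names of nodes whose STATUS column is
# anything other than plain "Ready" (for example NotReady or
# NotReady,SchedulingDisabled).
nodes_not_ready() {
  local out
  out="$(kubectl get nodes --no-headers)" || return 1
  printf '%s\n' "$out" | awk '$2 != "Ready" { print $1 }'
}

# Hypothetical helper: polls every 10 seconds (an arbitrary interval)
# until every node reports Ready again.
wait_for_nodes_ready() {
  local pending
  while true; do
    pending="$(nodes_not_ready)" || { echo "kubectl failed" >&2; return 1; }
    [ -z "$pending" ] && break
    echo "Still waiting for: $(printf '%s' "$pending" | tr '\n' ' ')"
    sleep 10
  done
  echo "All nodes are Ready"
}
```

Running wait_for_nodes_ready after editing the ConfigMap blocks until no node is listed as NotReady or SchedulingDisabled. It does not confirm that the new value is active, so the verification on the node itself is still needed.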

Use the following step to verify that the change has been applied on a node.

Worker node verification
    
    [root@worker1 ~]# less /var/opt/pf9/kube/kubelet-config/dynamic-config/store/checkpoints/a8427a97-1a5f-4feb-8cb4-ad04904529a5/825444/kubelet | grep -i evictionPressureTransitionPeriod
    evictionPressureTransitionPeriod: 1m

Additional Information

A node can oscillate above and below a soft eviction threshold without staying above it for the defined grace period. Without protection, the reported node condition would switch constantly between true and false, which leads to bad eviction decisions.
The eviction-pressure-transition-period flag, the command-line equivalent of evictionPressureTransitionPeriod, protects against such unwanted node condition oscillations.
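
The effect of the transition period can be illustrated with a small simulation. This is a simplified model, not kubelet code: it assumes the condition stays true until the threshold has gone unobserved for the full transition period. The function name simulate_pressure and the input format (one "<time in seconds> <0|1>" sample per line, where 1 means the eviction threshold is met) are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: for each sample read from stdin, prints the time and
# the pressure condition the node would report under the simplified model.
# $1 is the transition period in seconds.
simulate_pressure() {
  local period="$1" last_seen=-1 t met
  while read -r t met; do
    # Remember the most recent time the threshold was observed as met.
    if [ "$met" -eq 1 ]; then
      last_seen="$t"
    fi
    # The condition holds while the last observation is newer than the period.
    if [ "$last_seen" -ge 0 ] && [ $((t - last_seen)) -lt "$period" ]; then
      echo "$t true"
    else
      echo "$t false"
    fi
  done
}
```

With samples at 0, 60, 120, 180 and 420 seconds, where the threshold is met only at 0 and 120, a 300-second period keeps the condition true until the 420-second sample. A 60-second period lets it flip to false at 60 and 180 seconds and back to true at 120. A shorter period therefore frees the node for scheduling sooner but gives less protection against oscillation.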
