Expedite Pod Scheduling on a Node That Has Recovered from Disk-Pressure Eviction

Problem

New Pods cannot be scheduled on a node for around 5 minutes after the node has recovered from a disk-pressure eviction.

Environment

  • Platform9 Managed Kubernetes - All Versions

Procedure

  1. By default, the kubelet must wait through a transition period of 5 minutes before it transitions a node condition to a different state.
  2. This transition period can be shortened with the kubelet parameter evictionPressureTransitionPeriod.
  3. The parameter can be set through Dynamic Kubelet Configuration.
  4. To configure it for the worker nodes, edit the ConfigMap object worker-default-kubelet-config in the kube-system namespace and add the parameter with a smaller value.
Configure evictionPressureTransitionPeriod to 1 Min
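As a rough sketch of the setting described in step 4, the kubelet configuration held in the ConfigMap would carry a line like the one below. Only the evictionPressureTransitionPeriod field is taken from this article. The apiVersion and kind lines are the standard KubeletConfiguration header and are shown for orientation. The exact layout of the ConfigMap (for example, the data key that holds this configuration) is not shown here and should be checked in your cluster before editing.

```yaml
# Fragment of a KubeletConfiguration; all other fields are omitted.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Shorten the wait before a node condition may change state (default: 5m).
evictionPressureTransitionPeriod: 1m
```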

It takes some time for the change to reach all worker nodes, because pf9-kubelet is restarted on each node and the node transitions through the states Ready --> NotReady,SchedulingDisabled --> NotReady --> Ready.

Use the following verification steps to confirm whether the change has been applied on a node.

Worker node verification
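One possible way to check the value a kubelet is actually running with is to read its configz endpoint through the API server proxy. The sketch below is not necessarily the procedure originally shown here. It assumes kubectl is configured for the cluster and that the API server can proxy to the kubelet. The function name transition_period is a hypothetical helper introduced for illustration.

```shell
# Print the evictionPressureTransitionPeriod reported by the kubelet on a node.
# Usage: transition_period <node-name>
transition_period() {
  # Fetch the kubelet's live configuration as JSON and keep only the field of interest.
  kubectl get --raw "/api/v1/nodes/$1/proxy/configz" \
    | grep -o '"evictionPressureTransitionPeriod": *"[^"]*"'
}

# Example (replace the placeholder with a real worker node name):
# transition_period <worker-node-name>
```

If the change has been picked up, the reported value should correspond to the one configured in the ConfigMap rather than the 5-minute default.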

Additional Information

  • When a node oscillates above and below a soft eviction threshold without staying there for the defined grace period, the corresponding node condition switches constantly between true and false, which can lead to bad eviction decisions.
  • The eviction-pressure-transition-period flag protects against such unwanted node condition oscillations.
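To make the relationship between these settings concrete, the fragment below sketches how a soft eviction threshold, its grace period, and the transition period might appear together in a kubelet configuration. The threshold and grace period values are illustrative assumptions and do not come from this article. evictionPressureTransitionPeriod is the configuration-file counterpart of the eviction-pressure-transition-period flag.

```yaml
# Illustrative values only; pick thresholds that suit your nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Soft threshold: disk pressure is signalled when free space on nodefs drops below 15%...
evictionSoft:
  nodefs.available: "15%"
# ...and stays below it for the whole grace period.
evictionSoftGracePeriod:
  nodefs.available: "1m30s"
# Once set, the condition is held for this long before it may clear,
# which damps oscillation around the threshold.
evictionPressureTransitionPeriod: 1m
```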