Pods Stuck in the Terminating State Due to a Volume Unmount Error

Problem

  • After deletion, pods get stuck in the Terminating state and the kubelet logs the following error:

E0413 09:01:03.159172    8845 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/projected/$pod_id-kube-api-access-kxdxg podName:$pod_id nodeName:}" failed. No retries permitted until 2022-04-13 09:01:03.659142773 -0500 CDT m=+783674.310558707 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"kube-api-access-kxdxg\" (UniqueName: \"kubernetes.io/projected/$pod_id-kube-api-access-kxdxg\") pod \"$pod_id\" (UID: \"$pod_id\") : unlinkat /var/lib/kubelet/pods/$pod_id/volumes/kubernetes.io~projected/kube-api-access-kxdxg: device or resource busy"
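The fix in this article is applied per node, so the first step is to find which nodes host the stuck pods. Below is a minimal sketch under stated assumptions. It is not part of Platform9 tooling or kubectl, and the function name list_terminating_pods is hypothetical. It expects kubectl custom-columns output on stdin whose fourth column is .metadata.deletionTimestamp. kubectl prints that column as <none> for pods that are not being deleted.

```shell
#!/bin/sh
# Hypothetical helper: print namespace/name and node for every pod that has a
# deletion timestamp set (i.e. shows as Terminating).
# Assumed input columns: NAMESPACE NAME NODE DELETED (kubectl custom-columns).
list_terminating_pods() {
  # Skip the header row; pods that are not being deleted show "<none>" in column 4.
  awk 'NR > 1 && $4 != "<none>" { print $1 "/" $2 " on node " $3 }'
}
```

Usage, after sourcing the function on a host with kubectl access to the cluster:

# kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName,DELETED:.metadata.deletionTimestamp | list_terminating_pods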

Environment

  • Platform9 Managed Kubernetes - All Versions

  • Operating System: RHEL or CentOS 7.4 and later

Answer

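  • This is a known issue on RHEL and CentOS 7 systems. When a pod volume is still mounted in another mount namespace (for example, one held by a running container), the kernel refuses to remove the mount point, so the kubelet's attempt to remove the volume directory fails with "device or resource busy". Starting with the RHEL 7.4 kernel, a new sysctl parameter is available to change this behaviour.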

  • The parameter is fs.may_detach_mounts, and its value is 0 by default.

  • Enable it by running the following commands as root on each Kubernetes node that hosts a pod stuck in the Terminating state:

# echo 1 > /proc/sys/fs/may_detach_mounts
# cat /proc/sys/fs/may_detach_mounts
1

  • After may_detach_mounts is enabled on the node, the kubelet automatically retries unmounting the projected volume.

  • This can take a few minutes. Once the unmount succeeds, the Terminating pod should be removed from the node.
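Writing to /proc/sys changes the value only until the next reboot. Below is a minimal sketch, not Platform9 tooling, of a helper that applies the setting immediately and also persists it through a drop-in file under /etc/sysctl.d, which systemd applies at boot. The function name enable_may_detach_mounts and the file name 99-may-detach-mounts.conf are hypothetical. The optional arguments exist only so that the paths can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical helper: enable fs.may_detach_mounts now and persist it for reboots.
# $1 = runtime file (default /proc/sys/fs/may_detach_mounts)
# $2 = persistent drop-in (default is a hypothetical /etc/sysctl.d file name)
enable_may_detach_mounts() {
  proc_file="${1:-/proc/sys/fs/may_detach_mounts}"
  conf_file="${2:-/etc/sysctl.d/99-may-detach-mounts.conf}"

  # The parameter only exists on RHEL/CentOS 7.4 and later kernels.
  if [ ! -e "$proc_file" ]; then
    echo "fs.may_detach_mounts is not available on this kernel" >&2
    return 1
  fi

  # Apply at runtime (equivalent to: sysctl -w fs.may_detach_mounts=1).
  echo 1 > "$proc_file" || return 1

  # Persist across reboots; files in /etc/sysctl.d are read at boot.
  echo "fs.may_detach_mounts = 1" > "$conf_file" || return 1

  # Print the current value, as in the manual check.
  cat "$proc_file"
}
```

Run it as root on each affected node by sourcing the file and calling enable_may_detach_mounts with no arguments. Like the manual check, it should print 1.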
