Nodes Fluctuating Between Ready and NotReady State Due to Kernel Memory Leak Issue

Problem

  • Red Hat / CentOS nodes in the cluster fluctuate between the Ready and NotReady states.

  • The SSH terminal freezes when running any Linux command. The following error message appears in the system logs when a node hits this bug:

kernel: XFS: 6(238873) possible memory allocation deadlock size 144 in kmem_alloc (mode:0x8250)
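To check whether a node's logs contain this error, you can scan them for the message above. The following is a minimal sketch, not a Platform9 or Red Hat tool. The field names (process name, PID, allocation size, function, GFP mode) are an assumption based on the layout of the message; the function name find_xfs_deadlocks is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches the XFS warning quoted above. Field names are assumed from the
# message layout: process name, PID, allocation size, function, GFP mode.
XFS_DEADLOCK = re.compile(
    r"XFS: (?P<comm>.+?)\((?P<pid>\d+)\) possible memory allocation deadlock "
    r"size (?P<size>\d+) in (?P<func>\w+) \(mode:(?P<mode>0x[0-9a-fA-F]+)\)"
)


def find_xfs_deadlocks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed fields of every log line containing the XFS warning."""
    for line in lines:
        m = XFS_DEADLOCK.search(line)
        if m:
            hit = m.groupdict()
            hit["pid"] = int(hit["pid"])
            hit["size"] = int(hit["size"])
            yield hit
```

The input lines could come from /var/log/messages or from the output of journalctl -k. Any match suggests the node is hitting this issue.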

Environment

  • Platform9 Managed Kubernetes - All Versions

  • Red Hat / CentOS 7 with kernel < 3.10.0-1075.el7
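You can check whether a node's running kernel falls in the affected range with a small script. The following is a minimal sketch that uses the 3.10.0-1075.el7 threshold from the list above. It assumes the usual RHEL/CentOS 7 release format, such as 3.10.0-957.21.3.el7.x86_64. Its numeric tuple comparison is a simplification and is not a full implementation of RPM's version-comparison rules. The function names are hypothetical.

```python
import re

# Threshold from the Environment section: kernels older than this are affected.
FIXED_BASE = (3, 10, 0)
FIXED_BUILD = (1075,)


def parse_el7_release(release: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Split '3.10.0-957.21.3.el7.x86_64' into ((3, 10, 0), (957, 21, 3))."""
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el7", release)
    if m is None:
        raise ValueError(f"not a RHEL/CentOS 7 kernel release: {release!r}")
    base = tuple(int(x) for x in m.group(1).split("."))
    build = tuple(int(x) for x in m.group(2).split("."))
    return base, build


def is_affected_kernel(release: str) -> bool:
    """True if the release is older than 3.10.0-1075.el7 (simplified comparison)."""
    return parse_el7_release(release) < (FIXED_BASE, FIXED_BUILD)
```

On a node, pass it the output of uname -r or the value returned by Python's platform.release().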

Answer

  • This is caused by a known kernel memory leak in Red Hat / CentOS 7.6. To resolve it, add the kernel boot parameter cgroup.memory=nokmem to the GRUB configuration and reboot the node. This parameter disables cgroup kernel memory accounting.

  • Ask your system administrator to add the above parameter to GRUB. If the issue persists, contact the relevant community or your OS vendor's support.
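The GRUB change can be sketched as two pure text operations. The first adds the parameter to the GRUB_CMDLINE_LINUX line of /etc/default/grub. The second checks a kernel command line, such as the contents of /proc/cmdline after the reboot. This is a hedged illustration with hypothetical function names. It assumes the common double-quoted GRUB_CMDLINE_LINUX="..." form and collapses repeated spaces between existing arguments.

```python
import re

PARAM = "cgroup.memory=nokmem"

# Matches GRUB_CMDLINE_LINUX="..." but not GRUB_CMDLINE_LINUX_DEFAULT.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_defaults: str, param: str = PARAM) -> str:
    """Return /etc/default/grub text with `param` appended to GRUB_CMDLINE_LINUX.

    The text is returned unchanged if the parameter is already present.
    """
    m = CMDLINE_RE.search(grub_defaults)
    if m is None:
        raise ValueError('GRUB_CMDLINE_LINUX="..." line not found')
    args = m.group(2).split()
    if param in args:
        return grub_defaults
    new_args = " ".join(args + [param])
    return grub_defaults[: m.start(2)] + new_args + grub_defaults[m.end(2):]


def boot_has_param(cmdline: str, param: str = PARAM) -> bool:
    """Check whether a kernel command line (e.g. /proc/cmdline) contains `param`."""
    return param in cmdline.split()
```

Editing /etc/default/grub alone does not change the boot entries. The GRUB configuration still has to be regenerated, for example with grub2-mkconfig (the output path differs between BIOS and UEFI systems), and the node rebooted before boot_has_param reports the parameter as active.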
