Nodes Fluctuating Between Ready and NotReady State Due to Kernel Memory Leak Issue

Problem

  • The Red Hat / CentOS nodes in the cluster fluctuate between the Ready and NotReady states.
  • The SSH session freezes when running any Linux command. When a node hits this bug, a corresponding error message is logged in /var/log/messages.

Environment

  • Platform9 Managed Kubernetes - All Versions
  • Red Hat / CentOS 7 with kernel < 3.10.0-1075.el7
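To check whether a node falls into the affected kernel range, the running kernel release can be compared against 3.10.0-1075.el7. The following is a minimal sketch, not an official check: the function name kernel_is_affected is hypothetical, it assumes GNU sort with the -V option, and version sort only approximates RPM's own version comparison, so treat borderline results with caution.

```shell
# Hypothetical helper: succeeds (exit 0) if the given kernel release sorts
# strictly before 3.10.0-1075.el7, using GNU version sort as an approximation.
kernel_is_affected() {
  kia_release="${1%.x86_64}"          # drop the architecture suffix if present
  kia_threshold="3.10.0-1075.el7"
  # An identical release is not below the threshold.
  [ "$kia_release" = "$kia_threshold" ] && return 1
  # The lower of the two versions ends up on the first line.
  kia_oldest="$(printf '%s\n%s\n' "$kia_release" "$kia_threshold" | sort -V | head -n 1)"
  [ "$kia_oldest" = "$kia_release" ]
}

# Example: evaluate the kernel that is currently running.
if kernel_is_affected "$(uname -r)"; then
  echo "kernel is older than 3.10.0-1075.el7"
else
  echo "kernel is 3.10.0-1075.el7 or newer"
fi
```

Only the comparison result matters here; whether the node actually shows the symptoms still has to be confirmed from the system logs.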

Answer

  • This is caused by a known kernel memory leak in Red Hat / CentOS 7.6. It can be resolved by adding the kernel parameter cgroup.memory=nokmem to the GRUB configuration.
  • Ask your system administrator to add this parameter to the GRUB configuration. If the issue persists, contact the OS vendor's support or the relevant community.
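The GRUB change described above can be sketched as follows. This is one possible approach, assuming the grubby utility that ships with Red Hat / CentOS 7 is available and that the commands run as root; the function names has_nokmem and apply_nokmem are hypothetical helpers introduced for illustration. A kernel parameter only takes effect after the node is rebooted.

```shell
# Hypothetical helper: succeeds if the given kernel command line
# already contains cgroup.memory=nokmem as a separate parameter.
has_nokmem() {
  case " $1 " in
    *" cgroup.memory=nokmem "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: appends the parameter to every installed kernel
# entry. Assumes grubby is installed and the caller is root.
apply_nokmem() {
  grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
}

# Typical flow on a node (commented out so nothing changes by accident):
#   has_nokmem "$(cat /proc/cmdline)" || apply_nokmem
#   reboot
#   has_nokmem "$(cat /proc/cmdline)" && echo "parameter active"
```

Checking /proc/cmdline after the reboot confirms that the running kernel was actually booted with the parameter, rather than only that the boot entries were updated.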

Additional Information
