Worker Nodes in NotReady State Due to System OOM

Problem

Worker nodes go into the NotReady state because the system ran out of memory (OOM).

Environment

  • Platform9 Managed Kubernetes - All Versions

Answer

  • System logs show that a java process invoked the kernel oom-killer:

Feb  8 16:44:51 ip-10-254-13-44 systemd-udevd[380555]: buffer_head(22224:user@1000.service): Worker [348512] processing SEQNUM=54621 killed
Feb  8 17:08:56 ip-10-254-13-44 systemd-udevd[380555]: skbuff_head_cache(22264:session-547.scope): Worker [348777] processing SEQNUM=54622 is taking a long time
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895373] java invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=990
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895375] CPU: 2 PID: 3290434 Comm: java Not tainted 5.4.0-1041-aws #43-Ubuntu
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895376] Hardware name: Amazon EC2 t3.2xlarge/, BIOS 1.0 10/16/2017
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895377] Call Trace:
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895383]  dump_stack+0x6d/0x8b
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895386]  dump_header+0x4f/0x1eb
Feb  8 17:11:19 ip-10-254-13-44 kernel: [914965.895387]  oom_kill_process.cold+0xb/0x10
  • The load average on the system was also extremely high at that time.

  • The node recovered after some time and transitioned back to the Ready state. The end user will need to work with their application teams to identify the cause of the memory exhaustion.
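
To locate similar events on other nodes, the kernel lines of the form "<process> invoked oom-killer" can be extracted from a saved system log. The following is a minimal Python sketch, not a Platform9 tool: the function name find_oom_events is hypothetical, and the pattern is based only on the single log format shown above, so it may need adjusting for other kernels or log formats.

```python
import re

# Pattern derived from the sample line above, e.g.:
#   "... kernel: [914965.895373] java invoked oom-killer: gfp_mask=..., order=0, oom_score_adj=990"
# Assumption: other kernels or syslog formats may lay this line out differently.
OOM_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>[\d.]+)\] (?P<comm>.+?) invoked oom-killer:"
    r".*?oom_score_adj=(?P<adj>-?\d+)"
)


def find_oom_events(lines):
    """Return one dict per 'invoked oom-killer' line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            events.append({
                # Name of the process whose allocation triggered the OOM killer.
                "process": match.group("comm"),
                # Seconds since boot, taken from the bracketed kernel timestamp.
                "kernel_uptime_s": float(match.group("uptime")),
                "oom_score_adj": int(match.group("adj")),
            })
    return events
```

As a usage example, on an Ubuntu node the log is typically at /var/log/syslog (an assumption; the path varies by distribution), so find_oom_events(open("/var/log/syslog", errors="replace")) would return one entry per OOM invocation. The process reported on this line is the one whose memory allocation triggered the OOM killer, which is not necessarily the process that was killed.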
