Worker Nodes in NotReady due to System OOM Encountered
Problem
Worker nodes transition to the NotReady state after the system encounters an out-of-memory (OOM) condition.
Environment
- Platform9 Managed Kubernetes - All Versions
Answer
- System logs show that a java process invoked the kernel oom-killer, which indicates that the node ran out of memory.
Feb 8 16:44:51 ip-10-254-13-44 systemd-udevd[380555]: buffer_head(22224:user@1000.service): Worker [348512] processing SEQNUM=54621 killed
Feb 8 17:08:56 ip-10-254-13-44 systemd-udevd[380555]: skbuff_head_cache(22264:session-547.scope): Worker [348777] processing SEQNUM=54622 is taking a long time
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895373] java invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=990
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895375] CPU: 2 PID: 3290434 Comm: java Not tainted 5.4.0-1041-aws #43-Ubuntu
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895376] Hardware name: Amazon EC2 t3.2xlarge/, BIOS 1.0 10/16/2017
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895377] Call Trace:
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895383] dump_stack+0x6d/0x8b
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895386] dump_header+0x4f/0x1eb
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895387] oom_kill_process.cold+0xb/0x10
- The load average on the system at this point is also extremely high (730.61 in the sample output below).
2022/02/08 17:11:31 upstream: sending {"command":"cinder-driver-update","host_id":"fcf424c0-8f45-4708-bf3c-fa963d8395ef","hypervisor":"ip-10-254-13-44","task":"communion","task-forward":"livestock","timestamp":"1644340291","volume_drivers":[]}
2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 proc_active:6 proc_total:5311
2022/02/08 17:11:31 meminfo
2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 proc_active:6 proc_total:5311
2022/02/08 17:11:31 ip-10-254-13-44 mem:270528kB/32525432kB swap:0kB/0kB
2022/02/08 17:11:32 network
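The load and memory figures in the output above can be extracted and put in proportion with a short script. This is a sketch under stated assumptions: the helper names `parse_stats` and `load_per_cpu` are hypothetical, the field meanings are inferred from the `loadavg:` and `mem:` labels only, and the CPU count of 8 is the published vCPU count for the t3.2xlarge instance type named in the kernel log.

```python
import re

# Assumed field formats, taken from the sample lines in this article.
LOAD_RE = re.compile(r"loadavg:(?P<load>\d+(?:\.\d+)?)")
MEM_RE = re.compile(r"mem:(?P<first>\d+)kB/(?P<second>\d+)kB")


def parse_stats(lines):
    """Collect the last loadavg value and mem pair seen in the given lines."""
    stats = {}
    for line in lines:
        load = LOAD_RE.search(line)
        if load:
            stats["loadavg"] = float(load.group("load"))
        mem = MEM_RE.search(line)
        if mem:
            # The log does not label these two numbers; they are kept in order.
            stats["mem_kb"] = (int(mem.group("first")), int(mem.group("second")))
    return stats


def load_per_cpu(loadavg, cpus):
    """Load average divided by CPU count; values well above 1 suggest saturation."""
    return loadavg / cpus


# Two lines copied from the output above.
sample = [
    "2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 proc_active:6 proc_total:5311",
    "2022/02/08 17:11:31 ip-10-254-13-44 mem:270528kB/32525432kB swap:0kB/0kB",
]

stats = parse_stats(sample)
first_kb, second_kb = stats["mem_kb"]
print("load per CPU:", round(load_per_cpu(stats["loadavg"], cpus=8), 2))
print("mem ratio:", round(first_kb / second_kb, 4))
```

For the sample lines, the first memory value is under 1% of the second, and the load average is far above the CPU count, which is consistent with a node that is too starved of memory to respond.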
- The node recovered after some time and transitioned back to the Ready state. The end user will need to work with their application teams to identify why the workload exhausted the node's memory.