Worker Nodes in NotReady due to System OOM Encountered
Problem
Worker nodes go into the NotReady state after the system encounters an out-of-memory (OOM) condition.
Environment
- Platform9 Managed Kubernetes - All Versions
Answer
- System logs indicate that a Java process invoked the kernel oom-killer.
Feb 8 16:44:51 ip-10-254-13-44 systemd-udevd[380555]: buffer_head(22224:user@1000.service): Worker [348512] processing SEQNUM=54621 killed
Feb 8 17:08:56 ip-10-254-13-44 systemd-udevd[380555]: skbuff_head_cache(22264:session-547.scope): Worker [348777] processing SEQNUM=54622 is taking a long time
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895373] java invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=990
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895375] CPU: 2 PID: 3290434 Comm: java Not tainted 5.4.0-1041-aws #43-Ubuntu
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895376] Hardware name: Amazon EC2 t3.2xlarge/, BIOS 1.0 10/16/2017
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895377] Call Trace:
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895383] dump_stack+0x6d/0x8b
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895386] dump_header+0x4f/0x1eb
Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895387] oom_kill_process.cold+0xb/0x10
- The load average on the node at this point is also extremely high (loadavg:730.61).
2022/02/08 17:11:31 upstream: sending {"command":"cinder-driver-update","host_id":"fcf424c0-8f45-4708-bf3c-fa963d8395ef","hypervisor":"ip-10-254-13-44","task":"communion","task-forward":"livestock","timestamp":"1644340291","volume_drivers":[]}
2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 proc_active:6 proc_total:5311
2022/02/08 17:11:31 meminfo
2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 proc_active:6 proc_total:5311
2022/02/08 17:11:31 ip-10-254-13-44 mem:270528kB/32525432kB swap:0kB/0kB
2022/02/08 17:11:32 network
- The node recovered after some time and transitioned back to the Ready state. The end user will need to work with their application teams to identify the cause of the memory exhaustion.
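To check for the same pattern on another node, a small script can scan collected log lines for the kernel's "invoked oom-killer" message and for the loadavg field seen in the excerpts above. The Python sketch below is only an illustration: the function names and regular expressions are assumptions derived from the two log formats shown in this article, not part of any Platform9 tooling, and the process-name match assumes a name without spaces.

```python
import re

# Kernel line reporting which process triggered the OOM killer.
# Assumes the process name (comm) contains no whitespace.
OOM_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] (?P<comm>\S+) invoked oom-killer: "
    r".*?oom_score_adj=(?P<adj>-?\d+)"
)

# Load average field as printed in the host status lines.
LOADAVG_RE = re.compile(r"loadavg:(?P<load>[\d.]+)")

# Sample lines copied from the log excerpts in this article.
SAMPLE_LINES = [
    "Feb 8 17:11:19 ip-10-254-13-44 kernel: [914965.895373] java invoked "
    "oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=990",
    "2022/02/08 17:11:31 ip-10-254-13-44 instances:0 loadavg:730.61 "
    "proc_active:6 proc_total:5311",
]


def find_oom_events(lines):
    """Return (process_name, oom_score_adj) for each oom-killer line."""
    events = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            events.append((match.group("comm"), int(match.group("adj"))))
    return events


def max_loadavg(lines):
    """Return the highest loadavg value found, or None if there is none."""
    loads = []
    for line in lines:
        match = LOADAVG_RE.search(line)
        if match:
            loads.append(float(match.group("load")))
    return max(loads) if loads else None


if __name__ == "__main__":
    print(find_oom_events(SAMPLE_LINES))
    print(max_loadavg(SAMPLE_LINES))
```

In practice the list of lines would come from the node's system log and host status log rather than from the hard-coded samples. A match only shows which process triggered the OOM killer; identifying what actually consumed the memory still requires investigation with the application teams.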