Instances Stop Randomly Due to Out of Memory Error

Problem

  • Multiple instances stop randomly.

  • 'Out of memory' errors appear on the host.

  • /var/log/pf9/ostackhost.log on the host contains 'Instance is already powered off in the hypervisor when stop is called' messages, for example:

118 WARNING nova.compute.manager [req-f9189bc9-4218-494b-b794-a924fc744210 None None] [instance: baebf3af-6294-4e4c-ab9e-64f5f5ff45d7] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
798 INFO nova.compute.manager [req-f9189bc9-4218-494b-b794-a924fc744210 None None] [instance: baebf3af-6294-4e4c-ab9e-64f5f5ff45d7] Instance is already powered off in the hypervisor when stop is called.

  • /var/log/messages.log on the host contains errors such as the following:

smc02e3b01 kernel: Out of memory: Kill process 16752 (qemu-kvm) score 702 or sacrifice child
smc02e3b01 kernel: Killed process 16752 (qemu-kvm) total-vm:30600032kB, anon-rss:24304276kB, file-rss:2000kB, shmem-rss:20kB
smc02e3b01 systemd-machined: Machine qemu-15-baebf3af-6294-4e4c-ab9e-64f5f5ff45d7 terminated
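
To confirm which instances were affected, the kernel and systemd-machined messages can be matched against instance UUIDs. The following is a minimal sketch, not a Platform9 tool: the function name summarize_oom_kills and the regular expressions are assumptions modelled only on the log lines quoted above, and the message format may differ on other kernel or systemd versions.

```python
import re

# Patterns modelled on the kernel and systemd-machined lines quoted above.
# They are assumptions about the log format, not a documented interface.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)
MACHINE_RE = re.compile(
    r"Machine qemu-\d+-"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) terminated"
)


def summarize_oom_kills(lines):
    """Return (killed processes, terminated instance UUIDs) found in log lines."""
    killed, terminated = [], []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            killed.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                # Convert kB to GiB for readability.
                "anon_rss_gib": round(int(match.group("anon_rss_kb")) / (1024 * 1024), 2),
            })
        match = MACHINE_RE.search(line)
        if match:
            terminated.append(match.group("uuid"))
    return killed, terminated


# Sample lines copied from the excerpt in this article.
SAMPLE = [
    "smc02e3b01 kernel: Out of memory: Kill process 16752 (qemu-kvm) score 702 or sacrifice child",
    "smc02e3b01 kernel: Killed process 16752 (qemu-kvm) total-vm:30600032kB, "
    "anon-rss:24304276kB, file-rss:2000kB, shmem-rss:20kB",
    "smc02e3b01 systemd-machined: Machine qemu-15-baebf3af-6294-4e4c-ab9e-64f5f5ff45d7 terminated",
]

if __name__ == "__main__":
    killed, terminated = summarize_oom_kills(SAMPLE)
    print(killed)
    print(terminated)
```

The UUID in the systemd-machined line is the same instance UUID that appears in the nova.compute.manager messages, which is what ties the kernel kill to the stopped instance.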

Environment

  • Platform9 Managed OpenStack - All Versions

Cause

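The kernel invoked the Out Of Memory Killer (OOM Killer).

The OOM Killer is a mechanism that the Linux kernel uses when the system is critically low on memory. It selects a process based on a per-process score and kills it to free memory.

Because the host did not have enough memory, the kernel invoked the OOM Killer, which killed the qemu-kvm process. Each instance on the host is backed by a qemu-kvm process, so the affected instances stopped when their processes were killed.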
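
To see which processes the kernel currently rates as the most likely OOM victims, the per-process score exposed in /proc/<pid>/oom_score can be listed. This is a minimal sketch under stated assumptions: the function name rank_by_oom_score and its proc_root and top parameters are hypothetical, and the score the kernel uses at kill time may differ from the value read here.

```python
from pathlib import Path


def rank_by_oom_score(proc_root="/proc", top=5):
    """Return (score, pid, command) tuples, highest oom_score first."""
    ranked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((entry / "oom_score").read_text().strip())
            comm = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # the process exited, or the file is unreadable
        ranked.append((score, int(entry.name), comm))
    ranked.sort(reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    for score, pid, comm in rank_by_oom_score():
        print(f"{score:>6}  {pid:>7}  {comm}")
```

On a hypervisor host, large qemu-kvm processes would be expected near the top of this list, which is consistent with the kill messages shown in the Problem section.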

Resolution

If the host is low on physical memory or swap space, add more memory to the host or increase its swap space.
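
Before resizing, it can help to check how much memory and swap the host has left. The sketch below reads the standard MemTotal, MemAvailable, SwapTotal, and SwapFree fields from /proc/meminfo. The function names parse_meminfo and memory_headroom are hypothetical, and what counts as "low" is left to the operator.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_headroom(meminfo):
    """Return available memory and free swap as fractions of their totals."""
    def fraction(part, total):
        # A total of zero (for example, no swap configured) yields 0.0.
        return meminfo.get(part, 0) / meminfo[total] if meminfo.get(total) else 0.0

    return {
        "mem_available": fraction("MemAvailable", "MemTotal"),
        "swap_free": fraction("SwapFree", "SwapTotal"),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for name, frac in memory_headroom(info).items():
        print(f"{name}: {frac:.1%}")
```

A swap_free value of 0.0% together with a SwapTotal of 0 means that no swap is configured on the host at all.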
