This appears to be a problem caused by the combination of two factors:
- Using a virtual machine.
- A possible kernel bug.
This is one of the lines that describes why this happens:
Mar 7 02:43:11 myhost kernel: memcheck-amd64- invoked oom-killer: gfp_mask=0x24002c2, order=0, oom_score_adj=0
The other line is this:
Mar 7 02:43:11 myhost kernel: 0 pages HighMem/MovableOnly
The first line shows the GFP mask assigned to the allocation. It basically describes what the kernel is allowed and not allowed to do to satisfy this request.
The mask indicates a bunch of standard flags. The final '2', however, corresponds to __GFP_HIGHMEM and indicates the memory allocation should come from the HighMem zone.
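As a quick sanity check, you can pull the zone-selector bits out of that mask yourself. A minimal sketch in shell, assuming the flag values from include/linux/gfp.h of kernels of this vintage (they were renumbered in later kernels, so verify against your own kernel source):

# Zone selector bits in the low nibble of the GFP mask (assumed values,
# taken from include/linux/gfp.h of this kernel era):
#   __GFP_DMA     0x01
#   __GFP_HIGHMEM 0x02
#   __GFP_DMA32   0x04
mask=0x24002c2
echo $(( (mask & 0x02) != 0 ))   # prints 1: __GFP_HIGHMEM is set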
If you look closely at the OOM output, you'll see no HighMem/Normal zone actually exists.
Mar 7 02:43:11 myhost kernel: Node 0 DMA: 20*4kB (UM) 17*8kB (UM) 13*16kB (M) 14*32kB (UM) 8*64kB (UM) 4*128kB (M) 4*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 3944kB
Mar 7 02:43:11 myhost kernel: Node 0 DMA32: 934*4kB (UM) 28*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3960kB
HighMem (generally called Normal on x86_64) contains the memory that falls outside the standard 896 MiB range the kernel can map directly on 32-bit systems. On x86_64, HighMem/Normal covers all pages above the 4 GiB mark.
DMA32 contains memory accessible to 32-bit DMA devices, that is, devices that address memory with 4-byte pointers. DMA is for legacy ISA devices, which can only reach the first 16 MiB of memory (24-bit addressing).
Generally speaking, on low-memory x86_64 systems Normal won't exist, given that DMA32 already covers all available physical addresses.
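You can check which zones actually exist on a running system yourself; a small sketch, reading the same per-zone data the OOM report prints:

# One line per zone that exists on this node; on a 1 GiB x86_64 guest
# you would expect to see only DMA and DMA32, with no Normal line:
awk '{ print $1, $2, $3, $4 }' /proc/buddyinfo
# Per-zone free page counts:
grep -E '^Node|pages free' /proc/zoneinfo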
The reason you get an OOM kill is that there is a memory allocation request for a HighMem zone with 0 pages available. Given that the out-of-memory handler has absolutely no way to give this zone pages to use, whether by swapping, killing other processes, or any other trick, the OOM killer just kills the process.
I believe this is caused by the host ballooning the VM's memory on boot. On KVM systems, there are two values you can set:
- The current memory.
- The available memory.
The way this works is that you can hot-add memory to your server up to the available memory, but your system is actually given only the current memory.
When a KVM VM boots up, it starts with the maximum allotment of memory it can be given (the available memory). During the boot phase, KVM gradually claws this memory back via ballooning, leaving you instead with the current memory setting.
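For what it's worth, with a libvirt-managed KVM guest those two values are the <memory> (available) and <currentMemory> (current) settings in the domain XML. A hedged sketch with virsh, where 'guest1' is a placeholder domain name and sizes are in KiB:

# Inspect the two settings:
virsh dumpxml guest1 | grep -iE '<(memory|currentMemory)'
# Set available (maximum) and current memory; takes effect on next boot:
virsh setmaxmem guest1 6291456 --config   # 6 GiB available
virsh setmem    guest1 1048576 --config   # 1 GiB current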
It's my belief that's what happened here. Linode allows you to expand your memory, giving you much more at system start.
This means that there is a Normal/HighMem zone at the beginning of the system's lifetime. When the hypervisor balloons it away, the Normal zone rightly disappears from the memory manager. But I suspect that the flag indicating whether that zone is available to allocate from is not cleared when it should be. This leads the kernel to attempt to allocate from a zone that does not exist.
In terms of resolving this, you have two options:
- Bring this up on the kernel mailing lists to see whether this really is a bug, expected behaviour, or nothing at all to do with what I'm saying.
- Request that Linode set the 'available memory' on the system to the same 1 GiB assignment as the 'current memory'. Thus the system never balloons and never gets a Normal zone at boot, keeping the flag clear. Good luck getting them to do that!
You should be able to test whether this is the case by setting up your own VM in KVM with available memory set to 6 GiB and current memory set to 1 GiB, then running your test on the same kernel to see whether the behaviour above occurs (a sketch follows below). If it does, change the 'available' setting to equal the 1 GiB 'current' setting and repeat the test.
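A minimal sketch of that experiment, again using virsh with a placeholder domain name:

# Case 1: available (6 GiB) > current (1 GiB) -- expect the spurious OOM kill:
virsh setmaxmem testvm 6291456 --config
virsh setmem    testvm 1048576 --config
virsh start testvm    # run the memcheck workload inside the guest
# Case 2: available == current (1 GiB) -- expect no OOM kill:
virsh shutdown testvm
virsh setmaxmem testvm 1048576 --config
virsh setmem    testvm 1048576 --config
virsh start testvm    # repeat the same workload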
I'm making a bunch of educated guesses here and reading between the lines somewhat to come up with this answer, but what I'm saying seems to fit the facts already outlined.
I suggest testing my hypothesis and letting us all know the outcome.
Best Answer
It doesn't sound like you are addressing the root cause of the issue, which is actually debugging why this cron job is using so much memory.
You can try setting this option:
echo 1 > /proc/sys/vm/oom_kill_allocating_task
which will tell the OOM killer to kill the process that triggered the OOM condition, but this is not guaranteed to be your cron job. You can also use "ulimit -m" in your script to set the maximum amount of resident memory to use (though note that modern Linux kernels do not enforce RLIMIT_RSS, so "ulimit -v", which caps virtual memory, is the more reliable option). I think your best bet would be to evaluate why the cron job is using so much memory, and whether it is perhaps better suited to another host or should be rewritten to consume less memory.
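If you do go down the sysctl/ulimit route, a minimal sketch follows; the wrapper path and the 256 MiB cap are placeholder values:

# Persist the OOM-killer behaviour across reboots:
echo 'vm.oom_kill_allocating_task = 1' >> /etc/sysctl.conf
sysctl -p

#!/bin/sh
# cron-wrapper.sh: hypothetical wrapper around the memory-hungry job.
# RLIMIT_RSS (ulimit -m) is ignored by modern kernels, so cap the
# address space instead:
ulimit -v 262144              # 256 MiB of virtual memory (example value)
exec /path/to/your/cronjob    # placeholder path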