This appears to be a problem caused by the combination of two factors:
- Using a virtual machine.
- A possible kernel bug.
This is one of the lines that describes why this happens:
Mar 7 02:43:11 myhost kernel: memcheck-amd64- invoked oom-killer: gfp_mask=0x24002c2, order=0, oom_score_adj=0
The other line is this:
Mar 7 02:43:11 myhost kernel: 0 pages HighMem/MovableOnly
The first line shows the GFP mask assigned to the allocation. It basically describes what the kernel is allowed and not allowed to do to satisfy this request.
The mask indicates a bunch of standard flags. The final '2', however, corresponds to __GFP_HIGHMEM and indicates the memory allocation should come from the HighMem zone.
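As a quick sanity check, you can pull the zone-selector bits out of that mask yourself. A minimal sketch in shell, assuming the flag values from include/linux/gfp.h of kernels of this vintage (they were renumbered in later kernels, so verify against your own kernel source):

# Zone selector bits in the low nibble of the GFP mask (assumed values,
# taken from include/linux/gfp.h of this kernel era):
#   __GFP_DMA     0x01
#   __GFP_HIGHMEM 0x02
#   __GFP_DMA32   0x04
mask=0x24002c2
echo $(( (mask & 0x02) != 0 ))   # prints 1: __GFP_HIGHMEM is set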
If you look closely at the OOM output, you'll see no HighMem/Normal zone actually exists.
Mar 7 02:43:11 myhost kernel: Node 0 DMA: 20*4kB (UM) 17*8kB (UM) 13*16kB (M) 14*32kB (UM) 8*64kB (UM) 4*128kB (M) 4*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 3944kB
Mar 7 02:43:11 myhost kernel: Node 0 DMA32: 934*4kB (UM) 28*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3960kB
HighMem (generally called Normal on x86_64) contains the memory that falls outside the standard 896 MiB range the kernel can map directly on 32-bit systems. On x86_64, HighMem/Normal covers all pages above the 4 GiB mark.
DMA32 contains memory accessible to 32-bit DMA devices, that is, devices that address memory with 4-byte pointers. DMA is for legacy ISA devices, which can only reach the first 16 MiB of memory (24-bit addressing).
Generally speaking, on low-memory x86_64 systems Normal won't exist, given that DMA32 already covers all available physical addresses.
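You can check which zones actually exist on a running system yourself; a small sketch, reading the same per-zone data the OOM report prints:

# One line per zone that exists on this node; on a 1 GiB x86_64 guest
# you would expect to see only DMA and DMA32, with no Normal line:
awk '{ print $1, $2, $3, $4 }' /proc/buddyinfo
# Per-zone free page counts:
grep -E '^Node|pages free' /proc/zoneinfo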
The reason you get an OOM kill is that there is a memory allocation request for a HighMem zone with 0 pages available. Given that the out-of-memory handler has absolutely no way to give this zone pages to use, whether by swapping, killing other processes, or any other trick, the OOM killer just kills the process.
I believe this is caused by the host ballooning the VM's memory on boot. On KVM systems, there are two values you can set:
- The current memory.
- The available memory.
The way this works is that you can hot-add memory to your server up to the available memory, but your system is actually given only the current memory.
When a KVM VM boots up, it starts with the maximum allotment of memory it can be given (the available memory). During the boot phase, KVM gradually claws this memory back via ballooning, leaving you instead with the current memory setting.
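For what it's worth, with a libvirt-managed KVM guest those two values are the <memory> (available) and <currentMemory> (current) settings in the domain XML. A hedged sketch with virsh, where 'guest1' is a placeholder domain name and sizes are in KiB:

# Inspect the two settings:
virsh dumpxml guest1 | grep -iE '<(memory|currentMemory)'
# Set available (maximum) and current memory; takes effect on next boot:
virsh setmaxmem guest1 6291456 --config   # 6 GiB available
virsh setmem    guest1 1048576 --config   # 1 GiB current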
It's my belief that's what happened here. Linode allows you to expand your memory, giving you much more at system start.
This means that there is a Normal/HighMem zone at the beginning of the system's lifetime. When the hypervisor balloons it away, the Normal zone rightly disappears from the memory manager. But I suspect that the flag indicating whether that zone is available to allocate from is not cleared when it should be. This leads the kernel to attempt to allocate from a zone that does not exist.
In terms of resolving this, you have two options:
- Bring this up on the kernel mailing lists to see whether this really is a bug, expected behaviour, or nothing at all to do with what I'm saying.
- Request that Linode set the 'available memory' on the system to the same 1 GiB assignment as the 'current memory'. Thus the system never balloons and never gets a Normal zone at boot, keeping the flag clear. Good luck getting them to do that!
You should be able to test whether this is the case by setting up your own VM in KVM with available memory set to 6 GiB and current memory set to 1 GiB, then running your test on the same kernel to see whether the behaviour above occurs (a sketch follows below). If it does, change the 'available' setting to equal the 1 GiB 'current' setting and repeat the test.
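A minimal sketch of that experiment, again using virsh with a placeholder domain name:

# Case 1: available (6 GiB) > current (1 GiB) -- expect the spurious OOM kill:
virsh setmaxmem testvm 6291456 --config
virsh setmem    testvm 1048576 --config
virsh start testvm    # run the memcheck workload inside the guest
# Case 2: available == current (1 GiB) -- expect no OOM kill:
virsh shutdown testvm
virsh setmaxmem testvm 1048576 --config
virsh setmem    testvm 1048576 --config
virsh start testvm    # repeat the same workload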
I'm making a bunch of educated guesses here and reading between the lines somewhat to come up with this answer, but what I'm saying seems to fit the facts already outlined.
I suggest testing my hypothesis and letting us all know the outcome.
Best Answer
It doesn't sound like you are addressing the root cause of the issue, which is actually debugging why this cron job is using so much memory.
You can try setting this option:
echo 1 > /proc/sys/vm/oom_kill_allocating_task
which will tell the OOM killer to kill the process that triggered the OOM condition, but this is not guaranteed to be your cron job. You can also use "ulimit -m" in your script to set the maximum amount of resident memory to use (though note that modern Linux kernels do not enforce RLIMIT_RSS, so "ulimit -v", which caps virtual memory, is the more reliable option). I think your best bet would be to evaluate why the cron job is using so much memory, and whether it is perhaps better suited to another host or should be rewritten to consume less memory.
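If you do go down the sysctl/ulimit route, a minimal sketch follows; the wrapper path and the 256 MiB cap are placeholder values:

# Persist the OOM-killer behaviour across reboots:
echo 'vm.oom_kill_allocating_task = 1' >> /etc/sysctl.conf
sysctl -p

#!/bin/sh
# cron-wrapper.sh: hypothetical wrapper around the memory-hungry job.
# RLIMIT_RSS (ulimit -m) is ignored by modern kernels, so cap the
# address space instead:
ulimit -v 262144              # 256 MiB of virtual memory (example value)
exec /path/to/your/cronjob    # placeholder path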