Linux – OOM kills a process while there is still a lot of available memory

linux linux-kernel oom

I am aware that there are many threads like this one, but none of them actually give a methodology on how to read an OOM message.

I have a database that is being monitored through NRPE for available memory, using a shell script check that runs free -m ... under the hood, greps out the available memory, and compares it against a threshold.
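The check is roughly along the lines of the sketch below (simplified; the exact threshold and the way my real script parses the free output differ):

#!/bin/bash
# Simplified sketch of the NRPE check. Reads the "available" column from
# free -m and alerts when it drops below a fixed number of MB.
THRESHOLD_MB=1024   # assumed example threshold, not the real value

AVAILABLE_MB=$(free -m | awk '/^Mem:/ {print $7}')

if [ "$AVAILABLE_MB" -lt "$THRESHOLD_MB" ]; then
    echo "CRITICAL - only ${AVAILABLE_MB} MB available"
    exit 2
else
    echo "OK - ${AVAILABLE_MB} MB available"
    exit 0
fi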

MySQL was killed by the OOM killer without that check ever being triggered. The message was:

[45948878.475805] master invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
[45948878.483847] master cpuset=/ mems_allowed=0
[45948878.488366] CPU: 9 PID: 1095 Comm: master Not tainted 3.10.0-514.16.1.el7.x86_64 #1
[45948878.496464] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[45948878.506124]  ffff880ee42d5e20 0000000076e51669 ffff880ee4b43ae8 ffffffff81686ac3
[45948878.514480]  ffff880ee4b43b78 ffffffff81681a6e ffffffff812aedab ffff880f02e96ad0
[45948878.522887]  ffff880ee4b43bc0 ffffffff00000206 fffeefff00000000 0000000000000008
[45948878.531320] Call Trace:
[45948878.534171]  [<ffffffff81686ac3>] dump_stack+0x19/0x1b
[45948878.539728]  [<ffffffff81681a6e>] dump_header+0x8e/0x225
[45948878.545437]  [<ffffffff812aedab>] ? cred_has_capability+0x6b/0x120
[45948878.552019]  [<ffffffff81184c7e>] oom_kill_process+0x24e/0x3c0
[45948878.558306]  [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
[45948878.565052]  [<ffffffff811854b6>] out_of_memory+0x4b6/0x4f0
[45948878.571022]  [<ffffffff81682577>] __alloc_pages_slowpath+0x5d7/0x725
[45948878.577769]  [<ffffffff8118b5c5>] __alloc_pages_nodemask+0x405/0x420
[45948878.584562]  [<ffffffff8108353d>] copy_process+0x1dd/0x1960
[45948878.590536]  [<ffffffff81084e71>] do_fork+0x91/0x2c0
[45948878.595893]  [<ffffffff81085126>] SyS_clone+0x16/0x20
[45948878.601405]  [<ffffffff816974d9>] stub_clone+0x69/0x90
[45948878.606940]  [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
[45948878.613511] Mem-Info:
[45948878.616212] active_anon:14681394 inactive_anon:285983 isolated_anon:92
 active_file:2380 inactive_file:3203 isolated_file:76
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:94372 slab_unreclaimable:33256
 mapped:11621 shmem:768108 pagetables:32080 bounce:0
 free:93898 free_pcp:64 free_cma:0
[45948878.652587] Node 0 DMA free:15908kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[45948878.694520] lowmem_reserve[]: 0 2977 60382 60382
[45948878.700271] Node 0 DMA32 free:230644kB min:3328kB low:4160kB high:4992kB active_anon:2476892kB inactive_anon:24492kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129332kB managed:3049456kB mlocked:0kB dirty:0kB writeback:0kB mapped:156kB shmem:72884kB slab_reclaimable:156940kB slab_unreclaimable:34796kB kernel_stack:19120kB pagetables:5252kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[45948878.746051] lowmem_reserve[]: 0 0 57404 57404
[45948878.751495] Node 0 Normal free:63356kB min:64236kB low:80292kB high:96352kB active_anon:56248684kB inactive_anon:1119440kB active_file:11540kB inactive_file:19792kB unevictable:0kB isolated(anon):368kB isolated(file):304kB present:59768832kB managed:58782432kB mlocked:0kB dirty:0kB writeback:0kB mapped:46328kB shmem:2999548kB slab_reclaimable:220548kB slab_unreclaimable:98228kB kernel_stack:25488kB pagetables:123068kB unstable:0kB bounce:0kB free_pcp:5208kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5126 all_unreclaimable? no
[45948878.800106] lowmem_reserve[]: 0 0 0 0
[45948878.804880] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[45948878.821646] Node 0 DMA32: 2212*4kB (UEM) 1679*8kB (UEM) 845*16kB (UEM) 1332*32kB (UEM) 648*64kB (UEM) 330*128kB (UEM) 192*256kB (UEM) 38*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 230744kB
[45948878.841497] Node 0 Normal: 16437*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65748kB
[45948878.855460] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[45948878.864552] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[45948878.873384] 773511 total pagecache pages
[45948878.877730] 0 pages in swap cache
[45948878.881437] Swap cache stats: add 0, delete 0, find 0/0
[45948878.887060] Free swap  = 0kB
[45948878.890329] Total swap = 0kB
[45948878.893592] 15728539 pages RAM
[45948878.897028] 0 pages HighMem/MovableOnly
[45948878.901251] 266590 pages reserved

Am I monitoring memory usage incorrectly? What does this message say? I don't know how to interpret the memory values in it. What should I be checking on the system to prevent this from happening again?

Best Answer

Usually your monitoring interval is far too long to catch a problem where a process quickly grabs all available memory. Aggressively configured Icinga or Nagios installations check every minute, so on average 30 seconds pass between a problem occurring and the next check running. In the specific case of an OOM, the memory pressure itself will probably delay the check by a few seconds, and by the time the check finally runs the OOM kill has already happened and the memory has been freed, so the check returns OK.
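One way to see how quickly memory is being consumed between checks is to sample it much more frequently for a while. A trivial loop like the one below works (a rough sketch; the sysstat package's sar -r, or atop, give you the same data with proper history and less hand-rolling):

# Sample memory once per second with a timestamp, so a short-lived spike
# that triggers an OOM between two monitoring runs still shows up.
while true; do
    echo "$(date '+%F %T') $(free -m | awk '/^Mem:/ {print "available_mb=" $7}')"
    sleep 1
done >> /var/log/mem-sample.log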

If you keep graphs of your monitoring results, it would be a good idea to take a look at them.

To fix the root cause of your issue, you should probably take a look at the following article: How does vm.overcommit_memory work?
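As a starting point, you can inspect the current overcommit settings and see how close the system gets to its commit limit (a sketch of the relevant commands; the strict setting in the last line is only an example and should not be applied without reading the article above):

# Current overcommit policy (0 = heuristic, 1 = always allow, 2 = strict accounting)
sysctl vm.overcommit_memory vm.overcommit_ratio

# How much memory is currently committed versus the commit limit
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# Example only: enforce strict accounting after understanding the
# consequences for your workload
# sysctl -w vm.overcommit_memory=2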