Linux – Troubleshooting oom-killer using atop: is it fixed or not?

debian · linux · oom-killer

I have a Debian system with 4GB of RAM, 2GB of swap, and 2 CPUs. I've been experiencing a number of system failures involving oom-killer. The system will be working fine one minute, and the next it will be inoperable with numerous oom-killer messages on the console. I have already added 1GB of RAM twice [5].

What I find puzzling is that I monitor the system using xymon, and I do not get any warning in advance. I have graphs of cpu, network, and ram utilization, but I don't see a large 'spike' in any of them that would indicate a problem. I would post them, but currently don't have enough reputation.

I discovered atop [6]. I was also getting a red line through vmcom and vmlim. I changed /proc/sys/vm/overcommit_ratio from 50 to 90, and the red line went away. As you can see in the output below, I have 500MB of free RAM, 2GB of swap free, and 1.2GB in cache.
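For reference, atop's vmcom and vmlim correspond to Committed_AS and CommitLimit in /proc/meminfo, and the limit is derived from swap plus a fraction of RAM. A minimal way to check the numbers behind that red line (a sketch; values are in kB):

# Current commitment vs. the commit limit
# CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# (the limit is only enforced when vm.overcommit_memory = 2)
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# Current overcommit policy and ratio
sysctl vm.overcommit_memory vm.overcommit_ratio

With 3.9G of RAM, 2G of swap and a ratio of 90, that works out to roughly the 5.5G vmlim shown in the atop output below.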

Did I fix the problem, or did I tell atop/linux to ignore it?

I want to have a stable system. Going forward, should I:

  1. Tune the maximum child/worker parameters of Apache, Sendmail, etc. downwards (see the rough sizing sketch below)? Use ulimit?
  2. Tune oom-killer sysctl values to let me utilize all available RAM & swap?
  3. Tune swappiness or other kernel values?

I'm looking for good ways to calculate the answers to the above questions more than anything else.
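For item 1, a rough way to estimate how many Apache workers fit in memory (a sketch, assuming the apache2 workers shown in the atop output below are representative; RSS includes shared pages, so this somewhat overestimates the per-worker cost):

# Average resident size per apache2 worker, in kB
ps -C apache2 -o rss= | awk '{sum+=$1; n++} END {if (n) printf "workers: %d  avg rss: %d kB\n", n, sum/n}'

# A safe MaxClients is then roughly:
#   (RAM you can spare for Apache) / (average RSS per worker)
# e.g. 1.5 GB / ~50 MB per worker gives a MaxClients of about 30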

Thanks.

ATOP output

ATOP - www1               2013/06/20  10:32:14               10 seconds elapsed
PRC | sys   1.34s | user   7.48s | #proc    347 | #zombie    2 | #exit     53 |
CPU | sys     11% | user     63% | irq       1% | idle    106% | wait     19% |
cpu | sys      7% | user     45% | irq       1% | idle     44% | cpu000 w  3% |
cpu | sys      4% | user     18% | irq       0% | idle     62% | cpu001 w 16% |
CPL | avg1   0.90 | avg5    1.20 | avg15   1.53 | csw    13548 | intr    5667 |
MEM | tot    3.9G | free  504.7M | cache   1.2G | buff  124.8M | slab  445.3M |
SWP | tot    2.0G | free    2.0G |              | vmcom   4.0G | vmlim   5.5G |
DSK |         sda | busy     28% | read       1 | write    237 | avio   11 ms |
NET | transport   | tcpi    1753 | tcpo    1682 | udpi    2105 | udpo    2120 |
NET | network     | ipi     3918 | ipo     3832 | ipfrw      0 | deliv   3858 |
NET | eth0     0% | pcki    1303 | pcko    1474 | si  372 Kbps | so  650 Kbps |
NET | eth1     0% | pcki    1996 | pcko    2039 | si  369 Kbps | so  398 Kbps |
NET | lo     ---- | pcki     619 | pcko     619 | si  118 Kbps | so  118 Kbps |

  PID MINFLT MAJFLT      VSTEXT  VSIZE  RSIZE  VGROW  RGROW  MEM CMD     1/3   
 3163      0      0        462K 356.1M 138.5M     0K     0K   3% named
 3256    413      0       7432K 387.4M 126.0M     0K     0K   3% mysqld
  579     18      0          3K 179.9M 73548K     0K     0K   2% spamd
  784     10      0          3K 176.8M 70436K     0K     0K   2% spamd
 8053    137      0        394K 330.2M 62928K     0K    20K   2% apache2
 7807    122      0        394K 329.4M 62064K     0K     0K   2% apache2
 7158      1      0        394K 328.3M 60004K     0K     0K   1% apache2
 8305     41      0        394K 326.9M 59096K     0K     4K   1% apache2
17712      0      0          3K 153.2M 51384K     0K     0K   1% spamd
 8057      0      0        394K 319.4M 50600K     0K     0K   1% apache2
 7994    127      0        394K 319.4M 50376K   332K   244K   1% apache2
 8068     38      0        394K 319.2M 49636K     0K     0K   1% apache2
 8164    117      0        394K 319.3M 49544K     0K   100K   1% apache2
 8286     79      0        394K 319.3M 49332K     0K     0K   1% apache2
 8393    457      0        394K 319.2M 49216K     0K    12K   1% apache2
 8222     52      0        394K 318.9M 48852K     0K    52K   1% apache2

Typical oom-killer log output

Jun 15 10:21:26 mail kernel: [142707.434078] php5-cgi invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Jun 15 10:21:26 mail kernel: [142707.434083] Pid: 18323, comm: php5-cgi Not tainted 2.6.26-2-amd64 #1
Jun 15 10:21:26 mail kernel: [142707.434085] 
Jun 15 10:21:26 mail kernel: [142707.434085] Call Trace:
Jun 15 10:21:26 mail kernel: [142707.434101]  [<ffffffff80273994>] oom_kill_process+0x57/0x1dc
Jun 15 10:21:26 mail kernel: [142707.434104]  [<ffffffff8023b519>] __capable+0x9/0x1c
Jun 15 10:21:26 mail kernel: [142707.434106]  [<ffffffff80273cbf>] badness+0x188/0x1c7
Jun 15 10:21:26 mail kernel: [142707.434109]  [<ffffffff80273ef3>] out_of_memory+0x1f5/0x28e
Jun 15 10:21:26 mail kernel: [142707.434114]  [<ffffffff80276c44>] __alloc_pages_internal+0x31d/0x3bf
Jun 15 10:21:26 mail kernel: [142707.434119]  [<ffffffff802788fa>] __do_page_cache_readahead+0x79/0x183
Jun 15 10:21:26 mail kernel: [142707.434123]  [<ffffffff802731a9>] filemap_fault+0x15d/0x33c
Jun 15 10:21:26 mail kernel: [142707.434127]  [<ffffffff8027e728>] __do_fault+0x50/0x3e6
Jun 15 10:21:26 mail kernel: [142707.434132]  [<ffffffff80281ae9>] handle_mm_fault+0x452/0x8de
Jun 15 10:21:26 mail kernel: [142707.434136]  [<ffffffff80246229>] autoremove_wake_function+0x0/0x2e
Jun 15 10:21:26 mail kernel: [142707.434139]  [<ffffffff80284bb1>] vma_merge+0x141/0x1ee
Jun 15 10:21:26 mail kernel: [142707.434144]  [<ffffffff80221fbc>] do_page_fault+0x5d8/0x9c8
Jun 15 10:21:26 mail kernel: [142707.434149]  [<ffffffff8042aaf9>] error_exit+0x0/0x60
Jun 15 10:21:26 mail kernel: [142707.434154] 
Jun 15 10:21:26 mail kernel: [142707.434155] Mem-info:
Jun 15 10:21:26 mail kernel: [142707.434156] Node 0 DMA per-cpu:
Jun 15 10:21:26 mail kernel: [142707.434158] CPU    0: hi:    0, btch:   1 usd:   0
Jun 15 10:21:26 mail kernel: [142707.434159] CPU    1: hi:    0, btch:   1 usd:   0
Jun 15 10:21:26 mail kernel: [142707.434160] Node 0 DMA32 per-cpu:
Jun 15 10:21:26 mail kernel: [142707.434162] CPU    0: hi:  186, btch:  31 usd: 153
Jun 15 10:21:26 mail kernel: [142707.434163] CPU    1: hi:  186, btch:  31 usd: 168
Jun 15 10:21:26 mail kernel: [142707.434164] Node 0 Normal per-cpu:
Jun 15 10:21:26 mail kernel: [142707.434165] CPU    0: hi:  186, btch:  31 usd: 140
Jun 15 10:21:26 mail kernel: [142707.434167] CPU    1: hi:  186, btch:  31 usd:  81
Jun 15 10:21:26 mail kernel: [142707.434169] Active:118998 inactive:818611 dirty:1 writeback:1267 unstable:0
Jun 15 10:21:26 mail kernel: [142707.434170]  free:5922 slab:17276 mapped:191 pagetables:32145 bounce:0
Jun 15 10:21:26 mail kernel: [142707.434172] Node 0 DMA free:11692kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10772kB pages_scanned:0 all_unreclaimable? yes
Jun 15 10:21:26 mail kernel: [142707.434175] lowmem_reserve[]: 0 3000 4010 4010
Jun 15 10:21:26 mail kernel: [142707.434177] Node 0 DMA32 free:10396kB min:6056kB low:7568kB high:9084kB active:152kB inactive:2812380kB present:3072160kB pages_scanned:64 all_unreclaimable? no
Jun 15 10:21:26 mail kernel: [142707.434180] lowmem_reserve[]: 0 0 1010 1010
Jun 15 10:21:26 mail kernel: [142707.434182] Node 0 Normal free:1600kB min:2036kB low:2544kB high:3052kB active:475840kB inactive:462064kB present:1034240kB pages_scanned:148770 all_unreclaimable? no
Jun 15 10:21:26 mail kernel: [142707.434185] lowmem_reserve[]: 0 0 0 0
Jun 15 10:21:26 mail kernel: [142707.434187] Node 0 DMA: 3*4kB 6*8kB 3*16kB 6*32kB 4*64kB 1*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 2*4096kB = 11692kB
Jun 15 10:21:26 mail kernel: [142707.434192] Node 0 DMA32: 1130*4kB 0*8kB 2*16kB 2*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 10376kB
Jun 15 10:21:26 mail kernel: [142707.434197] Node 0 Normal: 153*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
Jun 15 10:21:26 mail kernel: [142707.434202] 233119 total pagecache pages
Jun 15 10:21:26 mail kernel: [142707.434204] Swap cache: add 639299, delete 621816, find 41527/51044
Jun 15 10:21:26 mail kernel: [142707.434205] Free swap  = 0kB
Jun 15 10:21:26 mail kernel: [142707.434206] Total swap = 2097144kB
Jun 15 10:21:26 mail kernel: [142707.444828] 1048576 pages of RAM
Jun 15 10:21:26 mail kernel: [142707.444828] 32137 reserved pages
Jun 15 10:21:26 mail kernel: [142707.444828] 183748 pages shared
Jun 15 10:21:26 mail kernel: [142707.444828] 17483 pages swap cached
Jun 15 10:21:26 mail kernel: [142707.444828] Out of memory: kill process 3907 (perdition.imaps) score 179546 or a child
Jun 15 10:21:26 mail kernel: [142707.444828] Killed process 29401 (perdition.imaps)

References:

  1. OOM Killer
  2. Taming the OOM killer
  3. How to diagnose causes of oom-killer killing processes
  4. Understanding the Linux oom-killer's logs
  5. What is the best way to prevent out of memory (OOM) freezes on Linux?
  6. ATOP shows red line vmcom and vmlim. What does it mean?

Best Answer

Looking at your OOM-killer output, your system certainly did not have 500 MB of free RAM and empty swap when the OOM killer fired:

Jun 15 10:21:26 mail kernel: [142707.434172] Node 0 DMA free:11692kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10772kB pages_scanned:0 all_unreclaimable? yes
Jun 15 10:21:26 mail kernel: [142707.434177] Node 0 DMA32 free:10396kB min:6056kB low:7568kB high:9084kB active:152kB inactive:2812380kB present:3072160kB pages_scanned:64 all_unreclaimable? no
Jun 15 10:21:26 mail kernel: [142707.434182] Node 0 Normal free:1600kB min:2036kB low:2544kB high:3052kB active:475840kB inactive:462064kB present:1034240kB pages_scanned:148770 all_unreclaimable? no
[...]
Jun 15 10:21:26 mail kernel: [142707.434205] Free swap  = 0kB
Jun 15 10:21:26 mail kernel: [142707.434206] Total swap = 2097144kB

Note that the free memory in the "Normal" zone is below the "min" watermark, meaning that userland processes can no longer allocate memory from it:

(Figure: Linux memory manager graph)

Your DMA and DMA32 zones do have some memory available, but the OOM killer is triggered because the request for memory came for the "HIGHMEM" (or "Normal") zone (the lower nibble of gfp_mask=0x1201d2 is 0x2).
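If you want to watch those watermarks yourself, they are exposed per zone in /proc/zoneinfo (a quick sketch; the figures are in pages, 4 kB each on amd64):

# Free pages vs. the min/low/high watermarks for every zone
grep -E '^Node|pages free|min |low |high ' /proc/zoneinfo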

It is quite possible that memory usage spikes fast enough to fit entirely between two polls of your monitoring system, so you never see a spike in the graphs; the system just becomes unusable.
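One way to catch such short-lived spikes is to let atop record samples continuously and replay the minutes around a crash afterwards (a sketch; the Debian atop package can also be configured to do this automatically):

# Record a snapshot every 60 seconds into a raw log
atop -w /var/log/atop_raw 60 &

# After an incident, replay the log and page through the samples
atop -r /var/log/atop_raw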

Disabling overcommit by setting vm.overcommit_memory = 2 (and tuning vm.overcommit_ratio) will only help in the sense that you will no longer get OOM-killer invocations. The memory shortage will persist, and processes whose allocation requests fail in a "memory full" condition may simply crash or exit abnormally instead.
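For completeness, this is how that change would typically be applied and persisted (a sketch; as noted, it trades OOM kills for failed allocations rather than fixing the shortage):

# Apply at runtime
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=90

# Persist across reboots
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
echo 'vm.overcommit_ratio = 90' >> /etc/sysctl.conf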

To really address the situation, find out what is consuming all your memory; the Apache workers are likely candidates. Try enabling vm.oom_dump_tasks to get more information from the OOM killer about processes and their memory usage at the time of the killing decision. Also take a look at this question, which your description resembles quite a bit.
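A minimal sketch of both suggestions (the ps invocation is just one convenient way to rank resident-memory consumers):

# Have the OOM killer log a per-task memory table when it fires
sysctl -w vm.oom_dump_tasks=1

# In the meantime, rank running processes by resident memory
ps -eo pid,rss,vsz,comm --sort=-rss | head -15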