Linux – Why is OOM killer hitting here

linuxoomxen

We have some VMs running kernel 2.26.26-2 under xen 4.0.3 with different CPU and RAM amounts and on some seeing almost random OOM events each rendering the VM unusable. Output is as follows, I have the following questions:

  1. What has _cpu_down to do here? I see this in common on all affected machines but none of them should call this in my opinion (belongs to the cpu-hotplug system I think)
  2. Does anyone see something spooky in the memory printout? The available values for HighMem and Normal are almost the same on every machine with this problem.
  3. Am I right if I read the line HighMem free:496kB min:512kB low:9336kB as the source of the OOM since its free value dropped below min?
  4. Any suggestions on how to track down the problem? I monitor the memory usage on all machines but none shows a drop in free memory around the time of the OOM (monitor intervall is 5 min so I might be missing the exciting parts).
Jul  9 14:01:36 www23 kernel: [990735.330526] php-cgi invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Jul  9 14:01:42 www23 kernel: [990735.330544] Pid: 7749, comm: php-cgi Not tainted 2.6.26-2-xen-686 #1
Jul  9 14:04:20 www23 kernel: [990735.330555]  [<c0150df2>] oom_kill_process+0x4f/0x1bb
Jul  9 14:04:20 www23 kernel: [990735.330571]  [<c0125134>] __capable+0x8/0x1b
Jul  9 14:04:58 www23 kernel: [990735.330582]  [<c0151293>] out_of_memory+0x14e/0x17f
Jul  9 14:04:58 www23 kernel: [990735.330591]  [<c01532e1>] __alloc_pages_internal+0x2b8/0x34e
Jul  9 14:04:58 www23 kernel: [990735.330603]  [<c0153383>] __alloc_pages+0x7/0x9
Jul  9 14:04:58 www23 kernel: [990735.330617]  [<c0154e7c>] __do_page_cache_readahead+0x86/0x163
Jul  9 14:04:58 www23 kernel: [990735.330631]  [<c01552ae>] do_page_cache_readahead+0x3d/0x47
Jul  9 14:04:58 www23 kernel: [990735.330646]  [<c0150692>] filemap_fault+0x164/0x35a
Jul  9 14:04:58 www23 kernel: [990735.330658]  [<c015a5a5>] __do_fault+0x43/0x4c9
Jul  9 14:04:58 www23 kernel: [990735.330677]  [<c015ec3b>] handle_mm_fault+0x63c/0xda5
Jul  9 14:04:58 www23 kernel: [990735.330688]  [<c01e0aaf>] rb_erase+0x176/0x22f
Jul  9 14:04:58 www23 kernel: [990735.330702]  [<c010eaf3>] do_page_fault+0x605/0xb2e
Jul  9 14:04:58 www23 kernel: [990735.330712]  [<c023de5f>] evtchn_do_upcall+0xfa/0x191
Jul  9 14:04:58 www23 kernel: [990735.330725]  [<c010e4ee>] do_page_fault+0x0/0xb2e
Jul  9 14:04:58 www23 kernel: [990735.330736]  [<c02cc9c5>] error_code+0x35/0x3c
Jul  9 14:04:58 www23 kernel: [990735.330752]  [<c02c0000>] _cpu_down+0x50/0x20a
Jul  9 14:04:58 www23 kernel: [990735.330762]  =======================
Jul  9 14:04:58 www23 kernel: [990735.330771] Mem-info:
Jul  9 14:04:58 www23 kernel: [990735.330776] DMA per-cpu:
Jul  9 14:04:58 www23 kernel: [990735.330781] CPU    0: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330788] CPU    1: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330798] CPU    2: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330805] CPU    3: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330811] CPU    4: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330822] CPU    5: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330827] CPU    6: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330833] CPU    7: hi:    0, btch:   1 usd:   0
Jul  9 14:04:58 www23 kernel: [990735.330840] Normal per-cpu:
Jul  9 14:04:58 www23 kernel: [990735.330849] CPU    0: hi:  186, btch:  31 usd: 168
Jul  9 14:04:58 www23 kernel: [990735.330857] CPU    1: hi:  186, btch:  31 usd: 157
Jul  9 14:04:58 www23 kernel: [990735.330865] CPU    2: hi:  186, btch:  31 usd: 170
Jul  9 14:04:58 www23 kernel: [990735.330876] CPU    3: hi:  186, btch:  31 usd: 180
Jul  9 14:04:58 www23 kernel: [990735.330882] CPU    4: hi:  186, btch:  31 usd: 181
Jul  9 14:04:58 www23 kernel: [990735.330887] CPU    5: hi:  186, btch:  31 usd: 177
Jul  9 14:04:58 www23 kernel: [990735.330897] CPU    6: hi:  186, btch:  31 usd: 182
Jul  9 14:04:58 www23 kernel: [990735.330905] CPU    7: hi:  186, btch:  31 usd: 163
Jul  9 14:04:58 www23 kernel: [990735.330912] HighMem per-cpu:
Jul  9 14:04:58 www23 kernel: [990735.330917] CPU    0: hi:  186, btch:  31 usd:  18
Jul  9 14:04:58 www23 kernel: [990735.330925] CPU    1: hi:  186, btch:  31 usd: 159
Jul  9 14:04:58 www23 kernel: [990735.330930] CPU    2: hi:  186, btch:  31 usd: 110
Jul  9 14:04:58 www23 kernel: [990735.330938] CPU    3: hi:  186, btch:  31 usd:  92
Jul  9 14:04:58 www23 kernel: [990735.330945] CPU    4: hi:  186, btch:  31 usd:  66
Jul  9 14:04:58 www23 kernel: [990735.330951] CPU    5: hi:  186, btch:  31 usd: 137
Jul  9 14:04:58 www23 kernel: [990735.330959] CPU    6: hi:  186, btch:  31 usd: 165
Jul  9 14:04:58 www23 kernel: [990735.330970] CPU    7: hi:  186, btch:  31 usd: 134
Jul  9 14:04:58 www23 kernel: [990735.330979] Active:1895504 inactive:10287 dirty:0 writeback:0 unstable:0
Jul  9 14:04:58 www23 kernel: [990735.330981]  free:61788 slab:67983 mapped:13 pagetables:0 bounce:0
Jul  9 14:04:58 www23 kernel: [990735.330998] DMA free:6152kB min:72kB low:88kB high:108kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
Jul  9 14:04:58 www23 kernel: [990735.331008] lowmem_reserve[]: 0 706 8120 8120
Jul  9 14:04:58 www23 kernel: [990735.331028] Normal free:240504kB min:3364kB low:4204kB high:5044kB active:28964kB inactive:476kB present:723392kB pages_scanned:57291 all_unreclaimable? yes
Jul  9 14:04:58 www23 kernel: [990735.331044] lowmem_reserve[]: 0 0 59309 59309
Jul  9 14:04:58 www23 kernel: [990735.331060] HighMem free:496kB min:512kB low:9336kB high:18164kB active:7553052kB inactive:40672kB present:7591552kB pages_scanned:13757719 all_unreclaimable? yes
Jul  9 14:04:58 www23 kernel: [990735.331076] lowmem_reserve[]: 0 0 0 0
Jul  9 14:04:58 www23 kernel: [990735.331094] DMA: 126*4kB 44*8kB 47*16kB 42*32kB 8*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 6152kB
Jul  9 14:04:58 www23 kernel: [990735.331133] Normal: 41282*4kB 8957*8kB 62*16kB 1*32kB 2*64kB 2*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 240752kB
Jul  9 14:04:58 www23 kernel: [990735.331169] HighMem: 90*4kB 3*8kB 7*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
Jul  9 14:04:58 www23 kernel: [990735.331204] 18015 total pagecache pages
Jul  9 14:04:58 www23 kernel: [990735.331211] Swap cache: add 0, delete 0, find 0/0
Jul  9 14:04:58 www23 kernel: [990735.331221] Free swap  = 0kB
Jul  9 14:04:58 www23 kernel: [990735.331225] Total swap = 0kB
Jul  9 14:04:58 www23 kernel: [990735.358501] 2099200 pages of RAM
Jul  9 14:04:58 www23 kernel: [990735.358514] 1912832 pages of HIGHMEM
Jul  9 14:04:58 www23 kernel: [990735.358519] 19870 reserved pages
Jul  9 14:04:58 www23 kernel: [990735.358523] 6341777 pages shared
Jul  9 14:04:58 www23 kernel: [990735.358527] 0 pages swap cached
Jul  9 14:04:58 www23 kernel: [990735.358531] 0 pages dirty
Jul  9 14:04:58 www23 kernel: [990735.358535] 0 pages writeback
Jul  9 14:04:58 www23 kernel: [990735.358539] 13 pages mapped
Jul  9 14:04:58 www23 kernel: [990735.358542] 67983 pages slab
Jul  9 14:04:58 www23 kernel: [990735.358553] 0 pages pagetables
Jul  9 14:04:58 www23 kernel: [990735.358558] Out of memory: kill process 1493 (httpd) score 42880 or a child
Jul  9 14:04:58 www23 kernel: [990735.358629] Killed process 8099 (httpd)

As you can see the machine is a webserver, under normal conditions it's meminfo shows something like this:

MemTotal:      8388804 kB
MemFree:        666696 kB
Buffers:         49352 kB
Cached:        5589884 kB
SwapCached:      14368 kB
Active:        2106596 kB
Inactive:      5127116 kB
HighTotal:     7646976 kB
HighFree:       455376 kB
LowTotal:       741828 kB
LowFree:        211320 kB
SwapTotal:     1048568 kB
SwapFree:      1024244 kB
Dirty:             496 kB
Writeback:           0 kB
AnonPages:     1580456 kB
Mapped:          13340 kB
Slab:           367520 kB
SReclaimable:   343680 kB
SUnreclaim:      23840 kB
PageTables:          0 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   9437372 kB
Committed_AS:  9499972 kB
VmallocTotal:   114680 kB
VmallocUsed:      5180 kB
VmallocChunk:   106596 kB

System is Debian, Kernel is 2.6.26-2-xen-686. Any ideas are welcome.

Best Answer

I recently ran into this problem when trying to generate multiple PDFs with our web tool. For us the answer was to give PHP more memory to use by changing this setting in the /etc/php.ini file:

memory_limit = 128M

You may need to step back and think about what your PHP requests/jobs are doing to evaluate how high the memory limit should be set to.

For tracking it down, I would look at the access/error logs for apache and see what requests were being made before Jul 9 14:01:36. Any errors, warnings, or long server requests might indicate which part of your web app may be using up PHPs allowed memory limit.