We have some VMs running kernel 2.26.26-2 under xen 4.0.3 with different CPU and RAM amounts and on some seeing almost random OOM events each rendering the VM unusable. Output is as follows, I have the following questions:
- What has
_cpu_down
to do here? I see this in common on all affected machines but none of them should call this in my opinion (belongs to the cpu-hotplug system I think) - Does anyone see something spooky in the memory printout? The available values for HighMem and Normal are almost the same on every machine with this problem.
- Am I right if I read the line
HighMem free:496kB min:512kB low:9336kB
as the source of the OOM since itsfree
value dropped belowmin
? - Any suggestions on how to track down the problem? I monitor the memory usage on all machines but none shows a drop in free memory around the time of the OOM (monitor intervall is 5 min so I might be missing the exciting parts).
Jul 9 14:01:36 www23 kernel: [990735.330526] php-cgi invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0 Jul 9 14:01:42 www23 kernel: [990735.330544] Pid: 7749, comm: php-cgi Not tainted 2.6.26-2-xen-686 #1 Jul 9 14:04:20 www23 kernel: [990735.330555] [<c0150df2>] oom_kill_process+0x4f/0x1bb Jul 9 14:04:20 www23 kernel: [990735.330571] [<c0125134>] __capable+0x8/0x1b Jul 9 14:04:58 www23 kernel: [990735.330582] [<c0151293>] out_of_memory+0x14e/0x17f Jul 9 14:04:58 www23 kernel: [990735.330591] [<c01532e1>] __alloc_pages_internal+0x2b8/0x34e Jul 9 14:04:58 www23 kernel: [990735.330603] [<c0153383>] __alloc_pages+0x7/0x9 Jul 9 14:04:58 www23 kernel: [990735.330617] [<c0154e7c>] __do_page_cache_readahead+0x86/0x163 Jul 9 14:04:58 www23 kernel: [990735.330631] [<c01552ae>] do_page_cache_readahead+0x3d/0x47 Jul 9 14:04:58 www23 kernel: [990735.330646] [<c0150692>] filemap_fault+0x164/0x35a Jul 9 14:04:58 www23 kernel: [990735.330658] [<c015a5a5>] __do_fault+0x43/0x4c9 Jul 9 14:04:58 www23 kernel: [990735.330677] [<c015ec3b>] handle_mm_fault+0x63c/0xda5 Jul 9 14:04:58 www23 kernel: [990735.330688] [<c01e0aaf>] rb_erase+0x176/0x22f Jul 9 14:04:58 www23 kernel: [990735.330702] [<c010eaf3>] do_page_fault+0x605/0xb2e Jul 9 14:04:58 www23 kernel: [990735.330712] [<c023de5f>] evtchn_do_upcall+0xfa/0x191 Jul 9 14:04:58 www23 kernel: [990735.330725] [<c010e4ee>] do_page_fault+0x0/0xb2e Jul 9 14:04:58 www23 kernel: [990735.330736] [<c02cc9c5>] error_code+0x35/0x3c Jul 9 14:04:58 www23 kernel: [990735.330752] [<c02c0000>] _cpu_down+0x50/0x20a Jul 9 14:04:58 www23 kernel: [990735.330762] ======================= Jul 9 14:04:58 www23 kernel: [990735.330771] Mem-info: Jul 9 14:04:58 www23 kernel: [990735.330776] DMA per-cpu: Jul 9 14:04:58 www23 kernel: [990735.330781] CPU 0: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330788] CPU 1: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330798] CPU 2: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330805] CPU 3: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330811] CPU 4: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330822] CPU 5: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330827] CPU 6: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330833] CPU 7: hi: 0, btch: 1 usd: 0 Jul 9 14:04:58 www23 kernel: [990735.330840] Normal per-cpu: Jul 9 14:04:58 www23 kernel: [990735.330849] CPU 0: hi: 186, btch: 31 usd: 168 Jul 9 14:04:58 www23 kernel: [990735.330857] CPU 1: hi: 186, btch: 31 usd: 157 Jul 9 14:04:58 www23 kernel: [990735.330865] CPU 2: hi: 186, btch: 31 usd: 170 Jul 9 14:04:58 www23 kernel: [990735.330876] CPU 3: hi: 186, btch: 31 usd: 180 Jul 9 14:04:58 www23 kernel: [990735.330882] CPU 4: hi: 186, btch: 31 usd: 181 Jul 9 14:04:58 www23 kernel: [990735.330887] CPU 5: hi: 186, btch: 31 usd: 177 Jul 9 14:04:58 www23 kernel: [990735.330897] CPU 6: hi: 186, btch: 31 usd: 182 Jul 9 14:04:58 www23 kernel: [990735.330905] CPU 7: hi: 186, btch: 31 usd: 163 Jul 9 14:04:58 www23 kernel: [990735.330912] HighMem per-cpu: Jul 9 14:04:58 www23 kernel: [990735.330917] CPU 0: hi: 186, btch: 31 usd: 18 Jul 9 14:04:58 www23 kernel: [990735.330925] CPU 1: hi: 186, btch: 31 usd: 159 Jul 9 14:04:58 www23 kernel: [990735.330930] CPU 2: hi: 186, btch: 31 usd: 110 Jul 9 14:04:58 www23 kernel: [990735.330938] CPU 3: hi: 186, btch: 31 usd: 92 Jul 9 14:04:58 www23 kernel: [990735.330945] CPU 4: hi: 186, btch: 31 usd: 66 Jul 9 14:04:58 www23 kernel: [990735.330951] CPU 5: hi: 186, btch: 31 usd: 137 Jul 9 14:04:58 www23 kernel: [990735.330959] CPU 6: hi: 186, btch: 31 usd: 165 Jul 9 14:04:58 www23 kernel: [990735.330970] CPU 7: hi: 186, btch: 31 usd: 134 Jul 9 14:04:58 www23 kernel: [990735.330979] Active:1895504 inactive:10287 dirty:0 writeback:0 unstable:0 Jul 9 14:04:58 www23 kernel: [990735.330981] free:61788 slab:67983 mapped:13 pagetables:0 bounce:0 Jul 9 14:04:58 www23 kernel: [990735.330998] DMA free:6152kB min:72kB low:88kB high:108kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes Jul 9 14:04:58 www23 kernel: [990735.331008] lowmem_reserve[]: 0 706 8120 8120 Jul 9 14:04:58 www23 kernel: [990735.331028] Normal free:240504kB min:3364kB low:4204kB high:5044kB active:28964kB inactive:476kB present:723392kB pages_scanned:57291 all_unreclaimable? yes Jul 9 14:04:58 www23 kernel: [990735.331044] lowmem_reserve[]: 0 0 59309 59309 Jul 9 14:04:58 www23 kernel: [990735.331060] HighMem free:496kB min:512kB low:9336kB high:18164kB active:7553052kB inactive:40672kB present:7591552kB pages_scanned:13757719 all_unreclaimable? yes Jul 9 14:04:58 www23 kernel: [990735.331076] lowmem_reserve[]: 0 0 0 0 Jul 9 14:04:58 www23 kernel: [990735.331094] DMA: 126*4kB 44*8kB 47*16kB 42*32kB 8*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 6152kB Jul 9 14:04:58 www23 kernel: [990735.331133] Normal: 41282*4kB 8957*8kB 62*16kB 1*32kB 2*64kB 2*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 240752kB Jul 9 14:04:58 www23 kernel: [990735.331169] HighMem: 90*4kB 3*8kB 7*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB Jul 9 14:04:58 www23 kernel: [990735.331204] 18015 total pagecache pages Jul 9 14:04:58 www23 kernel: [990735.331211] Swap cache: add 0, delete 0, find 0/0 Jul 9 14:04:58 www23 kernel: [990735.331221] Free swap = 0kB Jul 9 14:04:58 www23 kernel: [990735.331225] Total swap = 0kB Jul 9 14:04:58 www23 kernel: [990735.358501] 2099200 pages of RAM Jul 9 14:04:58 www23 kernel: [990735.358514] 1912832 pages of HIGHMEM Jul 9 14:04:58 www23 kernel: [990735.358519] 19870 reserved pages Jul 9 14:04:58 www23 kernel: [990735.358523] 6341777 pages shared Jul 9 14:04:58 www23 kernel: [990735.358527] 0 pages swap cached Jul 9 14:04:58 www23 kernel: [990735.358531] 0 pages dirty Jul 9 14:04:58 www23 kernel: [990735.358535] 0 pages writeback Jul 9 14:04:58 www23 kernel: [990735.358539] 13 pages mapped Jul 9 14:04:58 www23 kernel: [990735.358542] 67983 pages slab Jul 9 14:04:58 www23 kernel: [990735.358553] 0 pages pagetables Jul 9 14:04:58 www23 kernel: [990735.358558] Out of memory: kill process 1493 (httpd) score 42880 or a child Jul 9 14:04:58 www23 kernel: [990735.358629] Killed process 8099 (httpd)
As you can see the machine is a webserver, under normal conditions it's meminfo shows something like this:
MemTotal: 8388804 kB
MemFree: 666696 kB
Buffers: 49352 kB
Cached: 5589884 kB
SwapCached: 14368 kB
Active: 2106596 kB
Inactive: 5127116 kB
HighTotal: 7646976 kB
HighFree: 455376 kB
LowTotal: 741828 kB
LowFree: 211320 kB
SwapTotal: 1048568 kB
SwapFree: 1024244 kB
Dirty: 496 kB
Writeback: 0 kB
AnonPages: 1580456 kB
Mapped: 13340 kB
Slab: 367520 kB
SReclaimable: 343680 kB
SUnreclaim: 23840 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9437372 kB
Committed_AS: 9499972 kB
VmallocTotal: 114680 kB
VmallocUsed: 5180 kB
VmallocChunk: 106596 kB
System is Debian, Kernel is 2.6.26-2-xen-686. Any ideas are welcome.
Best Answer
I recently ran into this problem when trying to generate multiple PDFs with our web tool. For us the answer was to give PHP more memory to use by changing this setting in the /etc/php.ini file:
memory_limit = 128M
You may need to step back and think about what your PHP requests/jobs are doing to evaluate how high the memory limit should be set to.
For tracking it down, I would look at the access/error logs for apache and see what requests were being made before Jul 9 14:01:36. Any errors, warnings, or long server requests might indicate which part of your web app may be using up PHPs allowed memory limit.