I have a new server that has been frequently raising Out of memory
errors.
This has resulted in a several processes being killed by oom-killer
, and I can't work out why.
It's on AWS (EC2), with 4GB RAM (t2.medium), running a simple Apache web server with MySQL and PHP, on Ubuntu 16.04.2 LTS.
I've added a 4GB SWAP partition, using an EBS Volume (GP2).
The amount of available memory seems fine:
free -h
total used free shared buff/cache available
Mem: 3.9G 201M 49M 13M 3.6G 3.6G
Swap: 4.0G 79M 3.9G
This does not seem to have changed much throughout the day (highest used
amount from randomly checking was 250M).
So far I have not changed the vm.overcommit_
settings, as I'd like to know what the problem is before doing this:
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
The output from dmesg -T
is:
[Mon Feb 13 13:09:00 2017] sessionclean invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[Mon Feb 13 13:09:00 2017] sessionclean cpuset=/ mems_allowed=0
[Mon Feb 13 13:09:00 2017] CPU: 1 PID: 22353 Comm: sessionclean Not tainted 4.4.0-59-generic #80-Ubuntu
[Mon Feb 13 13:09:00 2017] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/09/2016
[Mon Feb 13 13:09:00 2017] 0000000000000286 0000000031d57236 ffff88005ce33af0 ffffffff813f7583
[Mon Feb 13 13:09:00 2017] ffff88005ce33cc8 ffff880107ac0000 ffff88005ce33b60 ffffffff8120ad5e
[Mon Feb 13 13:09:00 2017] ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[Mon Feb 13 13:09:00 2017] Call Trace:
[Mon Feb 13 13:09:00 2017] [<ffffffff813f7583>] dump_stack+0x63/0x90
[Mon Feb 13 13:09:00 2017] [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[Mon Feb 13 13:09:00 2017] [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[Mon Feb 13 13:09:00 2017] [<ffffffff811920ee>] ? oom_unkillable_task+0x9e/0xd0
[Mon Feb 13 13:09:00 2017] [<ffffffff81192b49>] out_of_memory+0x219/0x460
[Mon Feb 13 13:09:00 2017] [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[Mon Feb 13 13:09:00 2017] [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[Mon Feb 13 13:09:00 2017] [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[Mon Feb 13 13:09:00 2017] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[Mon Feb 13 13:09:00 2017] [<ffffffff811c1670>] ? handle_mm_fault+0xce0/0x1820
[Mon Feb 13 13:09:00 2017] [<ffffffff810805a0>] _do_fork+0x80/0x360
[Mon Feb 13 13:09:00 2017] [<ffffffff81080929>] SyS_clone+0x19/0x20
[Mon Feb 13 13:09:00 2017] [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[Mon Feb 13 13:09:00 2017] Mem-Info:
[Mon Feb 13 13:09:00 2017] active_anon:17107 inactive_anon:20026 isolated_anon:0
active_file:424384 inactive_file:433967 isolated_file:0
unevictable:914 dirty:36 writeback:0 unstable:0
slab_reclaimable:71336 slab_unreclaimable:6907
mapped:12390 shmem:3580 pagetables:4224 bounce:0
free:25546 free_pcp:0 free_cma:0
[Mon Feb 13 13:09:00 2017] Node 0 DMA free:15872kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:8kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:20kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 3745 3934 3934 3934
[Mon Feb 13 13:09:00 2017] Node 0 DMA32 free:66512kB min:7528kB low:9408kB high:11292kB active_anon:63604kB inactive_anon:72976kB active_file:1691880kB inactive_file:1723756kB unevictable:3244kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:3244kB dirty:124kB writeback:0kB mapped:48740kB shmem:13476kB slab_reclaimable:156484kB slab_unreclaimable:23920kB kernel_stack:2784kB pagetables:15124kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 189 189 189
[Mon Feb 13 13:09:00 2017] Node 0 Normal free:19800kB min:380kB low:472kB high:568kB active_anon:4824kB inactive_anon:7128kB active_file:5648kB inactive_file:12108kB unevictable:412kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:193908kB mlocked:412kB dirty:20kB writeback:0kB mapped:812kB shmem:844kB slab_reclaimable:128840kB slab_unreclaimable:3708kB kernel_stack:816kB pagetables:1772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 0 0 0
[Mon Feb 13 13:09:00 2017] Node 0 DMA: 4*4kB (MEH) 4*8kB (MEH) 1*16kB (H) 2*32kB (ME) 4*64kB (UME) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 3*1024kB (MEH) 1*2048kB (E) 2*4096kB (M) = 15872kB
[Mon Feb 13 13:09:00 2017] Node 0 DMA32: 16314*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66632kB
[Mon Feb 13 13:09:00 2017] Node 0 Normal: 4161*4kB (UE) 147*8kB (UME) 0*16kB 0*32kB 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 19804kB
[Mon Feb 13 13:09:00 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Feb 13 13:09:00 2017] 864746 total pagecache pages
[Mon Feb 13 13:09:00 2017] 2212 pages in swap cache
[Mon Feb 13 13:09:00 2017] Swap cache stats: add 62420, delete 60208, find 121128/130192
[Mon Feb 13 13:09:00 2017] Free swap = 4103680kB
[Mon Feb 13 13:09:00 2017] Total swap = 4194300kB
[Mon Feb 13 13:09:00 2017] 1081245 pages RAM
[Mon Feb 13 13:09:00 2017] 0 pages HighMem/MovableOnly
[Mon Feb 13 13:09:00 2017] 70004 pages reserved
[Mon Feb 13 13:09:00 2017] 0 pages cma reserved
[Mon Feb 13 13:09:00 2017] 0 pages hwpoisoned
[Mon Feb 13 13:09:00 2017] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Mon Feb 13 13:09:00 2017] [ 379] 0 379 10183 1130 22 3 771 0 systemd-journal
[Mon Feb 13 13:09:00 2017] [ 435] 0 435 25742 48 18 3 0 0 lvmetad
[Mon Feb 13 13:09:00 2017] [ 478] 0 478 10704 531 23 3 247 -1000 systemd-udevd
[Mon Feb 13 13:09:00 2017] [ 906] 0 906 4030 662 11 3 27 0 dhclient
[Mon Feb 13 13:09:00 2017] [ 1032] 0 1032 1306 30 8 3 0 0 iscsid
[Mon Feb 13 13:09:00 2017] [ 1033] 0 1033 1431 877 8 3 0 -17 iscsid
[Mon Feb 13 13:09:00 2017] [ 1035] 0 1035 1100 308 8 3 0 0 acpid
[Mon Feb 13 13:09:00 2017] [ 1037] 104 1037 64099 599 27 3 196 0 rsyslogd
[Mon Feb 13 13:09:00 2017] [ 1043] 0 1043 58710 315 17 4 41 0 lxcfs
[Mon Feb 13 13:09:00 2017] [ 1047] 107 1047 10724 334 24 3 25 -900 dbus-daemon
[Mon Feb 13 13:09:00 2017] [ 1055] 0 1055 6511 409 19 3 25 0 atd
[Mon Feb 13 13:09:00 2017] [ 1065] 0 1065 71269 64 41 3 2782 0 accounts-daemon
[Mon Feb 13 13:09:00 2017] [ 1069] 0 1069 7155 502 18 3 27 0 systemd-logind
[Mon Feb 13 13:09:00 2017] [ 1071] 0 1071 16380 545 36 3 153 -1000 sshd
[Mon Feb 13 13:09:00 2017] [ 1073] 0 1073 6932 502 19 3 38 0 cron
[Mon Feb 13 13:09:00 2017] [ 1101] 0 1101 3344 51 11 3 1 0 mdadm
[Mon Feb 13 13:09:00 2017] [ 1114] 0 1114 69272 148 39 3 45 0 polkitd
[Mon Feb 13 13:09:00 2017] [ 1202] 0 1202 4868 428 14 3 37 0 irqbalance
[Mon Feb 13 13:09:00 2017] [ 1221] 113 1221 27509 437 24 3 97 0 ntpd
[Mon Feb 13 13:09:00 2017] [ 1236] 0 1236 3619 402 11 3 0 0 agetty
[Mon Feb 13 13:09:00 2017] [ 1238] 0 1238 3665 361 12 3 0 0 agetty
[Mon Feb 13 13:09:00 2017] [ 1389] 0 1389 100324 4842 154 3 1167 0 apache2
[Mon Feb 13 13:09:00 2017] [ 7047] 114 7047 88165 925 64 3 437 0 opendkim
[Mon Feb 13 13:09:00 2017] [30354] 0 30354 16352 526 23 3 82 0 master
[Mon Feb 13 13:09:00 2017] [30356] 112 30356 16999 306 25 3 169 0 qmgr
[Mon Feb 13 13:09:00 2017] [30361] 112 30361 20306 889 30 3 174 0 tlsmgr
[Mon Feb 13 13:09:00 2017] [27212] 0 27212 6011 664 17 3 23 0 vsftpd
[Mon Feb 13 13:09:00 2017] [30271] 0 30271 4981 711 14 3 73 0 mysqld_safe
[Mon Feb 13 13:09:00 2017] [20955] 0 20955 23842 1213 51 3 229 0 sshd
[Mon Feb 13 13:09:00 2017] [20957] 1001 20957 11312 608 26 3 218 0 systemd
[Mon Feb 13 13:09:00 2017] [20959] 1001 20959 52158 95 36 4 391 0 (sd-pam)
[Mon Feb 13 13:09:00 2017] [21053] 1001 21053 23908 629 49 3 309 0 sshd
[Mon Feb 13 13:09:00 2017] [21054] 1001 21054 5382 862 15 3 434 0 bash
[Mon Feb 13 13:09:00 2017] [21565] 0 21565 23842 1305 51 3 136 0 sshd
[Mon Feb 13 13:09:00 2017] [21595] 1001 21595 23842 728 48 3 143 0 sshd
[Mon Feb 13 13:09:00 2017] [21596] 1001 21596 5378 820 15 3 452 0 bash
[Mon Feb 13 13:09:00 2017] [21844] 0 21844 23842 1424 50 3 20 0 sshd
[Mon Feb 13 13:09:00 2017] [21874] 1001 21874 23924 840 50 3 137 0 sshd
[Mon Feb 13 13:09:00 2017] [21875] 1001 21875 5378 724 16 3 524 0 bash
[Mon Feb 13 13:09:00 2017] [21891] 0 21891 13972 727 32 3 149 0 sudo
[Mon Feb 13 13:09:00 2017] [21892] 0 21892 12751 612 30 3 105 0 su
[Mon Feb 13 13:09:00 2017] [21893] 0 21893 5030 861 15 3 74 0 bash
[Mon Feb 13 13:09:00 2017] [22092] 116 22092 154426 22725 107 4 608 0 mysqld
[Mon Feb 13 13:09:00 2017] [22093] 0 22093 6203 277 18 3 10 0 logger
[Mon Feb 13 13:09:00 2017] [22166] 112 22166 16869 1066 25 3 0 0 pickup
[Mon Feb 13 13:09:00 2017] [22256] 33 22256 100903 4062 155 3 1028 0 apache2
[Mon Feb 13 13:09:00 2017] [22272] 33 22272 101113 6090 157 3 919 0 apache2
[Mon Feb 13 13:09:00 2017] [22273] 33 22273 101016 4352 156 3 1034 0 apache2
[Mon Feb 13 13:09:00 2017] [22285] 33 22285 100915 4301 155 3 1030 0 apache2
[Mon Feb 13 13:09:00 2017] [22290] 33 22290 100994 3732 153 3 1036 0 apache2
[Mon Feb 13 13:09:00 2017] [22291] 33 22291 101505 5021 157 3 1031 0 apache2
[Mon Feb 13 13:09:00 2017] [22292] 33 22292 100991 3457 153 3 1048 0 apache2
[Mon Feb 13 13:09:00 2017] [22313] 33 22313 100886 3437 153 3 1049 0 apache2
[Mon Feb 13 13:09:00 2017] [22314] 33 22314 100374 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22315] 33 22315 100890 3193 151 3 1054 0 apache2
[Mon Feb 13 13:09:00 2017] [22316] 33 22316 101015 5788 160 3 986 0 apache2
[Mon Feb 13 13:09:00 2017] [22317] 33 22317 100380 1658 144 3 1127 0 apache2
[Mon Feb 13 13:09:00 2017] [22318] 33 22318 100901 3771 155 3 1046 0 apache2
[Mon Feb 13 13:09:00 2017] [22319] 33 22319 100374 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22321] 33 22321 100900 3686 155 3 1045 0 apache2
[Mon Feb 13 13:09:00 2017] [22326] 33 22326 101024 4417 155 3 1039 0 apache2
[Mon Feb 13 13:09:00 2017] [22327] 33 22327 100378 1309 144 3 1139 0 apache2
[Mon Feb 13 13:09:00 2017] [22328] 33 22328 101533 6627 161 3 853 0 apache2
[Mon Feb 13 13:09:00 2017] [22329] 0 22329 12235 727 29 3 11 0 cron
[Mon Feb 13 13:09:00 2017] [22330] 0 22330 1127 195 8 3 0 0 sh
[Mon Feb 13 13:09:00 2017] [22331] 0 22331 1127 202 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22332] 0 22332 1127 26 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22334] 0 22334 4144 185 9 3 0 0 sort
[Mon Feb 13 13:09:00 2017] [22335] 0 22335 4144 186 10 3 0 0 sort
[Mon Feb 13 13:09:00 2017] [22336] 0 22336 1127 26 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22342] 0 22342 1127 301 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] [22353] 0 22353 1127 29 8 3 0 0 sessionclean
[Mon Feb 13 13:09:00 2017] Out of memory: Kill process 22092 (mysqld) score 11 or sacrifice child
[Mon Feb 13 13:09:00 2017] Killed process 22092 (mysqld) total-vm:617704kB, anon-rss:76544kB, file-rss:14356kB
And the content of /proc/meminfo
:
MemTotal: 4044964 kB
MemFree: 69100 kB
MemAvailable: 3665524 kB
Buffers: 1509532 kB
Cached: 1959440 kB
SwapCached: 6288 kB
Active: 2181068 kB
Inactive: 1524292 kB
Active(anon): 133556 kB
Inactive(anon): 119404 kB
Active(file): 2047512 kB
Inactive(file): 1404888 kB
Unevictable: 3656 kB
Mlocked: 3656 kB
SwapTotal: 4194300 kB
SwapFree: 4112600 kB
Dirty: 184 kB
Writeback: 0 kB
AnonPages: 235104 kB
Mapped: 59700 kB
Shmem: 14192 kB
Slab: 220372 kB
SReclaimable: 192240 kB
SUnreclaim: 28132 kB
KernelStack: 3936 kB
PageTables: 17116 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6216780 kB
Committed_AS: 1189376 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 14336 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 137216 kB
DirectMap2M: 4188160 kB
I'm not too familiar with oom-killer
, so I'm wondering if this is due to total_vm
, where that column adds up to 2937127 – which, if multiplied by 4 kB pages, gives 11.2 GB.
Where I have 4 GB of physical RAM, and 4GB of SWAP space (total of 8), which if multiplied by 1.5 (due to vm.overcommit_ratio 50), would give 12GB.
For comparison, the old server never had any memory problems, and it was running a few more services as well.
It also has 4GB RAM and 4GB of SWAP, and exactly the same vm.overcommit_
settings… the only real difference is that it's a physical server, and running CentOS (6.8)
Am I missing something? because this server looks like it should have plenty of RAM to spare, and hasn't touched the SWAP space.
Best Answer
Just to provide the answer, in this case it was due to Ubuntu bug #1655842.