Ubuntu – Out of memory debugging Ubuntu / AWS

amazon-web-servicesmemorymemory usageUbuntuubuntu-16.04

I have a new server that has been frequently raising Out of memory errors.

This has resulted in a several processes being killed by oom-killer, and I can't work out why.


It's on AWS (EC2), with 4GB RAM (t2.medium), running a simple Apache web server with MySQL and PHP, on Ubuntu 16.04.2 LTS.

I've added a 4GB SWAP partition, using an EBS Volume (GP2).

The amount of available memory seems fine:

free -h
              total        used        free      shared  buff/cache   available
Mem:           3.9G        201M         49M         13M        3.6G        3.6G
Swap:          4.0G         79M        3.9G

This does not seem to have changed much throughout the day (highest used amount from randomly checking was 250M).

So far I have not changed the vm.overcommit_ settings, as I'd like to know what the problem is before doing this:

vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

The output from dmesg -T is:

[Mon Feb 13 13:09:00 2017] sessionclean invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[Mon Feb 13 13:09:00 2017] sessionclean cpuset=/ mems_allowed=0
[Mon Feb 13 13:09:00 2017] CPU: 1 PID: 22353 Comm: sessionclean Not tainted 4.4.0-59-generic #80-Ubuntu
[Mon Feb 13 13:09:00 2017] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/09/2016
[Mon Feb 13 13:09:00 2017]  0000000000000286 0000000031d57236 ffff88005ce33af0 ffffffff813f7583
[Mon Feb 13 13:09:00 2017]  ffff88005ce33cc8 ffff880107ac0000 ffff88005ce33b60 ffffffff8120ad5e
[Mon Feb 13 13:09:00 2017]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[Mon Feb 13 13:09:00 2017] Call Trace:
[Mon Feb 13 13:09:00 2017]  [<ffffffff813f7583>] dump_stack+0x63/0x90
[Mon Feb 13 13:09:00 2017]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[Mon Feb 13 13:09:00 2017]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[Mon Feb 13 13:09:00 2017]  [<ffffffff811920ee>] ? oom_unkillable_task+0x9e/0xd0
[Mon Feb 13 13:09:00 2017]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[Mon Feb 13 13:09:00 2017]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[Mon Feb 13 13:09:00 2017]  [<ffffffff811c1670>] ? handle_mm_fault+0xce0/0x1820
[Mon Feb 13 13:09:00 2017]  [<ffffffff810805a0>] _do_fork+0x80/0x360
[Mon Feb 13 13:09:00 2017]  [<ffffffff81080929>] SyS_clone+0x19/0x20
[Mon Feb 13 13:09:00 2017]  [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[Mon Feb 13 13:09:00 2017] Mem-Info:
[Mon Feb 13 13:09:00 2017] active_anon:17107 inactive_anon:20026 isolated_anon:0
                            active_file:424384 inactive_file:433967 isolated_file:0
                            unevictable:914 dirty:36 writeback:0 unstable:0
                            slab_reclaimable:71336 slab_unreclaimable:6907
                            mapped:12390 shmem:3580 pagetables:4224 bounce:0
                            free:25546 free_pcp:0 free_cma:0
[Mon Feb 13 13:09:00 2017] Node 0 DMA free:15872kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:8kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:20kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 3745 3934 3934 3934
[Mon Feb 13 13:09:00 2017] Node 0 DMA32 free:66512kB min:7528kB low:9408kB high:11292kB active_anon:63604kB inactive_anon:72976kB active_file:1691880kB inactive_file:1723756kB unevictable:3244kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:3244kB dirty:124kB writeback:0kB mapped:48740kB shmem:13476kB slab_reclaimable:156484kB slab_unreclaimable:23920kB kernel_stack:2784kB pagetables:15124kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 189 189 189
[Mon Feb 13 13:09:00 2017] Node 0 Normal free:19800kB min:380kB low:472kB high:568kB active_anon:4824kB inactive_anon:7128kB active_file:5648kB inactive_file:12108kB unevictable:412kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:193908kB mlocked:412kB dirty:20kB writeback:0kB mapped:812kB shmem:844kB slab_reclaimable:128840kB slab_unreclaimable:3708kB kernel_stack:816kB pagetables:1772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 0 0 0
[Mon Feb 13 13:09:00 2017] Node 0 DMA: 4*4kB (MEH) 4*8kB (MEH) 1*16kB (H) 2*32kB (ME) 4*64kB (UME) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 3*1024kB (MEH) 1*2048kB (E) 2*4096kB (M) = 15872kB
[Mon Feb 13 13:09:00 2017] Node 0 DMA32: 16314*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66632kB
[Mon Feb 13 13:09:00 2017] Node 0 Normal: 4161*4kB (UE) 147*8kB (UME) 0*16kB 0*32kB 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 19804kB
[Mon Feb 13 13:09:00 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Feb 13 13:09:00 2017] 864746 total pagecache pages
[Mon Feb 13 13:09:00 2017] 2212 pages in swap cache
[Mon Feb 13 13:09:00 2017] Swap cache stats: add 62420, delete 60208, find 121128/130192
[Mon Feb 13 13:09:00 2017] Free swap  = 4103680kB
[Mon Feb 13 13:09:00 2017] Total swap = 4194300kB
[Mon Feb 13 13:09:00 2017] 1081245 pages RAM
[Mon Feb 13 13:09:00 2017] 0 pages HighMem/MovableOnly
[Mon Feb 13 13:09:00 2017] 70004 pages reserved
[Mon Feb 13 13:09:00 2017] 0 pages cma reserved
[Mon Feb 13 13:09:00 2017] 0 pages hwpoisoned
[Mon Feb 13 13:09:00 2017] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[Mon Feb 13 13:09:00 2017] [  379]     0   379    10183     1130      22       3      771             0 systemd-journal
[Mon Feb 13 13:09:00 2017] [  435]     0   435    25742       48      18       3        0             0 lvmetad
[Mon Feb 13 13:09:00 2017] [  478]     0   478    10704      531      23       3      247         -1000 systemd-udevd
[Mon Feb 13 13:09:00 2017] [  906]     0   906     4030      662      11       3       27             0 dhclient
[Mon Feb 13 13:09:00 2017] [ 1032]     0  1032     1306       30       8       3        0             0 iscsid
[Mon Feb 13 13:09:00 2017] [ 1033]     0  1033     1431      877       8       3        0           -17 iscsid
[Mon Feb 13 13:09:00 2017] [ 1035]     0  1035     1100      308       8       3        0             0 acpid
[Mon Feb 13 13:09:00 2017] [ 1037]   104  1037    64099      599      27       3      196             0 rsyslogd
[Mon Feb 13 13:09:00 2017] [ 1043]     0  1043    58710      315      17       4       41             0 lxcfs
[Mon Feb 13 13:09:00 2017] [ 1047]   107  1047    10724      334      24       3       25          -900 dbus-daemon
[Mon Feb 13 13:09:00 2017] [ 1055]     0  1055     6511      409      19       3       25             0 atd
[Mon Feb 13 13:09:00 2017] [ 1065]     0  1065    71269       64      41       3     2782             0 accounts-daemon
[Mon Feb 13 13:09:00 2017] [ 1069]     0  1069     7155      502      18       3       27             0 systemd-logind
[Mon Feb 13 13:09:00 2017] [ 1071]     0  1071    16380      545      36       3      153         -1000 sshd
[Mon Feb 13 13:09:00 2017] [ 1073]     0  1073     6932      502      19       3       38             0 cron
[Mon Feb 13 13:09:00 2017] [ 1101]     0  1101     3344       51      11       3        1             0 mdadm
[Mon Feb 13 13:09:00 2017] [ 1114]     0  1114    69272      148      39       3       45             0 polkitd
[Mon Feb 13 13:09:00 2017] [ 1202]     0  1202     4868      428      14       3       37             0 irqbalance
[Mon Feb 13 13:09:00 2017] [ 1221]   113  1221    27509      437      24       3       97             0 ntpd
[Mon Feb 13 13:09:00 2017] [ 1236]     0  1236     3619      402      11       3        0             0 agetty
[Mon Feb 13 13:09:00 2017] [ 1238]     0  1238     3665      361      12       3        0             0 agetty
[Mon Feb 13 13:09:00 2017] [ 1389]     0  1389   100324     4842     154       3     1167             0 apache2
[Mon Feb 13 13:09:00 2017] [ 7047]   114  7047    88165      925      64       3      437             0 opendkim
[Mon Feb 13 13:09:00 2017] [30354]     0 30354    16352      526      23       3       82             0 master
[Mon Feb 13 13:09:00 2017] [30356]   112 30356    16999      306      25       3      169             0 qmgr
[Mon Feb 13 13:09:00 2017] [30361]   112 30361    20306      889      30       3      174             0 tlsmgr
[Mon Feb 13 13:09:00 2017] [27212]     0 27212     6011      664      17       3       23             0 vsftpd
[Mon Feb 13 13:09:00 2017] [30271]     0 30271     4981      711      14       3       73             0 mysqld_safe
[Mon Feb 13 13:09:00 2017] [20955]     0 20955    23842     1213      51       3      229             0 sshd
[Mon Feb 13 13:09:00 2017] [20957]  1001 20957    11312      608      26       3      218             0 systemd
[Mon Feb 13 13:09:00 2017] [20959]  1001 20959    52158       95      36       4      391             0 (sd-pam)
[Mon Feb 13 13:09:00 2017] [21053]  1001 21053    23908      629      49       3      309             0 sshd
[Mon Feb 13 13:09:00 2017] [21054]  1001 21054     5382      862      15       3      434             0 bash
[Mon Feb 13 13:09:00 2017] [21565]     0 21565    23842     1305      51       3      136             0 sshd
[Mon Feb 13 13:09:00 2017] [21595]  1001 21595    23842      728      48       3      143             0 sshd
[Mon Feb 13 13:09:00 2017] [21596]  1001 21596     5378      820      15       3      452             0 bash
[Mon Feb 13 13:09:00 2017] [21844]     0 21844    23842     1424      50       3       20             0 sshd
[Mon Feb 13 13:09:00 2017] [21874]  1001 21874    23924      840      50       3      137             0 sshd
[Mon Feb 13 13:09:00 2017] [21875]  1001 21875     5378      724      16       3      524             0 bash
[Mon Feb 13 13:09:00 2017] [21891]     0 21891    13972      727      32       3      149             0 sudo
[Mon Feb 13 13:09:00 2017] [21892]     0 21892    12751      612      30       3      105             0 su
[Mon Feb 13 13:09:00 2017] [21893]     0 21893     5030      861      15       3       74             0 bash
[Mon Feb 13 13:09:00 2017] [22092]   116 22092   154426    22725     107       4      608             0 mysqld
[Mon Feb 13 13:09:00 2017] [22093]     0 22093     6203      277      18       3       10             0 logger
[Mon Feb 13 13:09:00 2017] [22166]   112 22166    16869     1066      25       3        0             0 pickup
[Mon Feb 13 13:09:00 2017] [22256]    33 22256   100903     4062     155       3     1028             0 apache2
[Mon Feb 13 13:09:00 2017] [22272]    33 22272   101113     6090     157       3      919             0 apache2
[Mon Feb 13 13:09:00 2017] [22273]    33 22273   101016     4352     156       3     1034             0 apache2
[Mon Feb 13 13:09:00 2017] [22285]    33 22285   100915     4301     155       3     1030             0 apache2
[Mon Feb 13 13:09:00 2017] [22290]    33 22290   100994     3732     153       3     1036             0 apache2
[Mon Feb 13 13:09:00 2017] [22291]    33 22291   101505     5021     157       3     1031             0 apache2
[Mon Feb 13 13:09:00 2017] [22292]    33 22292   100991     3457     153       3     1048             0 apache2
[Mon Feb 13 13:09:00 2017] [22313]    33 22313   100886     3437     153       3     1049             0 apache2
[Mon Feb 13 13:09:00 2017] [22314]    33 22314   100374     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22315]    33 22315   100890     3193     151       3     1054             0 apache2
[Mon Feb 13 13:09:00 2017] [22316]    33 22316   101015     5788     160       3      986             0 apache2
[Mon Feb 13 13:09:00 2017] [22317]    33 22317   100380     1658     144       3     1127             0 apache2
[Mon Feb 13 13:09:00 2017] [22318]    33 22318   100901     3771     155       3     1046             0 apache2
[Mon Feb 13 13:09:00 2017] [22319]    33 22319   100374     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22321]    33 22321   100900     3686     155       3     1045             0 apache2
[Mon Feb 13 13:09:00 2017] [22326]    33 22326   101024     4417     155       3     1039             0 apache2
[Mon Feb 13 13:09:00 2017] [22327]    33 22327   100378     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22328]    33 22328   101533     6627     161       3      853             0 apache2
[Mon Feb 13 13:09:00 2017] [22329]     0 22329    12235      727      29       3       11             0 cron
[Mon Feb 13 13:09:00 2017] [22330]     0 22330     1127      195       8       3        0             0 sh
[Mon Feb 13 13:09:00 2017] [22331]     0 22331     1127      202       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22332]     0 22332     1127       26       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22334]     0 22334     4144      185       9       3        0             0 sort
[Mon Feb 13 13:09:00 2017] [22335]     0 22335     4144      186      10       3        0             0 sort
[Mon Feb 13 13:09:00 2017] [22336]     0 22336     1127       26       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22342]     0 22342     1127      301       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22353]     0 22353     1127       29       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] Out of memory: Kill process 22092 (mysqld) score 11 or sacrifice child
[Mon Feb 13 13:09:00 2017] Killed process 22092 (mysqld) total-vm:617704kB, anon-rss:76544kB, file-rss:14356kB

And the content of /proc/meminfo:

MemTotal:        4044964 kB
MemFree:           69100 kB
MemAvailable:    3665524 kB
Buffers:         1509532 kB
Cached:          1959440 kB
SwapCached:         6288 kB
Active:          2181068 kB
Inactive:        1524292 kB
Active(anon):     133556 kB
Inactive(anon):   119404 kB
Active(file):    2047512 kB
Inactive(file):  1404888 kB
Unevictable:        3656 kB
Mlocked:            3656 kB
SwapTotal:       4194300 kB
SwapFree:        4112600 kB
Dirty:               184 kB
Writeback:             0 kB
AnonPages:        235104 kB
Mapped:            59700 kB
Shmem:             14192 kB
Slab:             220372 kB
SReclaimable:     192240 kB
SUnreclaim:        28132 kB
KernelStack:        3936 kB
PageTables:        17116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6216780 kB
Committed_AS:    1189376 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     14336 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      137216 kB
DirectMap2M:     4188160 kB

I'm not too familiar with oom-killer, so I'm wondering if this is due to total_vm, where that column adds up to 2937127 – which, if multiplied by 4 kB pages, gives 11.2 GB.

Where I have 4 GB of physical RAM, and 4GB of SWAP space (total of 8), which if multiplied by 1.5 (due to vm.overcommit_ratio 50), would give 12GB.


For comparison, the old server never had any memory problems, and it was running a few more services as well.

It also has 4GB RAM and 4GB of SWAP, and exactly the same vm.overcommit_ settings… the only real difference is that it's a physical server, and running CentOS (6.8)


Am I missing something? because this server looks like it should have plenty of RAM to spare, and hasn't touched the SWAP space.

Best Answer

Just to provide the answer, in this case it was due to Ubuntu bug #1655842.

Related Topic