I have two slightly different AWS EC2 instances of the same type with a large amount of memory (c4.8xlarge with 60 GB of RAM). One of them is a copy launched from a backup image (AMI), and the issue cannot be reproduced on it.
I stopped all of the services except system ones so most of the memory is free:
> free -m
              total        used        free      shared  buff/cache   available
Mem:          60382         201       59545           9         635       59695
Swap:             0           0           0
I cannot allocate even half of the available memory using the stress utility:
> sudo stress --vm 1 --vm-keep --vm-bytes 30G
stress: info: [40005] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [40006] (494) hogvm malloc failed: Cannot allocate memory
...
And here is the output of memtester:
> sudo memtester 60000
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 60000MB (62914560000 bytes)
got 29811MB (31259688960 bytes), trying mlock ...locked.
Loop 1:
Stuck Address : ok
...
There are no ulimit memory restrictions enabled. I have the same issue on the copies of that server. But everything is fine on the server restored from the older image:
> stress --vm 1 --vm-keep --vm-bytes 58G
stress: info: [14516] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> sudo memtester 59000
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 59000MB (61865984000 bytes)
got 59000MB (61865984000 bytes), trying mlock ...locked.
...
What can I do to figure out the issue?
Best Answer
It looks like somebody set the vm.overcommit_memory value to 2 in the new image. See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting for details.
To fix the issue, enable overcommit (by setting vm.overcommit_memory to 0), adjust vm.overcommit_ratio, or add 30 GB of swap.
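To see why the allocations stop at roughly half of the RAM, here is a minimal sketch of the limit the kernel applies in mode 2, which per the linked document is swap plus a configurable percentage of physical RAM. The helper name commit_limit_mb is made up for illustration, and the ratio of 50 is an assumption (it is the kernel default; the question does not show the actual value).

```shell
#!/bin/sh
# Sketch: estimate the commit limit enforced when vm.overcommit_memory=2.
#   CommitLimit = swap + RAM * vm.overcommit_ratio / 100
# commit_limit_mb is a hypothetical helper, not part of any existing tool.
commit_limit_mb() {
    ram_mb=$1      # physical RAM in MB
    swap_mb=$2     # swap in MB
    ratio=$3       # vm.overcommit_ratio in percent
    echo $(( swap_mb + ram_mb * ratio / 100 ))
}

# Values from the question: 60382 MB of RAM, no swap.
# The ratio of 50 is assumed (kernel default), not taken from the question.
commit_limit_mb 60382 0 50    # prints 30191

# To read the real values on the affected server:
#   sysctl vm.overcommit_memory vm.overcommit_ratio
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

Under that assumption the limit is about 30191 MB, which would explain both results: the 30G (30720 MB) that stress asked for is above it, and the 29811 MB that memtester got is just below it, with the difference presumably already committed by other processes. The same function also shows the effect of the swap option: commit_limit_mb 60382 30720 50 gives 60911.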
I don't really know a general way to track down such weird problems, but I'd probably do the following things:
Compare the vm.* sysctl parameters on both servers.
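A rough sketch of how the vm.* comparison could be done: dump the parameters on each server into a file, copy both files to one place, and diff them. The function names dump_vm_sysctl and compare_vm_dumps and the file names are placeholders I made up, not existing tools.

```shell
#!/bin/sh
# dump_vm_sysctl and compare_vm_dumps are hypothetical helper names.

# Print all vm.* parameters of the local machine, sorted, one per line.
dump_vm_sysctl() {
    sysctl -a 2>/dev/null | grep '^vm\.' | sort
}

# Show the lines that differ between two saved dumps.
# diff exits with 1 when the files differ; treat that as success here.
compare_vm_dumps() {
    diff "$1" "$2" || [ $? -eq 1 ]
}

# Usage (file names are placeholders):
#   dump_vm_sysctl > vm-broken.txt    # on the affected server
#   dump_vm_sysctl > vm-good.txt      # on the server from the older image
#   compare_vm_dumps vm-good.txt vm-broken.txt
```

Lines starting with < come from the first file and lines starting with > from the second, so a changed vm.overcommit_memory should stand out right away.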