CentOS, kernel: out of memory

Tags: centos6, kernel, memory, memory-usage

I hope you can help me with the following problem.

We are running a CrushFTP service on a CentOS release 6.6 (Final) system, but nearly every week the service crashes.

So I took a look at the logs and found these lines:

cat /var/log/messages

Jun 28 05:06:23 crushftp kernel: Out of memory: Kill process 1491 (java) score 883 or sacrifice child
Jun 28 05:06:23 crushftp kernel: Killed process 1491, UID 0, (java) total-vm:9620220kB, anon-rss:3245824kB, file-rss:128kB

CrushFTP is Java-based and the only service we are running on the machine. According to the log, the system is killing the process.

But I don't understand why. So I searched a bit and found this setting:

cat /proc/sys/vm/overcommit_memory
0

If I understand it correctly, this value should be fine, and if the process needs more RAM it should be able to get it.
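
For anyone who wants to check the same things, the overcommit knobs and the kernel's commit accounting can be read like this:

sysctl vm.overcommit_memory vm.overcommit_ratio
grep Commit /proc/meminfo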

When I run "top", the java process is the one with the highest RAM usage.

top - 11:13:58 up 1 day, 4 min,  1 user,  load average: 0.93, 0.94, 0.91
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.2%us, 19.7%sy,  0.0%ni, 68.6%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   3924136k total,  2736996k used,  1187140k free,   149380k buffers
Swap:  4128764k total,        0k used,  4128764k free,   814480k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1486 root      20   0 3633m 1.5g  13m S 20.3 39.8 191:24.36 java

The machine has about 4 GB of RAM and the swap space is the same size.

[root@atcrushftp ~]# cat /proc/meminfo
MemTotal:        3924136 kB
MemFree:         1159964 kB
Buffers:          149400 kB
Cached:           814476 kB
SwapCached:            0 kB
Active:          1956028 kB
Inactive:         619664 kB
Active(anon):    1611452 kB
Inactive(anon):      528 kB
Active(file):     344576 kB
Inactive(file):   619136 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4128764 kB
SwapFree:        4128764 kB
Dirty:                36 kB
Writeback:             4 kB
AnonPages:       1597696 kB
Mapped:            34108 kB
Shmem:               164 kB
Slab:             136024 kB
SReclaimable:      74432 kB
SUnreclaim:        61592 kB
KernelStack:        1384 kB
PageTables:         5948 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6090832 kB
Committed_AS:     746432 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      285216 kB
VmallocChunk:   34359441520 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1501184 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       18432 kB
DirectMap2M:     4175872 kB

I asked support, but they say it's not CrushFTP's fault and that the system is running out of memory.

Now my question is: how can I find out which process is taking up all the remaining free memory?

Best Answer

It's been a while since I had to read an OOM-killer log, but as I recall, this

Jun 28 05:06:23 crushftp kernel: Killed process 1491, UID 0, (java) total-vm:9620220kB, anon-rss:3245824kB, file-rss:128kB

means that java was using 9GB of VM when the OOM-killer shot it in the head. Given that you have 4GB of core, and 4GB of swap, that seems like a reasonable thing to do. You then write

When I understand it correct, the value must be ok and if the process needs more RAM it should be able to get it.

which I don't understand.

Firstly, setting that value to 0 doesn't turn off overcommitment. As Red Hat write, with this set to 0 the

kernel performs heuristic memory overcommit handling by estimating the amount of memory available and failing requests that are blatantly invalid. Unfortunately, since memory is allocated using a heuristic rather than a precise algorithm, this setting can sometimes allow available memory on the system to be overloaded.

Setting it to 2 does what you seem to want:

The kernel denies requests for memory equal to or larger than the sum of total available swap and the percentage of physical RAM specified in overcommit_ratio. This setting is best if you want a lesser risk of memory overcommitment.
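
If strict accounting is what you are after, a minimal sketch looks like this (the ratio of 50 is just the kernel default; pick whatever fits your box):

# switch to strict overcommit accounting (mode 2)
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=50

# make it survive a reboot (CentOS 6 reads /etc/sysctl.conf at boot)
cat >> /etc/sysctl.conf <<'EOF'
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
EOF

Bear in mind that in mode 2 an allocation over the commit limit simply fails, so the JVM gets an allocation error instead of the whole process being shot; that may or may not be what you want.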

But even turning off overcommit doesn't guarantee that a process can always get more RAM: only infinite VM guarantees that. As long as core+swap is finite, it can be used up - and if you have a process that's consumed all the free VM at the moment the kernel needs a bit more, then the OOM-killer will wake up, and, well, that looks like what happened.
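
To see who is actually holding memory at any given moment, and whom the OOM-killer considers the juiciest target, plain procps and /proc are enough; for example:

# processes sorted by resident set size, biggest first
ps -eo pid,user,rss,vsz,comm --sort=-rss | head -n 15

# current OOM badness score per process (higher = killed first)
for p in /proc/[0-9]*; do
    printf '%s\t%s\t%s\n' "${p#/proc/}" \
        "$(cat "$p/oom_score" 2>/dev/null)" \
        "$(awk '/^Name:/{print $2}' "$p/status" 2>/dev/null)"
done | sort -k2 -rn | head -n 15

If java tops both lists every time you look, that only confirms what the kill message already told you.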

My recommendations are:

  1. Don't run java as root. Ideally, don't run it at all, but if you must, not as root; that gives it a weighting in the OOM-killer's eyes which may result in something important getting killed instead (a sketch of how to push that weighting the other way follows after this list).

  2. Find the memory leak in whatever's using java.

  3. If you really believe you don't have a memory leak, then you don't have enough core; pony up for a bigger server. Give it more swap, as well.

  4. Monitor your java's VM footprint better; shoot it in the head if it gets all swollen. A rough watchdog sketch follows below.
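
For points 1 and 4, here is a rough sketch of the OOM weighting plus a dumb watchdog, meant to run from cron every few minutes; the 3 GB limit, the pgrep pattern, and the restart command are all placeholders you'll need to adapt to however your CrushFTP is actually started:

#!/bin/sh
# crude crushftp watchdog - adjust the placeholders before trusting it
LIMIT_KB=3145728                        # ~3 GB resident; pick your own threshold
PID=$(pgrep -o -f 'java.*crushftp')     # pattern is a guess at your command line

[ -n "$PID" ] || exit 0

# make this java the OOM-killer's preferred victim rather than something important
echo 1000 > /proc/$PID/oom_score_adj 2>/dev/null || echo 15 > /proc/$PID/oom_adj

RSS_KB=$(awk '/^VmRSS:/ {print $2}' /proc/$PID/status)
if [ "$RSS_KB" -gt "$LIMIT_KB" ]; then
    logger "crushftp watchdog: java RSS ${RSS_KB}kB over limit, restarting"
    service crushftp restart            # placeholder init script name
fi

Capping the JVM heap (an -Xmx in whatever wrapper script launches CrushFTP) is the other half of point 4: a heap cap turns a slow leak into a java-level OutOfMemoryError instead of a kernel-level kill, which is much easier to diagnose.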