Linux – what is using the RAM on this Ubuntu 12.04.2 server?

kernel, linux, memory, memory leak, Ubuntu

I have two servers in a pool running Nginx, PHP5-FPM and Memcached. The first server in the pool inexplicably loses about 2GB of RAM; I can't explain where it's going.

A reboot gets everything back to normal, but after a few hours the RAM is used again.

At first I thought it was down to memcached, but eventually I'd killed every process I could reasonably kill and the memory was still not released. Even dropping to single-user mode with init 1 did not free it.
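
One sanity check worth doing at this point (not in the original post, but a standard move): total the resident set sizes of every remaining process and compare the sum with the "used" figure from free. If the total RSS is far below it, no userspace process owns the missing memory.

# Sum the RSS column (KB) of every process; compare with "used" in free -m.
ps aux | awk 'NR > 1 { rss += $6 } END { printf "total RSS: %.0f MB\n", rss / 1024 }'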

ipcs -m is empty, and slabtop looks much the same on this server as on the other server in the pool, which is using very little memory.
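
For reference, these are roughly the checks being described here; running them on both servers side by side makes the comparison concrete (a sketch, nothing server-specific assumed):

ipcs -m                                                    # SysV shared memory segments (empty here)
sudo slabtop -o | head -n 15                               # one-shot snapshot of the largest slab caches
grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo   # total kernel slab usage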

df shows only about 360 KB used in tmpfs.
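
To confirm that figure, df can be limited to tmpfs filesystems:

df -h -t tmpfs    # show only tmpfs mounts and their usage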

In case it's relevant, the two servers are nearly identical: both run the same OS at the same patch level on the same hypervisor (VMware ESXi 4.1), on different hosts but with identical hardware. The differences are:

  • The first server has an NFS mount. I tried unmounting it and removing the NFS modules (roughly as sketched after this list), but RAM usage did not change.
  • The first server serves sites over both HTTP and HTTPS, while the second serves HTTP only.
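
For completeness, the unmount-and-unload step from the first bullet looks roughly like this (a sketch: the mount point is a placeholder, and the exact set of NFS client modules varies by kernel):

sudo umount /path/to/nfs/mount        # placeholder mount point
sudo modprobe -r nfs lockd sunrpc     # unload NFS client modules; may fail if still in use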

Here's the output of free -m …

             total       used       free     shared    buffers     cached
Mem:          3953       3458        494          0        236        475
-/+ buffers/cache:       2746       1206
Swap:         1023          0       1023
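
The telling figure is the 2746 MB in the "-/+ buffers/cache" row: used minus buffers minus cached (3458 - 236 - 475 ≈ 2746 MB, give or take rounding), i.e. memory that neither the page cache nor, as the process list below shows, any userspace process accounts for. The same number can be pulled straight out of free (a trivial sketch):

# used minus buffers minus cached, from the Mem: row of free -m
free -m | awk '/^Mem:/ { printf "non-cache used: %d MB\n", $3 - $6 - $7 }'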

Here's /proc/meminfo …

MemTotal:        4048392 kB
MemFree:          506576 kB
Buffers:          242252 kB
Cached:           486796 kB
SwapCached:            8 kB
Active:           375240 kB
Inactive:         369312 kB
Active(anon):      12320 kB
Inactive(anon):     3596 kB
Active(file):     362920 kB
Inactive(file):   365716 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1048572 kB
SwapFree:        1048544 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         15544 kB
Mapped:             3084 kB
Shmem:               412 kB
Slab:              94516 kB
SReclaimable:      75104 kB
SUnreclaim:        19412 kB
KernelStack:         632 kB
PageTables:         1012 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3072768 kB
Committed_AS:      20060 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      281340 kB
VmallocChunk:   34359454584 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       59392 kB
DirectMap2M:     4134912 kB
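
A useful exercise with this meminfo (not something the original post did): add up everything the kernel can account for and see what remains. Assuming the major consumers are MemFree, Buffers, Cached, Slab, AnonPages, PageTables and KernelStack, the sketch below reports roughly 2.6 GB unaccounted for on these figures, which points at something outside normal kernel accounting, such as a hypervisor balloon:

# Rough accounting: MemTotal minus everything /proc/meminfo can explain.
awk '/^(MemTotal|MemFree|Buffers|Cached|Slab|AnonPages|PageTables|KernelStack):/ { v[$1] = $2 }
     END {
         accounted = v["MemFree:"] + v["Buffers:"] + v["Cached:"] + v["Slab:"] \
                   + v["AnonPages:"] + v["PageTables:"] + v["KernelStack:"]
         printf "unaccounted: %.0f MB\n", (v["MemTotal:"] - accounted) / 1024
     }' /proc/meminfo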

Here's the process list at the time …

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  24336  2160 ?        Ss   Jul22   0:09 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Jul22   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Jul22   0:38 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Jul22   0:00 [kworker/u:0]
root         6  0.0  0.0      0     0 ?        S    Jul22   0:04 [migration/0]
root         7  0.0  0.0      0     0 ?        S    Jul22   0:32 [watchdog/0]
root         8  0.0  0.0      0     0 ?        S    Jul22   0:04 [migration/1]
root        10  0.0  0.0      0     0 ?        S    Jul22   0:22 [ksoftirqd/1]
root        11  0.0  0.0      0     0 ?        S    Jul22   0:15 [kworker/0:1]
root        12  0.0  0.0      0     0 ?        S    Jul22   0:31 [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    Jul22   0:04 [migration/2]
root        15  0.0  0.0      0     0 ?        S    Jul22   0:04 [ksoftirqd/2]
root        16  0.0  0.0      0     0 ?        S    Jul22   0:14 [watchdog/2]
root        17  0.0  0.0      0     0 ?        S    Jul22   0:04 [migration/3]
root        19  0.0  0.0      0     0 ?        S    Jul22   0:04 [ksoftirqd/3]
root        20  0.0  0.0      0     0 ?        S    Jul22   0:11 [watchdog/3]
root        21  0.0  0.0      0     0 ?        S<   Jul22   0:00 [cpuset]
root        22  0.0  0.0      0     0 ?        S<   Jul22   0:00 [khelper]
root        23  0.0  0.0      0     0 ?        S    Jul22   0:00 [kdevtmpfs]
root        24  0.0  0.0      0     0 ?        S<   Jul22   0:00 [netns]
root        25  0.0  0.0      0     0 ?        S    Jul22   0:02 [sync_supers]
root        26  0.0  0.0      0     0 ?        S    Jul22   0:21 [kworker/u:1]
root        27  0.0  0.0      0     0 ?        S    Jul22   0:00 [bdi-default]
root        28  0.0  0.0      0     0 ?        S<   Jul22   0:00 [kintegrityd]
root        29  0.0  0.0      0     0 ?        S<   Jul22   0:00 [kblockd]
root        30  0.0  0.0      0     0 ?        S<   Jul22   0:00 [ata_sff]
root        31  0.0  0.0      0     0 ?        S    Jul22   0:00 [khubd]
root        32  0.0  0.0      0     0 ?        S<   Jul22   0:00 [md]
root        34  0.0  0.0      0     0 ?        S    Jul22   0:04 [khungtaskd]
root        35  0.0  0.0      0     0 ?        S    Jul22   0:15 [kswapd0]
root        36  0.0  0.0      0     0 ?        SN   Jul22   0:00 [ksmd]
root        37  0.0  0.0      0     0 ?        SN   Jul22   0:00 [khugepaged]
root        38  0.0  0.0      0     0 ?        S    Jul22   0:00 [fsnotify_mark]
root        39  0.0  0.0      0     0 ?        S    Jul22   0:00 [ecryptfs-kthrea]
root        40  0.0  0.0      0     0 ?        S<   Jul22   0:00 [crypto]
root        48  0.0  0.0      0     0 ?        S<   Jul22   0:00 [kthrotld]
root        50  0.0  0.0      0     0 ?        S    Jul22   2:59 [kworker/1:1]
root        51  0.0  0.0      0     0 ?        S    Jul22   0:00 [scsi_eh_0]
root        52  0.0  0.0      0     0 ?        S    Jul22   0:00 [scsi_eh_1]
root        57  0.0  0.0      0     0 ?        S    Jul22   0:09 [kworker/3:1]
root        74  0.0  0.0      0     0 ?        S<   Jul22   0:00 [devfreq_wq]
root       114  0.0  0.0      0     0 ?        S    Jul22   0:00 [kworker/3:2]
root       128  0.0  0.0      0     0 ?        S    Jul22   0:00 [kworker/1:2]
root       139  0.0  0.0      0     0 ?        S    Jul22   0:00 [kworker/0:2]
root       249  0.0  0.0      0     0 ?        S<   Jul22   0:00 [mpt_poll_0]
root       250  0.0  0.0      0     0 ?        S<   Jul22   0:00 [mpt/0]
root       259  0.0  0.0      0     0 ?        S    Jul22   0:00 [scsi_eh_2]
root       273  0.0  0.0      0     0 ?        S    Jul22   0:20 [jbd2/sda1-8]
root       274  0.0  0.0      0     0 ?        S<   Jul22   0:00 [ext4-dio-unwrit]
root       377  0.0  0.0      0     0 ?        S    Jul22   0:26 [jbd2/sdb1-8]
root       378  0.0  0.0      0     0 ?        S<   Jul22   0:00 [ext4-dio-unwrit]
root       421  0.0  0.0  17232   584 ?        S    Jul22   0:00 upstart-udev-bridge --daemon
root       438  0.0  0.0  21412  1176 ?        Ss   Jul22   0:00 /sbin/udevd --daemon
root       446  0.0  0.0      0     0 ?        S<   Jul22   0:00 [rpciod]
root       448  0.0  0.0      0     0 ?        S<   Jul22   0:00 [nfsiod]
root       612  0.0  0.0  21408   772 ?        S    Jul22   0:00 /sbin/udevd --daemon
root       613  0.0  0.0  21728   924 ?        S    Jul22   0:00 /sbin/udevd --daemon
root       700  0.0  0.0      0     0 ?        S<   Jul22   0:00 [kpsmoused]
root       849  0.0  0.0  15188   388 ?        S    Jul22   0:00 upstart-socket-bridge --daemon
root       887  0.0  0.0      0     0 ?        S    Jul22   0:00 [lockd]
root       919  0.0  0.0  14504   952 tty4     Ss+  Jul22   0:00 /sbin/getty -8 38400 tty4
root       922  0.0  0.0  14504   952 tty5     Ss+  Jul22   0:00 /sbin/getty -8 38400 tty5
root       924  0.0  0.0  14504   944 tty2     Ss+  Jul22   0:00 /sbin/getty -8 38400 tty2
root       925  0.0  0.0  14504   944 tty3     Ss+  Jul22   0:00 /sbin/getty -8 38400 tty3
root       930  0.0  0.0  14504   952 tty6     Ss+  Jul22   0:00 /sbin/getty -8 38400 tty6
root       940  0.0  0.0      0     0 ?        S    Jul22   0:07 [flush-8:0]
root      1562  0.0  0.0  58792  1740 tty1     Ss   Jul22   0:00 /bin/login --     
root     12969  0.0  0.0      0     0 ?        S    07:18   0:02 [kworker/2:2]
root     30051  0.0  0.0      0     0 ?        S    10:13   0:00 [flush-8:16]
root     30909  0.0  0.0      0     0 ?        S    10:14   0:00 [kworker/2:1]
johncc   30921  0.2  0.2  26792  9360 tty1     S    10:17   0:00 -bash
root     31089  0.0  0.0      0     0 ?        S    10:18   0:00 [kworker/0:0]
root     31099  0.0  0.0  42020  1808 tty1     S    10:19   0:00 sudo -i
root     31100  0.2  0.1  22596  5168 tty1     S    10:19   0:00 -bash
root     31187  0.0  0.0      0     0 ?        S    10:19   0:00 [kworker/2:0]
root     31219  0.0  0.0  16880  1252 tty1     R+   10:22   0:00 ps aux
root     31220  0.0  0.0  53924   536 tty1     R+   10:22   0:00 curl -F sprunge=<- http://sprunge.us

Can anyone suggest what to try next, or how to debug this problem? I'm at a loss!

Best Answer

The machine is a virtual guest running on an ESXi hypervisor, so what about memory ballooning? First of all, I would recommend checking the ESXi/vCenter memory and balloon statistics for this guest.

It can happen that the hypervisor asks the guest to "inflate" the balloon in order to reclaim memory, e.g. for other running guests. This requires the balloon driver to be loaded, though; it is available as the kernel module vmmemctl.

Finally, the obvious question is whether the guest has VMware Tools installed and running, as I can't see any related processes in the process list you provided. By the way, was there a vmware-guestd process before you started killing them?
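
For what it's worth, a quick way to check from inside the guest (assuming VMware Tools or open-vm-tools is present; the module is vmmemctl with VMware Tools and vmw_balloon with the in-kernel driver):

lsmod | grep -E 'vmmemctl|vmw_balloon'   # is a balloon driver loaded?
vmware-toolbox-cmd stat balloon          # balloon size as reported by VMware Tools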