Linux: where did the memory go? (not in cache, slab, shm, or ipcs)

This is a headless server with 8 GB of RAM running kernel 3.12. After only a few days of uptime it runs low on memory; in fact, it hit the OOM killer a few days ago. Something is losing memory, but I don't know where.

See the output below.

In short:

  • 64-bit system and OS
  • neither a hypervisor nor a virtual machine
  • low free memory
  • swap in use
  • low cache
  • low buffers
  • Active + Inactive is only about 1 GB (?)
  • low ipcs usage
  • low shm
  • low slab
  • ~400 MB tmpfs usage (393 MB on /run per df)
  • total RSS of all processes is only 262 MB
  • the summed peak RSS (VmHWM) of all processes is under 600 MB
  • so more than 6 GB is unaccounted for
[root@localhost ~]# cat /proc/meminfo 
MemTotal:        8186440 kB
MemFree:          251188 kB
Buffers:             144 kB
Cached:           853548 kB
SwapCached:         9988 kB
Active:           480036 kB
Inactive:         529456 kB
Active(anon):     256196 kB
Inactive(anon):   333072 kB
Active(file):     223840 kB
Inactive(file):   196384 kB
Unevictable:       13656 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4092540 kB
Dirty:               356 kB
Writeback:             0 kB
AnonPages:        161576 kB
Mapped:            50116 kB
Shmem:            419812 kB
Slab:              72680 kB
SReclaimable:      50648 kB
SUnreclaim:        22032 kB
KernelStack:        1824 kB
PageTables:        10260 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8287520 kB
Committed_AS:    1883404 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       91804 kB
VmallocChunk:   34359637332 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       83180 kB
DirectMap2M:     8296448 kB

[root@localhost ~]# ipcs -m 

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x01123bac 0          root       600        1000       8                       

[root@localhost ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           4.0G  393M  3.6G  10% /run

[root@localhost ~]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
262.375

[root@localhost ~]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
526.77
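
Summing VmRSS as above counts every page shared between processes (shared libraries, shared memory) once per process, so 262 MB is, if anything, an overestimate of user-space usage; summed VmHWM values are per-process peaks reached at different times. A minimal sketch (the function name is illustrative) that also sums the Pss lines from /proc/PID/smaps, which split each shared page among the processes that map it:

```shell
# Sum resident memory over all processes, in MB, two ways.
# VmRSS counts a page shared by N processes N times; Pss charges each
# sharer 1/N of it, so the Pss total is closer to real user-space usage.
# /proc/[0-9]* skips /proc/self; cat keeps going if a process exits
# between the glob expansion and the read.
proc_mem_totals() {
  cat /proc/[0-9]*/status 2>/dev/null |
    awk '/^VmRSS:/ { s += $2 } END { printf "VmRSS total: %.1f MB\n", s / 1024 }'
  cat /proc/[0-9]*/smaps 2>/dev/null |
    awk '/^Pss:/ { s += $2 } END { printf "Pss total:   %.1f MB\n", s / 1024 }'
}
```

Run it as root (`proc_mem_totals`) so that every process's smaps is readable.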

Edit: I've set vm.overcommit_memory=2 (strict accounting, i.e. no overcommit) just in case. I rebooted 2 days ago; here is the current state:

[root@localhost linux]# cat /proc/sys/vm/overcommit_memory 
2
[root@localhost linux]# df -h | grep tmpfs
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           4.0G     0  4.0G   0% /dev/shm
tmpfs           4.0G  532K  4.0G   1% /run
tmpfs           4.0G     0  4.0G   0% /sys/fs/cgroup
tmpfs           4.0G     0  4.0G   0% /tmp
tmpfs           4.0G  532K  4.0G   1% /var/spool/postfix/run/saslauthd
[root@localhost linux]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
434.188
[root@localhost linux]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
545.551
[root@localhost linux]# cat /proc/meminfo 
MemTotal:        8186440 kB
MemFree:          146576 kB
Buffers:            1728 kB
Cached:          5212588 kB
SwapCached:            0 kB
Active:          2560112 kB
Inactive:        2874464 kB
Active(anon):      94464 kB
Inactive(anon):   136528 kB
Active(file):    2465648 kB
Inactive(file):  2737936 kB
Unevictable:        9772 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:              1436 kB
Writeback:             0 kB
AnonPages:        230032 kB
Mapped:            50540 kB
Shmem:               960 kB
Slab:             316804 kB
SReclaimable:     291712 kB
SUnreclaim:        25092 kB
KernelStack:        1880 kB
PageTables:        11184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8287520 kB
Committed_AS:    1160812 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       91676 kB
VmallocChunk:   34359582672 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       91372 kB
DirectMap2M:     8288256 kB
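
With vm.overcommit_memory=2 the kernel refuses new commitments once Committed_AS would exceed CommitLimit. Per the kernel's /proc documentation, CommitLimit is SwapTotal plus overcommit_ratio percent of RAM (huge pages excluded; none are configured here). Assuming the default overcommit_ratio of 50, the figure in both dumps checks out:

$$
\mathrm{CommitLimit} = \mathrm{SwapTotal} + \mathrm{MemTotal}\cdot\frac{\mathrm{overcommit\_ratio}}{100}
= 4194300 + 8186440\cdot\frac{50}{100} = 8287520\ \text{kB}
$$

Committed_AS is about 1.8 GB in the first snapshot and 1.1 GB in the second, well below both that limit and the amount of missing memory, which is consistent with the lost memory not being held by user-space allocations.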

So the 8 GB is split roughly as:

  • 5 GB page cache (Cached)
  • 0.5 MB tmpfs
  • 450 MB process RSS
  • ~1 GB slab, page tables and the rest (per meminfo)

That still leaves about 1.5 GB unaccounted for. Is this a kernel leak, or what is going on here?
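
The accounting above can be automated as a rough heuristic (a sketch; the function name is illustrative). It adds up the meminfo counters that cover most RAM, using the LRU lists (Active, Inactive, Unevictable) rather than Cached + AnonPages so that swap-cache pages are not counted twice, and reports the remainder. Memory taken directly from the page allocator by drivers or kernel code shows up only in that remainder:

```shell
# Rough RAM accounting from a meminfo-format file (default: /proc/meminfo).
# All values are in kB. "accounted" is free memory plus the LRU lists
# (page cache, buffers, tmpfs/shmem and anonymous pages essentially all
# live there), slab, kernel stacks and page tables.
# "unaccounted" is MemTotal minus that sum.
meminfo_gap() {
  awk '
    { kb[$1] = $2 }   # keys keep their trailing colon, e.g. "MemTotal:"
    END {
      seen  = kb["MemFree:"] + kb["Active:"] + kb["Inactive:"] + kb["Unevictable:"]
      seen += kb["Slab:"] + kb["KernelStack:"] + kb["PageTables:"]
      printf "accounted:   %d kB\n", seen
      printf "unaccounted: %d kB\n", kb["MemTotal:"] - seen
    }' "${1:-/proc/meminfo}"
}
```

Applied to saved copies of the two dumps above, this reports 6827340 kB (about 6.5 GB) unaccounted for the first and 2265648 kB (about 2.2 GB) for the second. Logging it periodically would show whether the gap keeps growing.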

Edit 2: I have the same issue on another Atom board.

I also checked whether kmemleak reported anything, but it found nothing. I'm out of ideas.
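
For reference, kmemleak is driven through debugfs. A minimal sketch, assuming CONFIG_DEBUG_KMEMLEAK is enabled and debugfs is mounted at /sys/kernel/debug (the function name and the optional path argument are illustrative). Note that kmemleak tracks kmalloc, kmem_cache and vmalloc allocations; its documentation states that page allocations and ioremap are not tracked, so pages leaked straight from the page allocator would not be reported:

```shell
# Trigger an immediate kmemleak scan and print the suspected leaks.
# The path can be overridden (e.g. for testing); default is the debugfs file.
kmemleak_report() {
  f=${1:-/sys/kernel/debug/kmemleak}
  if [ ! -w "$f" ]; then
    echo "kmemleak interface not available at $f" >&2
    return 1
  fi
  echo scan > "$f"   # scan now instead of waiting for the periodic scan
  cat "$f"           # each entry is an unreferenced object with its allocation backtrace
}
```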

Edit 3: Updating to kernel 3.17.2 seems to have resolved the issue, but I still don't know how to trace memory leaks like this one.

Best Answer

On LKML it was suggested that this might have been https://lkml.org/lkml/2014/10/15/447, but that patch wasn't in 3.17.2, and the THP allocation figures don't point that way.

However, /proc/kpageflags might show what each physical page is being used for, which could help narrow it down. tools/vm/page-types.c in the kernel sources shows how to interpret the binary kpageflags output.
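
A rough page census along those lines can be sketched in shell (the function name and the "other" bucket are illustrative; page-types is the full-featured tool). /proc/kpageflags holds one 64-bit flag word per page frame and is readable by root only; the bit numbers used here (5 LRU, 7 SLAB, 10 BUDDY, 12 ANON, 20 NOPAGE) follow the kernel's pagemap documentation. Frames with none of LRU, SLAB or BUDDY set are counted as "other", where leaked pages would end up. Caveats: BUDDY is set only on the first page of each free block, so most of MemFree also lands in "other"; tail pages of compound allocations may not carry their head page's flags; and page tables, kernel stacks and reserved memory legitimately belong there too. Compare the result against MemFree and Slab rather than reading it as an exact leak size:

```shell
# Rough page census from a kpageflags-format file (default: the live
# /proc/kpageflags, root only). Each page frame has one native-endian
# 64-bit flag word. Bit numbers (per the kernel's pagemap documentation):
# 5 = LRU, 7 = SLAB, 10 = BUDDY, 12 = ANON, 20 = NOPAGE.
# Only the low 32 bits are decoded so that awk's arithmetic stays exact.
kpageflags_census() {
  od -A n -v -t x8 "${1:-/proc/kpageflags}" | awk '
    function low32(h,   i, v) {
      h = tolower(substr(h, length(h) - 7))   # last 8 hex digits = bits 0..31
      v = 0
      for (i = 1; i <= 8; i++)
        v = v * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
      return v
    }
    function bit(v, b) { return int(v / 2 ^ b) % 2 }
    {
      for (f = 1; f <= NF; f++) {
        v = low32($f)
        frames++
        if (bit(v, 20)) { nopage++; continue }   # no page frame at this PFN
        if (bit(v, 5))  lru++
        if (bit(v, 7))  slab++
        if (bit(v, 10)) buddy++
        if (bit(v, 12)) anon++
        # Not on an LRU list, not slab, not the head of a free block:
        if (!bit(v, 5) && !bit(v, 7) && !bit(v, 10)) other++
      }
    }
    END {
      printf "frames %d nopage %d lru %d slab %d buddy %d anon %d other %d\n",
             frames, nopage, lru, slab, buddy, anon, other
    }'
}
```

Counts are in pages (4 kB each on x86-64). The `-v` flag matters: without it, od collapses runs of identical lines into a single `*`, which would undercount.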