Linux not freeing large disk cache when memory demand goes up

disk-cache, linux, memory, memory-usage

Running Ubuntu on a 2.6.31-302 x86-64 kernel. The overall problem is that memory in the 'cached' category keeps climbing and is never freed or reused, even when our application needs it.

So here's what I get out of the 'free' command. None of this looks out of the ordinary at first glance.

# free
             total       used       free     shared    buffers     cached
Mem:       7358492    5750320    1608172          0       7848    1443820
-/+ buffers/cache:    4298652    3059840
Swap:            0          0          0
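
(For reference, the 'free' figure in the -/+ buffers/cache row is simply the first row's free + buffers + cached: 1608172 + 7848 + 1443820 = 3059840 kB, i.e. roughly 3 GB that should in principle be reclaimable.)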

The first thing someone's going to say is "Don't worry, Linux manages that memory automatically." Yes, I know how the memory manager is supposed to work; the problem is that it's not doing the right thing. The 1.4 GB of "cached" memory here appears to be reserved and unusable.

My understanding of Linux says that about 3 GB of that is effectively free, but the system's behavior says otherwise. When the 1.6 GB of genuinely free memory is used up during peak load and more memory is demanded (the 'free' in the first column approaches 0), the OOM killer is invoked, processes are killed, and problems start, even though the 'free' in the -/+ buffers/cache row still shows about 1.4 GB available.

I've tuned the oom_adj values on key processes so an OOM event doesn't bring the system to its knees, but even then important processes get killed, and we never want to reach that point, especially when 1.4 GB is theoretically still "free" if the kernel would only evict the disk cache.
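
In case it's useful, this is roughly what I mean by tuning oom_adj (PID 1234 is just a placeholder; on 2.6.31 the value runs from -17, which disables OOM killing for that process, up to +15):

echo -17 > /proc/1234/oom_adj
cat /proc/1234/oom_score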

Does anyone have any idea what's going on here? The internet is flooded with dumb questions about the Linux 'free' command and "why don't I have any free memory", and I can't find anything about this particular issue because of that.

The first thing that pops into my head is that swap is off. We have a sysadmin who is adamant about that; I'm open to explanations if they're backed up. Could this be causing problems?
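
(For what it's worth, the meminfo further down shows CommitLimit at about 3.5 GB, which is swap (0) plus the default 50% of RAM, while Committed_AS is already around 7 GB. That limit is only enforced when vm.overcommit_memory=2, but it does show how little headroom there is without swap.)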

Here's free after running echo 3 > /proc/sys/vm/drop_caches:

# free
             total       used       free     shared    buffers     cached
Mem:       7358492    5731688    1626804          0        524    1406000
-/+ buffers/cache:    4325164    3033328
Swap:            0          0          0
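
(In case it helps anyone reading: echo 1 drops only the page cache, echo 2 drops dentries and inodes, and echo 3 drops both. Only clean pages can be dropped, so it's worth syncing first:)

sync
echo 3 > /proc/sys/vm/drop_caches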

As you can see, only a tiny amount of cache (about 40 MB) is actually freed, but around 1.4 GB appears to be stuck. The other problem is that this figure rises over time; on another server about 2.0 GB is stuck.

I'd really like this memory back… any help would be most appreciated.

Here's cat /proc/meminfo if it's worth anything:

# cat /proc/meminfo 
MemTotal:        7358492 kB
MemFree:         1472180 kB
Buffers:            5328 kB
Cached:          1435456 kB
SwapCached:            0 kB
Active:          5524644 kB
Inactive:          41380 kB
Active(anon):    5492108 kB
Inactive(anon):        0 kB
Active(file):      32536 kB
Inactive(file):    41380 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               320 kB
Writeback:             0 kB
AnonPages:       4125252 kB
Mapped:            42536 kB
Slab:              29432 kB
SReclaimable:      13872 kB
SUnreclaim:        15560 kB
PageTables:            0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3679244 kB
Committed_AS:    7223012 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        7696 kB
VmallocChunk:   34359729675 kB
DirectMap4k:     7340032 kB
DirectMap2M:           0 kB

Best Answer

I have discovered the answer to my own question, thanks to womble's help (submit an answer if you like).

lsof -s lists open files along with their sizes, and it turns out there were several gigabytes of mmap'd log files taking up the cache.
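
For anyone chasing the same thing, a rough one-liner along these lines (column positions may vary with your lsof build, so treat it as a sketch) will surface the largest memory-mapped regular files:

lsof -s | awk '$4 == "mem" && $5 == "REG" {print $7, $9}' | sort -rn | head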

Setting up logrotate should resolve the issue completely and let me reclaim that memory.
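
A minimal sketch of the kind of rotation I have in mind; the path and the reload command are placeholders, and the application still has to reopen (and re-map) its log file after rotation for the pages to actually be released:

/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    postrotate
        /etc/init.d/myapp reload > /dev/null 2>&1 || true
    endscript
}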

I will also re-enable swap so we have no problems with the OOM killer in the future. Thanks.
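
For reference, re-enabling swap is just a matter of the following (size and path are whatever suits the box; add a matching /etc/fstab entry to make it permanent):

dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile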