Linux – how to find out what is causing huge dentry_cache usage

linux memory

Note that inode_cache & ext3_inode_cache slabs are very small compared to dentry_cache.
What happens is that, slowly and steadily over about a week, dentry_cache grows from ~1M to ~5-6G.
Then I need to run:
echo 2 > /proc/sys/vm/drop_caches && echo 0 > /proc/sys/vm/drop_caches
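For reference, this is roughly how I watch the slab usage (a minimal sketch; the exact /proc/slabinfo column layout varies between kernels):

# object counts and sizes for the dentry and inode caches
grep -E 'dentry|inode_cache' /proc/slabinfo
# interactive view, sorted by cache size
slabtop -s c
# total slab memory
grep -i slab /proc/meminfo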
This started happening one day on all servers hosting some web code. The developers say they have not changed anything related to filesystem access patterns around the time the problem started.

The system is CentOS 5 with a 2.6.18 kernel, so I don't have the instrumentation features available in newer kernels.
Any idea how I can debug the problem? Maybe with SystemTap? This is an EC2 instance, so I'm not even sure SystemTap will work there.
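Something like this stap invocation is what I had in mind, to count dentry allocations per process (just a sketch; it assumes stap and the matching kernel debuginfo are installed, and that d_alloc is probe-able on 2.6.18):

stap -e 'global hits
probe kernel.function("d_alloc") { hits[execname()]++ }
probe timer.s(60) {
    foreach (name in hits- limit 20)
        printf("%-20s %d\n", name, hits[name])
    exit()
}'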

Thanks
Alex

Best Answer

Late, but maybe useful for others who come upon this.

If you are using the AWS SDK on that EC2 instance, it is highly likely that curl is causing the dentry bloat. While I haven't seen this trigger the OOM killer, it is known to hurt server performance because of the extra work the OS has to do to reclaim slab memory.

If you can confirm that your developers are using curl to hit https URLs (many of the AWS SDKs do this), then the solution is to upgrade the nss-softokn library to at least v3.16.0 and to set the environment variable NSS_SDB_USE_CACHE for the process that uses libcurl (YES and NO are both valid values; you may have to benchmark to see which one performs curl requests more efficiently).
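Roughly, on a RHEL/CentOS-style box that means something like the following (package names and where you export the variable depend on your setup; this is only a sketch):

# check the installed version; upgrade if it is older than 3.16.0
rpm -q nss-softokn
yum update nss-softokn
# export the variable in the environment of whatever process uses libcurl,
# e.g. in the init script or wrapper that starts the web application
export NSS_SDB_USE_CACHE=YES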

I recently ran into this myself and wrote a blog entry (old blog entry link and upstream bug report) with some diagnostics & more detailed information, in case that helps.
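If you want to confirm the behaviour yourself, a rough check (the URL below is only a placeholder) is to watch the dentry slab while issuing a batch of https requests the way the application would; a large jump after the loop points at this curl/NSS issue:

# dentry object count before
grep dentry /proc/slabinfo
# issue a batch of https requests
for i in $(seq 1 1000); do curl -s -o /dev/null https://example.com/; done
# dentry object count after
grep dentry /proc/slabinfo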