Linux memory fragmentation

fragmentationlinuxlinux-kernel

Is there a way to detect memory fragmentation on Linux? I ask because on some long-running servers I have noticed performance degradation, and only after I restart the process do I see better performance. I noticed it more when using Linux huge page support; are huge pages in Linux more prone to fragmentation?

I have looked at /proc/buddyinfo in particular. I want to know whether there are any better ways (not just CLI commands per se; any program or theoretical background would do) to look at it.
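For reference, each row of /proc/buddyinfo lists, per memory zone, how many free blocks exist at each order (a block of order n is 2^n contiguous base pages, typically 4 KiB each on x86_64). A quick sketch for pretty-printing it:

```shell
# Print node, zone, and the count of free blocks at each order.
# Column i (0-based) holds blocks of 2^i contiguous base pages, so a
# shortage in the high-order columns means large contiguous runs are gone.
awk '{
    printf "%-6s %-10s", $2, $4;   # node number (with comma), zone name
    for (i = 5; i <= NF; i++)
        printf " %6d", $i;         # free blocks of order (i - 5)
    printf "\n";
}' /proc/buddyinfo
```

Watching the right-hand (high-order) columns drain toward zero over time is the classic sign that the allocator can no longer find contiguous runs large enough for huge pages.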

Best Answer

I am answering to the linux tag. My answer is specific only to Linux.

Yes, huge pages are more prone to fragmentation. There are two views of memory: the one your process gets (virtual) and the one the kernel manages (real). The larger a page is, the more difficult it becomes to group it with (and keep it next to) its neighbors, especially when your service is running on a system that also has to support others that, by default, allocate and write to far more memory than they actually end up using.
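To see this in numbers: the kernel exposes huge-page counters in /proc/meminfo, and /proc/pagetypeinfo (readable by root on most kernels) breaks the free lists down by migrate type. A quick sketch:

```shell
# Huge page counters: HugePages_Total/Free tell you how many are
# reserved and unused; failing to reach a requested reservation
# usually means fragmentation prevented finding contiguous runs.
grep -i huge /proc/meminfo

# Free blocks per order, split by migrate type (Movable blocks can
# still be compacted; Unmovable ones pin their neighborhood).
head -n 25 /proc/pagetypeinfo 2>/dev/null
```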

The kernel's mapping of (real) granted addresses is private. There is a good reason userspace sees addresses only as the kernel presents them: the kernel needs to be able to overcommit without confusing userspace. Your process gets a nice, contiguous, "Disneyfied" address space in which to work, oblivious to what the kernel is actually doing with that memory behind the scenes.
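The overcommit policy itself is tunable through /proc/sys/vm (the kernel's Documentation/vm/overcommit-accounting has the authoritative description of the modes):

```shell
# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting against swap + overcommit_ratio% of RAM.
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio   # only consulted when the mode is 2
```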

The reason you see degraded performance on long-running servers is most likely because allocated blocks that have not been explicitly locked (e.g. mlock()/mlockall() or posix_madvise()) and have not been modified in a while have been paged out, which means your service skids to disk when it has to read them. Modifying this behavior makes your process a bad neighbor, which is why many people put their RDBMS on a completely different server than web/php/python/ruby/whatever. The only sane way to fix that is to reduce the competition for contiguous blocks.
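To see which processes have actually been pushed out, the VmSwap field in /proc/&lt;pid&gt;/status reports each process's swapped-out memory (the field exists on reasonably recent kernels). A rough sketch:

```shell
# List processes with swapped-out memory, largest first.
for s in /proc/[0-9]*/status; do
    awk '$1 == "Name:"   { name = $2 }
         $1 == "VmSwap:" { if ($2 > 0) printf "%8d kB  %s\n", $2, name }' \
        "$s" 2>/dev/null
done | sort -rn | head
```

A service showing megabytes here right before its "slow" phase is being faulted back in from disk, which matches the restart-cures-it symptom.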

Fragmentation is only really noticeable (in most cases) when page A is in memory and page B has moved to swap. Naturally, re-starting your service would seem to 'cure' this, but only because the kernel has not yet had an opportunity to page out the process' (now) newly allocated blocks within the confines of its overcommit ratio.

In fact, re-starting (let's say) 'apache' under a high load is likely going to send blocks owned by other services straight to disk. So yes, 'apache' would improve for a short time, but 'mysql' might suffer ... at least until the kernel makes them suffer equally when there is simply a lack of ample physical memory.

Add more memory, or split up demanding malloc() consumers :) It's not just fragmentation that you need to be looking at.

Try vmstat to get an overview of what's actually being stored where.
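For example, the si/so columns (swap-in/swap-out, in KiB/s) are the ones that betray the paging described above; sustained non-zero values under steady load mean processes are being faulted back in from disk:

```shell
# Sample every second, five times; watch the si/so (swap) columns.
vmstat 1 5

# Cumulative swap counters since boot:
vmstat -s | grep -i swap
```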