MongoDB – DB/collection fragmentation level

mongodb performance

I recently had a performance issue with one of my collections.
On a whim (and thanks to this article), I decided to try compacting (as written in the official documentation).
This worked brilliantly. However, I'm now wondering how often I should do it.
Since compacting is not a completely online task (I can only do it on passive nodes), I can't just schedule it every night and forget about it.

I couldn't find any documentation about knowing when a DB/collection has a high fragmentation rate. Do you have any experience with determining the fragmentation level of a DB (other than benchmarking)?

Note: I'm talking about "internal" fragmentation, as in empty space inside the data files, not "external" fragmentation, as in files spread across the disk.

Best Answer

You can estimate how much compaction would gain you by comparing two values reported by db.stats(): dataSize tells you how much data is actually stored, while storageSize tells you how big the allocated files are. (db.<collection>.stats() gives the per-collection figures; there the data size is reported as size.) dataSize <= storageSize, and the size of the gap between them should tell you roughly how much you'll get back through compaction.
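To make that comparison concrete, here is a minimal sketch of the arithmetic in Python. It assumes you already have the stats as a dict with dataSize and storageSize keys (for example from a driver call that runs dbStats). The function name and the sample numbers are made up for illustration. Treat the result as a rough indicator only: storageSize may also include space that was allocated ahead of time and never used, and with a compressing storage engine it can even be smaller than dataSize, which is why the result is clamped at zero.

```python
def reclaimable_fraction(stats):
    """Return the fraction of storageSize that is not occupied by document data.

    `stats` is assumed to be a dict carrying the dataSize and storageSize
    fields, both in bytes.
    """
    data_size = stats["dataSize"]
    storage_size = stats["storageSize"]
    if storage_size <= 0:
        return 0.0
    # Clamp at zero in case storageSize is reported below dataSize.
    return max(0.0, (storage_size - data_size) / storage_size)


# Hypothetical numbers, for illustration only: 600 MiB of data in 1000 MiB of files.
example = {"dataSize": 600 * 1024**2, "storageSize": 1000 * 1024**2}
print(f"{reclaimable_fraction(example):.0%} of the allocated space holds no document data")
```

The higher that percentage, the more a compaction is likely to give back.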

MongoDB always stores an object contiguously, so you won't get cases where a single object is scattered across the data files. Where fragmentation does come into play is when an object grows past the space allocated for it: the entire object then has to be rewritten somewhere bigger, leaving a hole where it used to be.

When I was playing with Mongo databases, a compaction in a quarterly maintenance window was all we needed. But then, our dataset didn't have a whole lot of deletions, so we weren't creating voids that often. To figure out your rate, track those two dbStats values and see how they move over time.
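If you want to turn that tracking into a trigger, something along these lines might do. This is only a sketch: the sample values, the dates, and the 30% threshold are invented for illustration, and how you collect the samples (cron job, monitoring agent, and so on) is up to you.

```python
from datetime import date


def unused_fraction(data_size, storage_size):
    """Fraction of the allocated storage that holds no document data."""
    if storage_size <= 0:
        return 0.0
    return max(0.0, (storage_size - data_size) / storage_size)


def first_breach(samples, threshold):
    """Return the date of the first sample at or above `threshold`, else None.

    `samples` is an iterable of (day, dataSize, storageSize) tuples in
    chronological order.
    """
    for day, data_size, storage_size in samples:
        if unused_fraction(data_size, storage_size) >= threshold:
            return day
    return None


# Hypothetical monthly samples (sizes in MB), for illustration only.
samples = [
    (date(2024, 1, 1), 900, 1000),
    (date(2024, 2, 1), 800, 1000),
    (date(2024, 3, 1), 650, 1000),
]

# The 30% threshold is an arbitrary example, not a recommendation.
print(first_breach(samples, threshold=0.30))
```

How quickly the samples approach your chosen threshold gives you a rough idea of how far apart the maintenance windows can be.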