MongoDB is not using all of the system memory for working set


My machine has 42GB of RAM, and the total index size is 40GB (since the indexes are all on UUID columns, the whole index should be part of the working set).
I suspect my working set is too large, because page faults have started to increase tremendously, e.g. the average jumped from 20 to 120.

However, it seems that mongod is not using all of my memory, e.g.

ps ax | grep mongod
23051 mongod    20   0  338g 7.7g 7.5g S 87.8 16.4   1533:17 /usr/bin/mongod -f /etc/mongod.conf

Only 7.7GB is currently in use.

Using the workingSet estimator in 2.4, I also found that the working set over 50 seconds is only around 650MB, which seems unexpectedly small for my data size.

"workingSet" : {
    "note" : "thisIsAnEstimate",
    "pagesInMemory" : 166069,
    "computationTimeMicros" : 49281,
    "overSeconds" : 50
}
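
As a rough sanity check on the 650MB figure, here is a minimal sketch that converts pagesInMemory into megabytes. It assumes 4096-byte memory pages (typical on x86 Linux, but an assumption here), and workingSetMB is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper: convert the workingSet estimate into megabytes.
// Assumption: 4096-byte pages unless told otherwise
// (check with `getconf PAGESIZE` on the host).
function workingSetMB(workingSet, pageSizeBytes) {
    var pageSize = pageSizeBytes || 4096;
    return workingSet.pagesInMemory * pageSize / (1024 * 1024);
}

// Values copied from the workingSet output above.
var estimate = {
    note: "thisIsAnEstimate",
    pagesInMemory: 166069,
    computationTimeMicros: 49281,
    overSeconds: 50
};

var mb = workingSetMB(estimate);
// Use print() instead of console.log() in the legacy mongo shell.
console.log(mb.toFixed(1) + " MB over " + estimate.overSeconds + "s");
// prints "648.7 MB over 50s"
```

In 2.4 the workingSet document should only appear when it is requested explicitly, e.g. via db.runCommand({ serverStatus: 1, workingSet: 1 }).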

Do you have any idea?

Best Answer

First, I would suggest taking a look here:

That issue discusses much the same topic. To summarize a little: because of how journaling works in MongoDB and how it remaps memory, it can cause resident memory for the mongod process to seem artificially low. If you take a look at the output of the free command and your filesystem cache is relatively full, then you are even more likely to be hitting this resident memory reporting anomaly (assuming mongod is the only really heavy consumer of memory on the system, of course).

However, MongoDB (on Linux at least) only reports hard (actual) page faults, not the soft page faults that occur when a process requests pages that are already in memory but just not "owned" by the requesting process. You are therefore right to be concerned about the increase in page faults: it is one of the best indicators that your data no longer fits in memory and that you are having to hit disk.
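
To track that number over time, one option is to sample serverStatus twice and compute a rate. The following is a minimal sketch, assuming the extra_info.page_faults counter is present (it is on Linux); pageFaultRate is a hypothetical helper, and the sample values are made up for illustration.

```javascript
// Hypothetical helper: average hard page faults per second between two
// serverStatus() samples. Assumes extra_info.page_faults is a cumulative
// counter, as reported on Linux.
function pageFaultRate(before, after, elapsedSeconds) {
    if (elapsedSeconds <= 0) {
        throw new Error("elapsedSeconds must be positive");
    }
    var delta = after.extra_info.page_faults - before.extra_info.page_faults;
    return delta / elapsedSeconds;
}

// Stand-in samples with made-up numbers; in the mongo shell each of these
// would be the result of db.serverStatus(), taken 10 seconds apart.
var sampleA = { extra_info: { page_faults: 1000 } };
var sampleB = { extra_info: { page_faults: 2200 } };

var rate = pageFaultRate(sampleA, sampleB, 10); // 1200 faults / 10 s = 120 per second
```

A sustained rise in this rate under a steady workload is the signal to watch, rather than any single reading.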

To confirm that everything you want is in memory, you can use the touch command to load the indexes and data you care about into the filesystem cache (though not necessarily into mongod's resident memory). Note that this is a somewhat blunt tool: it simply loads the entire set of data and/or the entire index into the cache, and it may cause load and locking on the system, so use it with caution. Depending on your data set, it can be more efficient to load recent or known hot data with a find query and an explain. Something like this:

db.collection.find({criteria for loading data}).explain()

Or, to make sure a specific index is loaded, add an explicit hint:

db.collection.find({criteria for loading data}).hint({index name}).explain()
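
Both warm-up approaches described above can be wrapped in small helpers. This is a minimal sketch: touchCollection and warmByQuery are hypothetical names, and the collection name, criteria, and index spec in the usage comments are placeholders rather than anything from the original setup.

```javascript
// Hypothetical helper: ask mongod to page a collection's data and/or
// indexes into memory with the touch command (a blunt, whole-collection load).
function touchCollection(db, collectionName, loadData, loadIndexes) {
    return db.runCommand({
        touch: collectionName,
        data: loadData,
        index: loadIndexes
    });
}

// Hypothetical helper: warm only the hot subset by walking a query through
// a specific index; explain() forces the query to actually run.
function warmByQuery(collection, criteria, indexSpec) {
    return collection.find(criteria).hint(indexSpec).explain();
}

// Usage in the mongo shell (names are placeholders):
//   touchCollection(db, "records", false, true);            // indexes only
//   warmByQuery(db.records, { day: "2013-05-01" }, { day: 1 });
```

Loading indexes only, as in the first usage line, is the cheaper option when the indexes are what must stay in memory.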

Something else to look at is how efficiently you load data into memory when you do hit disk. In general this is a trade-off between I/O and memory utilization, but if your number one priority is memory efficiency and you have some spare I/O to throw at the problem, then with MongoDB you will generally want to lower your readahead settings using the blockdev command. For more information, see the other Serverfault questions/answers here and here.
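
To get a feel for why readahead matters, here is a rough back-of-the-envelope sketch. It relies on the fact that blockdev reports readahead in 512-byte sectors; the simple model (each random read pulls in one full readahead window) and the 1 KB average document size are assumptions for illustration only.

```javascript
// blockdev --getra reports readahead in 512-byte sectors.
var SECTOR_BYTES = 512;

// Convert a readahead setting in sectors to kilobytes.
function readaheadKB(sectors) {
    return sectors * SECTOR_BYTES / 1024;
}

// Rough model (an assumption): every random read that misses memory pulls
// in one full readahead window, so this is bytes read per byte wanted.
function readAmplification(sectors, avgDocBytes) {
    return sectors * SECTOR_BYTES / avgDocBytes;
}

// Example with an assumed 1 KB average document:
var before = readAmplification(256, 1024); // 256 sectors = 128 KB window -> 128x
var after = readAmplification(32, 1024);   // 32 sectors = 16 KB window -> 16x

// Inspecting and changing the setting (device name is a placeholder):
//   sudo blockdev --getra /dev/sdX
//   sudo blockdev --setra 32 /dev/sdX
```

Under this model a smaller window means less unwanted data is pulled into memory per fault, at the cost of more I/O operations for sequential access.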