Does MongoDB 3.2 WiredTiger compression also apply to data stored in RAM?

mongodb

As I understand it WiredTiger compresses the journal, collections and indexes. Does it also compress them whilst they are stored in RAM?

For example, if my compressed indexes use 10 MiB on disk, can I assume that they also use 10 MiB of RAM? Or should I expect a larger, uncompressed index in RAM?

Best Answer

WiredTiger has different representations of data on disk versus in memory, and uses different compression approaches for indexes vs collection data.

The answer to what is compressed in memory is somewhat nuanced, but the high-level summary is:

  • collection data is compressed in the filesystem cache
  • collection data is uncompressed in the WiredTiger internal cache
  • indexes are compressed on disk and in memory

Compression Approaches

By default, WiredTiger uses Snappy block compression for collection data, but other options are available, including zlib compression or no compression. Block compression can provide significant storage savings, but data must be decompressed before the server can manipulate it. Irrespective of compression options, data is still written to disk in a block format that differs from the in-memory representation in the WiredTiger cache.
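To illustrate why block-compressed data has to be decompressed before use, here is a minimal sketch using Python's standard zlib module. It is not how WiredTiger lays out its blocks; the sample documents and the variable names are made up for the example.

```python
import zlib

# A hypothetical "block" of collection data: many similar documents.
documents = [
    '{"user": "user%05d", "status": "active", "country": "AU"}' % i
    for i in range(1000)
]
block = "\n".join(documents).encode("utf-8")

# Compress the whole block in one go, as a block compressor would.
compressed = zlib.compress(block)

# The compressed bytes are opaque: to read or change any single
# document, the whole block is decompressed first.
restored = zlib.decompress(compressed)

print("uncompressed bytes:", len(block))
print("compressed bytes:  ", len(compressed))
```

Repetitive data like this shrinks considerably, but the saving only applies while the block stays compressed.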

Indexes are compressed using index prefix compression, which effectively deduplicates common prefixes of indexed field values. This can be especially effective for compound indexes, since the values of leading fields are repeated while the additional fields in the index vary. Prefix compression also allows queries to operate directly on compressed indexes.
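The idea behind prefix compression can be sketched with a toy encoder that stores, for each sorted key, only the length of the prefix it shares with the previous key plus the remaining suffix. This is an illustration of the general technique, not WiredTiger's actual index format; the function names and the sample keys (a made-up compound key of country, city and name) are hypothetical.

```python
import os


def prefix_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decompress(encoded):
    """Rebuild the full keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


# Hypothetical compound index keys: leading fields repeat, trailing fields vary.
keys = sorted([
    "AU|Sydney|alice",
    "AU|Sydney|bob",
    "AU|Melbourne|carol",
    "NZ|Auckland|dave",
])
encoded = prefix_compress(keys)
for entry in encoded:
    print(entry)
```

The more leading fields the keys have in common, the shorter the stored suffixes become, which is why compound indexes tend to benefit most.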

What's compressed in RAM?

As of MongoDB 3.4 (and including prior MongoDB versions with WiredTiger), there are two significant caches for data in RAM:

  • The WiredTiger internal cache, which is controlled by the cacheSizeGB configuration setting.

    The default cache size in MongoDB 3.4 is the larger of 50% of (RAM minus 1 GB) or 256 MB. Collection data in the internal cache is uncompressed, but index data still uses prefix compression. The data in the internal WiredTiger cache is effectively the current working set.

  • The O/S filesystem cache, which is generally the remainder of free RAM that is not used by the WiredTiger cache or other processes.

    Data in the filesystem cache uses the same representation as on disk, so it keeps the benefit of any compression applied to the data files.
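As a quick worked example of the MongoDB 3.4 default described above, the calculation can be sketched as follows. The function name is hypothetical, and treating "GB" and "MB" as binary units (GiB and MiB) is an assumption made for the sketch.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Larger of 50% of (RAM - 1 GB) or 256 MB, per the MongoDB 3.4 default."""
    half_of_ram_less_one_gb = (total_ram_bytes - GIB) // 2
    return max(half_of_ram_less_one_gb, 256 * MIB)


for ram_gib in (1, 4, 16, 64):
    cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
    print(f"{ram_gib:>3} GiB RAM -> {cache / GIB:.2f} GiB default cache")
```

On small machines the 256 MB floor applies; on larger ones the cache approaches half of RAM, leaving the rest for the filesystem cache and other processes.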

Cache Tuning

Note that the WiredTiger cache doesn't represent MongoDB's total memory usage: mongod still needs to allocate memory outside the cache for other uses such as connections and data processing (e.g. aggregation, map/reduce, in-memory sorts).

The WiredTiger internal cache should generally be left at the default size or potentially reduced. If your data compresses well and the uncompressed data is much larger than RAM, you will be able to fit more data in RAM overall by reducing the WiredTiger cache size to free memory for the filesystem cache. The MongoDB manual has an FAQ with more information: "To what size should I set the WiredTiger internal cache?"
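The trade-off can be made concrete with some rough arithmetic. The sketch below is a deliberately simplified model, not something MongoDB reports: it assumes collection data only, a uniform compression ratio, no overlap between the two caches, and a fixed amount of RAM used by everything else. The function name and all the numbers are hypothetical.

```python
def approx_uncompressed_data_in_ram(total_ram_gb, wt_cache_gb, other_usage_gb,
                                    compression_ratio):
    """Rough estimate of how much uncompressed collection data fits in RAM.

    The WiredTiger cache holds uncompressed data, while the filesystem
    cache holds compressed blocks, so each GB there represents
    `compression_ratio` GB of uncompressed data.
    """
    fs_cache_gb = max(total_ram_gb - wt_cache_gb - other_usage_gb, 0)
    return wt_cache_gb + fs_cache_gb * compression_ratio


# 16 GB of RAM, 1 GB used elsewhere, data that compresses 4:1.
default_split = approx_uncompressed_data_in_ram(16, 7.5, 1, 4)
smaller_cache = approx_uncompressed_data_in_ram(16, 2, 1, 4)

print("default 7.5 GB cache:", default_split, "GB of data in RAM")
print("reduced 2 GB cache:  ", smaller_cache, "GB of data in RAM")
```

Under these assumptions the smaller internal cache keeps more data in RAM overall, at the cost of decompressing blocks more often when they move from the filesystem cache into the WiredTiger cache.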

For more background, I'd recommend reviewing the "New Compression Options in MongoDB 3.0" blog post and the "A Technical Introduction to WiredTiger" presentation.
