Why does MongoDB check the order of keys when matching embedded documents?

mongodb

The db.collection.find() documentation includes this explanation of Query Exact Matches on Embedded Documents:

The following operation returns documents in the bios collection where the embedded document name is exactly { first: "Yukihiro", last: "Matsumoto" }, including the order

This seems so strange to me, since documents seem like unordered data types (they remind me of JSON, Python dictionaries, etc.).

When this issue reared its ugly head in my code it took me a while to figure out what was wrong, since it's so unintuitive that the order of a JavaScript object's keys would come into play.

Why is this the behavior in MongoDB?

Best Answer

Why is this the behavior in MongoDB?

MongoDB documents are stored on the server in a binary format called BSON (short for "Binary JSON"), a JSON-like format that supports additional data types. JSON was designed to be human readable and derives its supported data types and behaviour from JavaScript. BSON was designed as a binary data interchange format with more precise control over data representation.

While BSON is JSON-like, there are notable differences:

  • A BSON data structure is an ordered object, not a dictionary. For example, BSON does not require field names to be unique (although drivers commonly default to a hash / dictionary / JSON-like interface that does not support duplicate field names). This gives developers precision which can be important for some use cases and also avoids potentially unnecessary server overhead of inspecting and recursively serializing/deserializing BSON into a predefined field order.
  • Where order is important, officially supported drivers rely on the underlying programming language to support an order-preserving data structure or provide their own. For example, the Python driver (aka PyMongo) includes a SON class for manipulating ordered objects similar to a normal Python dictionary.
  • BSON supports additional data types such as binary data, 32-bit and 64-bit integers, 64-bit floating point numbers (doubles), and 128-bit decimals (Decimal128, MongoDB 3.4+). By contrast, JSON has a single number type, and JavaScript's Number represents every value as a double precision floating point number.
  • MongoDB has defined comparison and sort order rules for BSON values. For example, MongoDB uses simple binary comparison for strings by default, and MongoDB 3.4+ adds the option of language-specific collation.
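To make the "ordered object" point concrete, here is a minimal sketch in plain JavaScript (Node.js) that hand-encodes a flat document of string fields following the published BSON layout. The helper `encodeStringDoc` is hypothetical and written only for illustration: it handles nothing but string values, and real drivers use a full BSON library instead.

```javascript
// Hypothetical helper: encode a flat object whose values are all strings
// into BSON bytes. Layout per the BSON spec:
//   document       := int32 totalLength, elements..., 0x00
//   string element := 0x02, cstring fieldName,
//                     int32 byteLength (including NUL), bytes, 0x00
function encodeStringDoc(obj) {
  const parts = [];
  // Object.entries yields non-numeric string keys in insertion order,
  // so the encoded bytes follow the order in which the fields were written.
  for (const [name, value] of Object.entries(obj)) {
    const nameBytes = Buffer.from(name + "\0", "utf8");
    const valueBytes = Buffer.from(value + "\0", "utf8");
    const valueLength = Buffer.alloc(4);
    valueLength.writeInt32LE(valueBytes.length, 0);
    parts.push(Buffer.from([0x02]), nameBytes, valueLength, valueBytes);
  }
  const body = Buffer.concat(parts);
  const totalLength = Buffer.alloc(4);
  // 4 bytes for the length prefix itself plus 1 byte for the trailing NUL.
  totalLength.writeInt32LE(4 + body.length + 1, 0);
  return Buffer.concat([totalLength, body, Buffer.from([0x00])]);
}

const firstLast = encodeStringDoc({ first: "Yukihiro", last: "Matsumoto" });
const lastFirst = encodeStringDoc({ last: "Matsumoto", first: "Yukihiro" });

// Same fields and values, same byte length, but different byte sequences.
console.log(firstLast.length === lastFirst.length); // true
console.log(firstLast.equals(lastFirst));           // false
```

Because the field order is part of the serialized bytes, two documents that a dictionary would treat as equal are distinct values at the BSON level.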

it's so unintuitive that the order of a JavaScript object's keys would come into play

The query example you provided is for an exact match on an embedded document, which includes the field order because the underlying data is ordered in BSON. This query performs a binary comparison of the BSON serialization of the embedded document provided in your query against the BSON field value with the embedded document stored in MongoDB. You can potentially take advantage of the field order to consistently put a more selective field earlier in your embedded document, which can help with the query performance if you've indexed the entire embedded document.
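As a rough client-side analogy (not the server's actual implementation), the order-sensitive comparison can be sketched in plain JavaScript. `matchesExactly` is a hypothetical helper that assumes flat embedded documents with primitive values.

```javascript
// Hypothetical helper: order-sensitive equality for flat objects,
// mimicking an exact match on an embedded document.
function matchesExactly(stored, query) {
  const a = Object.entries(stored);
  const b = Object.entries(query);
  if (a.length !== b.length) return false;
  // Compare field names and values position by position.
  return a.every(([key, value], i) => key === b[i][0] && value === b[i][1]);
}

const stored = { first: "Yukihiro", last: "Matsumoto" };

console.log(matchesExactly(stored, { first: "Yukihiro", last: "Matsumoto" })); // true
console.log(matchesExactly(stored, { last: "Matsumoto", first: "Yukihiro" })); // false
console.log(matchesExactly(stored, { first: "Yukihiro" }));                    // false
```

The last call shows the other consequence of an exact match: a query document containing only some of the embedded fields does not match either.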

If field order is unimportant, query on the individual embedded fields using dot notation instead, for example:

db.bios.find(
    {
       "name.first": "Yukihiro",
       "name.last": "Matsumoto"
    }
)
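To illustrate why this form ignores field order, here is a small plain-JavaScript sketch of field-by-field matching. `getPath` and `matchesFields` are hypothetical helpers, the `aka` value is made up for the example, and the sketch only handles nested objects (the real server also handles arrays and query operators).

```javascript
// Hypothetical helper: resolve a dotted path such as "name.first".
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Hypothetical helper: true when every dotted-path condition equals the
// corresponding value in the document, regardless of field order.
function matchesFields(doc, conditions) {
  return Object.entries(conditions).every(
    ([path, expected]) => getPath(doc, path) === expected
  );
}

const query = { "name.first": "Yukihiro", "name.last": "Matsumoto" };
const docA = { name: { first: "Yukihiro", last: "Matsumoto" } };
const docB = { name: { last: "Matsumoto", first: "Yukihiro", aka: "Matz" } };

console.log(matchesFields(docA, query)); // true
console.log(matchesFields(docB, query)); // true
```

Each condition is checked against its own field, so neither the order of the embedded fields nor the presence of extra fields affects the result.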

When querying on individual embedded fields like this, you would create a compound index on those fields to support your common queries. The selectivity, order, and sort direction of the keys in the index definition are significant if you want the index to support your queries efficiently.
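A minimal sketch of such an index, assuming the bios collection from the question. The wrapper function `createNameIndex` is hypothetical, and the key order (last name before first name) is only an assumption about which field is more selective in your data.

```javascript
// Sketch: pass in the `db` handle that the MongoDB shell provides.
// The key order below (last name before first name) is an assumption
// about selectivity; adjust it to match your own data and queries.
function createNameIndex(db) {
  return db.bios.createIndex({ "name.last": 1, "name.first": 1 });
}
```

In the shell you would call createNameIndex(db), or simply run the createIndex line directly. With this key order, queries that filter on "name.last" alone or on both fields can use the index, while a query on "name.first" alone generally cannot use it efficiently because "name.first" is not a prefix of the index.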