Inserting JSON document with `.` in key to MongoDB

jsonmongodb

Firstly, this is more of a design question than a programming question.

I am creating an application where I have to fetch existing JSON data and insert it into MongoDB. I found that some of the the JSON documents have a period . in their key. I read in the MongoDB documentation that periods . are not allowed as keys in MongoDB as they are used for querying.

I don't do a lot of insertions in the web applications, it is pretty much a one time insertion. Also, I would mostly retrieve the entire document rather than querying for parts of it as I need to get all the data.

So, considering my requirements, I have two choices on how to store the JSON document:

  1. Search through the JSON for period in the keys and escape them and then insert them into MongoDB.
  2. Convert the entire JSON into BSON format and store them as such, thus avoiding the need for escaping, and manually parse the JSON when needed outside MongoDB

Could you tell me which would be a better design, as I am not able to come to a conclusion.

Best Answer

There are a few alternatives:

1. Replace dots by a dash.

This would be my favorite approach, as it keeps the structure explicit enough.

Since according to you, “it is pretty much a one time insertion,” it should be relatively simple to check if it doesn't break anything (i.e. there is already a same key with a dash). For other situations, doing those checks programmatically requires to write some code, but is still a relatively easy task.

2. Replace dots by a Unicode dot character such as U+FF0E.

I would strongly advise against this approach, since it would lead to massive debugging headaches down the road. Letting someone who uses the resulting JSON somewhere in the code far away from MongoDB to guess that a dot is not really a dot is a good way to waste literally weeks of someone's time. Keep such Unicode tricks to the hackers who want to trick someone into thinking that a character is a different one.

3. Use BSON.

Since you claim that you “would mostly retrieve the entire document rather than querying for parts of it,” this approach doesn't have major drawbacks in your case. Although, you said “mostly,” which means that sometimes, you will retrieve only parts of the document.

In general, the drawback is that you won't be able to search through the document or to load only a part of it.

4. Use a standard encoding, such as Base64.

Converting the problematic keys (or all keys, depending on the ratio between problematic and non-problematic ones) to Base64 or hexadecimal could be a viable solution, with the benefit of being rather explicit: most developers would recognize Base64 or hexadecimal values at a glance.

The drawback is the increased memory footprint, as well as the necessity to encode and decode the keys when using them.

5. Set check_keys to false.

I would strongly advise against this approach, since is would make the data query ambiguous, and waste hours or days trying to figure out why a specific query doesn't do what you imagined it should be doing. Dot is a reserved character and the check is here to protect you; by telling MongoDB to skip the check, you will only postpone the moment where you'll have to deal with a conflict between the syntax of MongoDB and the reserved character used in a key.