MongoDB: array vs. object


I'm new to MongoDB, and I'm not sure what the implications or benefits are of using an array versus an object.

So, as asked above: what are the implications and benefits of using arrays versus objects?

For example, my case: the idea is to store a document with a "capture" time and 2000 slots of "packet" counters. I used an object to store the packets (see this), but apparently arrays seem to be the better choice. What considerations should one make to decide whether an array or an object is more suitable (with an example)?
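For concreteness, here is a minimal sketch of the two layouts the question describes, as plain JavaScript objects (the Date and the zeroed counters are placeholders). It also shows that a MongoDB update path such as "packets.17" can target either shape, because dot notation addresses array index 17 or a field named "17" in the same way.

```javascript
// Hypothetical capture document with 2000 packet counters, built in both shapes.
const SLOTS = 2000;

// Object shape: keys "0".."1999" map to counters.
const asObject = { capture: new Date(0), packets: {} };
for (let i = 0; i < SLOTS; i++) {
  asObject.packets[String(i)] = 0;
}

// Array shape: indexes 0..1999 hold the counters.
const asArray = { capture: new Date(0), packets: new Array(SLOTS).fill(0) };

// MongoDB dot notation: "packets.17" means array index 17 or the field named "17",
// so this one update document increments slot 17 in either shape.
function incSlot(slot) {
  return { $inc: { [`packets.${slot}`]: 1 } };
}

console.log(JSON.stringify(incSlot(17))); // {"$inc":{"packets.17":1}}
```

Where the shapes differ is in the operators available: array update operators such as $push and $pull only work on arrays, while the object shape lets you choose meaningful key names.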

Best Answer

There is no "right" way to use MongoDB, only trade-offs. Let's take a simple example:

Option 1: Have everything in one document, and $push packets into an array

Option 2: Make a new document for each packet, and have each one point to its 'parent' document.
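A minimal sketch of the two options as plain JavaScript objects. The field names parentId, seq, and bytes are hypothetical; the update and filter are ordinary MongoDB query documents you would pass to a driver.

```javascript
// Option 1: one capture document; each new packet is appended with $push.
const captureDoc = { _id: "cap1", capture: new Date(0), packets: [] };
function pushPacketUpdate(packet) {
  return { $push: { packets: packet } };
}

// Option 2: one document per packet, each storing its parent's _id.
function packetDoc(parentId, seq, packet) {
  return { parentId, seq, ...packet };
}

// Loading every packet of one capture under Option 2 uses a filter like this one,
// ideally backed by an index on { parentId: 1, seq: 1 }, which also supports sorting by seq.
const packetsOfCapture = { parentId: captureDoc._id };

console.log(JSON.stringify(pushPacketUpdate({ bytes: 64 }))); // {"$push":{"packets":{"bytes":64}}}
console.log(JSON.stringify(packetDoc("cap1", 0, { bytes: 64 }))); // {"parentId":"cap1","seq":0,"bytes":64}
```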

The trade-offs:

- When searching for a packet, Option 1 returns the entire document and does not tell you where in the array your packet is (a projection such as $elemMatch can trim the result to the first matching element). Option 2 returns the specific packet.
- Option 1 will fail once you have too many packets, because MongoDB limits a document to 16 MB.
- Option 1 is much faster if you always list all packets for a specific object. Option 2 requires many disk seeks to load all the packets, and on spinning hard drives a disk seek is roughly 100x slower than a sequential read.
- Option 2 takes a little more disk space, because each packet must store a link to its parent instead of being linked implicitly as in Option 1.
- Option 2 has roughly constant write time, but Option 1 has variable write time. In Option 1, you are expanding an existing document when adding packets; sometimes it no longer fits in its allocated space and must be moved elsewhere, which can slow down the system. If you are constantly adding packets to one document at a time, it will probably end up at the end of the data file, where it can grow without being moved, which helps somewhat. (This relocation behavior applies to the legacy MMAPv1 storage engine, removed in MongoDB 4.2; WiredTiger, the default engine since 3.2, does not grow documents in place.)
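To put the 16 MB limit in perspective for the question's 2000 counters, here is a rough size estimate based on the BSON encoding, in which an array is stored as a document whose keys are the strings "0", "1", "2", and so on. It assumes each counter is stored as a 32-bit integer (a double would take 8 bytes instead of 4) and ignores the enclosing document's other fields.

```javascript
// Estimated BSON size in bytes of an embedded array of n int32 counters.
// Per element: 1 type byte + key bytes ("0", "1", ...) + 1 NUL terminator + 4 bytes of int32.
// Per array (encoded as a document): 4-byte length prefix + 1 trailing NUL byte.
function int32ArrayBsonBytes(n) {
  let total = 4 + 1;
  for (let i = 0; i < n; i++) {
    total += 1 + String(i).length + 1 + 4;
  }
  return total;
}

const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document size limit

const bytes2000 = int32ArrayBsonBytes(2000);
console.log(bytes2000); // 18895
console.log(((bytes2000 / MAX_BSON_BYTES) * 100).toFixed(2) + "% of the limit");
```

Because an array is encoded as a document keyed "0", "1", ..., an object with those same keys takes exactly the same number of bytes, so for this case the choice between array and object comes down to the query and update operators you need rather than storage size.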

The rule of thumb is: Always assume you will store everything in one document until you find a reason to break it up. But beware of the trade-offs each way.

Related Solutions

MongoDB Schema – Recommended Schema for a Quiz-Engine Scenario

I would store each guess as a separate document in a user guesses collection. The structure of each document would be as follows:

Guess
- userId
- selectedImageId
- correctImageId

You may also store more information about the corresponding quiz, such as the images shown and the audio played. To generate statistics, you will need to run Map/Reduce over this collection. For example, to get stats of total and correct guesses, your Map/Reduce output document structure would be:

UserStats
- userId
- guesses
- correct

The map function may look like this:

function() {
    emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
}

And the reduce function:

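function(key, values) {
  // The key (userId) becomes the output document's _id, so it is not repeated here.
  // Return the same shape that map emits, because MongoDB may re-reduce partial results.
  var result = { guesses: 0, correct: 0 };
  values.forEach(function(value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}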
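To see what these functions produce, here is a simplified in-memory stand-in for MongoDB's mapReduce (not the server's implementation), run over three hypothetical guess documents. The reduce function returns the same shape the map function emits, because MongoDB may feed reduce output back into reduce, and it skips reduce entirely for a key with a single emitted value.

```javascript
// Hypothetical sample documents from the user-guesses collection.
const guessDocs = [
  { userId: "u1", selectedImageId: "img1", correctImageId: "img1" },
  { userId: "u1", selectedImageId: "img2", correctImageId: "img3" },
  { userId: "u2", selectedImageId: "img4", correctImageId: "img4" },
];

// Map and reduce as described above; reduce returns the same shape that map emits.
const mapGuess = function () {
  emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
};
const reduceGuesses = function (key, values) {
  const result = { guesses: 0, correct: 0 };
  values.forEach(function (value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
};

// Simplified in-memory stand-in for mapReduce (not MongoDB's implementation).
function runMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map();
  // Inside MongoDB, emit() is a global available to the map function; emulate that here.
  globalThis.emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  try {
    docs.forEach((doc) => mapFn.call(doc));
  } finally {
    delete globalThis.emit;
  }
  // Like MongoDB, reduce is not called for a key with a single emitted value.
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

console.log(JSON.stringify(runMapReduce(guessDocs, mapGuess, reduceGuesses)));
// [{"_id":"u1","value":{"guesses":2,"correct":1}},{"_id":"u2","value":{"guesses":1,"correct":1}}]
```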

Note that MongoDB does not rerun Map/Reduce automatically when the source collection is updated, so you will need to schedule that yourself.
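Map-reduce is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline. A sketch of an equivalent pipeline follows, assuming the field names above; userGuesses and userStats are hypothetical collection names. The $merge stage (MongoDB 4.2+) writes the totals into a collection, but like map-reduce it still has to be rerun whenever you want fresh statistics.

```javascript
// Aggregation-pipeline equivalent of the map/reduce above (a sketch).
const userStatsPipeline = [
  {
    $group: {
      _id: "$userId", // one output document per user
      guesses: { $sum: 1 },
      correct: {
        $sum: { $cond: [{ $eq: ["$selectedImageId", "$correctImageId"] }, 1, 0] },
      },
    },
  },
  // Write each user's totals into userStats, replacing any previous totals.
  { $merge: { into: "userStats", whenMatched: "replace", whenNotMatched: "insert" } },
];

// With the Node.js driver this would run as, e.g.:
//   await db.collection("userGuesses").aggregate(userStatsPipeline).toArray();
console.log(JSON.stringify(userStatsPipeline[0].$group._id)); // "$userId"
```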

How to model hashtags with nodejs and mongodb

Store hashtags in an array within a document.

That's the benefit of having documents: you can simply nest them. And, in this particular case, it's trivial:

{
    "_id": 123,
    "file": "c43a5f46-kitten.png",
    "description": "My kitten :3 #kittens #cute",
    "hashtags": ["kittens", "cute", "cat", "animals"]
}

(I added some "synonymous" tags; this can be done automatically by looking them up in some other document.)
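One way to automate that, sketched under assumptions: extract the #words from the description and append synonyms from a lookup table. The synonyms Map here is a hypothetical stand-in for the "other document" mentioned above; in practice you would load it from the database.

```javascript
// Hypothetical synonyms lookup, standing in for the "other document" holding synonyms.
const synonyms = new Map([["kittens", ["cat", "animals"]]]);

// Pull "#word" tags out of a description, lowercase them, then append synonyms.
// A Set removes duplicates while keeping first-seen order.
function hashtagsFor(description, synonymMap) {
  const tags = [...description.matchAll(/#(\w+)/g)].map((m) => m[1].toLowerCase());
  const expanded = new Set(tags);
  for (const tag of tags) {
    for (const synonym of synonymMap.get(tag) ?? []) {
      expanded.add(synonym);
    }
  }
  return [...expanded];
}

console.log(hashtagsFor("My kitten :3 #kittens #cute", synonyms));
// [ 'kittens', 'cute', 'cat', 'animals' ]
```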

This is the most natural solution for a document-oriented database:

- Searching documents by hashtag is trivial once you add an index, and inserting, updating, and deleting hashtags on individual documents is just as easy.
- Mass inserts, updates, and deletes are a bit trickier, because you'd probably want to split such operations into multiple "batches", but this is still manageable and not hard to implement.
- Complex aggregations can be done with the standard aggregation pipeline or map-reduce.
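The per-document operations in that list map onto ordinary MongoDB filter and update documents. The sketch below builds them and pairs them with simplified in-memory stand-ins that mimic their semantics for illustration only; the images collection name is hypothetical.

```javascript
// MongoDB filter/update documents for the hashtag design.
// Index (hypothetical collection name): db.collection("images").createIndex({ hashtags: 1 })
// An index on an array field is a multikey index: one entry per array element.
const byTag = (tag) => ({ hashtags: tag });                 // matches if the array contains tag
const addTag = (tag) => ({ $addToSet: { hashtags: tag } }); // appends only if tag is absent
const removeTag = (tag) => ({ $pull: { hashtags: tag } });  // removes every occurrence of tag

// Simplified in-memory stand-ins for those semantics (illustration only).
function matchesTag(doc, filter) {
  return doc.hashtags.includes(filter.hashtags);
}

function applyTagUpdate(doc, update) {
  let tags = [...doc.hashtags];
  if (update.$addToSet && !tags.includes(update.$addToSet.hashtags)) {
    tags.push(update.$addToSet.hashtags);
  }
  if (update.$pull) {
    tags = tags.filter((t) => t !== update.$pull.hashtags);
  }
  return { ...doc, hashtags: tags };
}

const image = { _id: 123, hashtags: ["kittens", "cute", "cat", "animals"] };
console.log(matchesTag(image, byTag("cute")));                      // true
console.log(applyTagUpdate(image, addTag("cute")).hashtags.length); // 4
console.log(applyTagUpdate(image, removeTag("cat")).hashtags);      // [ 'kittens', 'cute', 'animals' ]
```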

On the other hand, if you go with a relational style, you'll be in big trouble, because you end up reinventing SQL JOINs in your application code. This is one of the most common anti-patterns when using MongoDB (and similar databases). Here's some very typical code:

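// db is a com.mongodb.client.MongoDatabase; Document is org.bson.Document
for (Document tag : db.getCollection("hashtags").find()) {
    // One extra query per hashtag: a JOIN re-implemented by hand.
    for (Document img : db.getCollection("images").find(
            new Document("_id", tag.get("imageId")))) {
        // ...
    }
}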

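This is inefficient, doesn't scale, and simply reinvents the wheel. The application issues one query per hashtag, so the cost grows with the number of hashtags times the cost of each lookup, up to O(N*M) if the inner lookup scans the collection instead of hitting an index. If you chose SQL with foreign keys instead, the database would perform the join itself, in something like O(N*log(M)) or even O(N+M).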
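Much of the cost in that loop is round trips: one query for the hashtags plus one more per hashtag. A sketch using a hypothetical CountingCollection (an in-memory stand-in, not the driver API) that only counts queries, compared with the embedded design, where a single query returns each image together with its tags.

```javascript
// Hypothetical in-memory collection that counts how many queries it receives.
class CountingCollection {
  constructor(docs) {
    this.docs = docs;
    this.queries = 0;
  }

  // Supports only equality filters such as { _id: 5 } (or {} for "everything").
  find(filter = {}) {
    this.queries += 1;
    return this.docs.filter((doc) =>
      Object.entries(filter).every(([field, value]) => doc[field] === value)
    );
  }
}

// Relational-style data: n images and n hashtag documents pointing at them.
function makeRelationalData(n) {
  return {
    images: new CountingCollection(Array.from({ length: n }, (_, i) => ({ _id: i }))),
    hashtags: new CountingCollection(
      Array.from({ length: n }, (_, i) => ({ tag: `t${i}`, imageId: i }))
    ),
  };
}

// The hand-rolled JOIN from the answer: one images query per hashtag.
function nestedLoopJoin({ images, hashtags }) {
  const pairs = [];
  for (const tag of hashtags.find()) {
    for (const img of images.find({ _id: tag.imageId })) {
      pairs.push([tag.tag, img._id]);
    }
  }
  return pairs;
}

const relational = makeRelationalData(100);
nestedLoopJoin(relational);
console.log(relational.hashtags.queries + relational.images.queries); // 101

// Embedded design: tags live inside each image, so one query returns everything.
const embedded = new CountingCollection(
  Array.from({ length: 100 }, (_, i) => ({ _id: i, hashtags: [`t${i}`] }))
);
embedded.find();
console.log(embedded.queries); // 1
```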

There are no tables (relations) or foreign keys in MongoDB. Please don't invent them; use SQL instead if you need them. In fact, I strongly suggest using SQL instead of MongoDB unless your data really consists of documents.

Typical examples of documents are configurations, forms, and maybe user sessions. These typically don't fit well into tables because of their "random" structure.
