I would store each guess as a separate document in a collection of user guesses. Each document would have the following structure:
Guess
- userId
- selectedImageId
- correctImageId
You may also store more information about the corresponding quiz, such as the images shown and the audio played. To generate statistics, you can run Map/Reduce over this collection. For example, to get each user's total and correct guesses, your Map/Reduce output documents would have this structure:
UserStats
- userId
- guesses
- correct
The map function may look like this:
function() {
  // Emit one value per guess, keyed by user; correct is 1 for a right guess, else 0
  emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
}
And the reduce function:
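function(key, values) {
  // Return the same shape as the emitted values, because MongoDB may call reduce
  // again on its own output. The key (userId) becomes the output document's _id.
  var result = { guesses: 0, correct: 0 };
  values.forEach(function(value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}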
Note that MongoDB does not re-run Map/Reduce automatically when the source collection is updated, so you will need to trigger it yourself (for example, on a schedule or after each batch of new guesses).
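To see what the two functions compute, here is a minimal in-memory sketch in plain JavaScript (no MongoDB needed). The `runMapReduce` helper, the `mapGuess`/`reduceGuesses` names, and the sample data are hypothetical; the helper only mimics the grouping step the server performs between map and reduce.

```javascript
// In-memory sketch of the map/reduce flow (illustration only, not the server's implementation).
let emit; // rebound by runMapReduce, standing in for MongoDB's global emit()

// The map function from above: `this` is the current guess document
const mapGuess = function () {
  emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
};

// The corrected reduce function: output has the same shape as the emitted values
const reduceGuesses = function (key, values) {
  const result = { guesses: 0, correct: 0 };
  values.forEach(function (value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
};

// Hypothetical helper: group emitted values by key, then reduce each group
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc);
  // MongoDB writes each result as { _id: key, value: reducedValue }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

// Hypothetical sample guesses
const guesses = [
  { userId: "alice", selectedImageId: 1, correctImageId: 1 },
  { userId: "alice", selectedImageId: 2, correctImageId: 3 },
  { userId: "bob", selectedImageId: 5, correctImageId: 5 },
];

// alice: 2 guesses, 1 correct; bob: 1 guess, 1 correct
console.log(JSON.stringify(runMapReduce(guesses, mapGuess, reduceGuesses)));
```

In the mongo shell, the same two functions would be passed to `db.collection.mapReduce(map, reduce, { out: ... })`; recent MongoDB versions deprecate map-reduce in favor of the aggregation pipeline.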
Store hashtags in an array within a document.
That's the benefit of having documents: you can simply nest them. And, in this particular case, it's trivial:
{
"_id": 123,
"file": "c43a5f46-kitten.png",
"description": "My kitten :3 #kittens #cute",
"hashtags": ["kittens", "cute", "cat", "animals"]
}
(I added some "synonymous" tags; this can be done automatically by looking them up in another document.)
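A minimal sketch of how the `hashtags` array above could be built automatically, assuming a hypothetical synonyms lookup (a plain `Map` standing in for the "other document") and treating any `#word` in the description as a hashtag:

```javascript
// Hypothetical synonyms lookup, e.g. loaded from a separate "synonyms" document
const synonyms = new Map([["kittens", ["cat", "animals"]]]);

// Collect the word after each "#" (lowercased; that normalization is an assumption)
function extractHashtags(description) {
  const tags = [];
  for (const match of description.matchAll(/#(\w+)/g)) tags.push(match[1].toLowerCase());
  return tags;
}

// Append synonyms; a Set keeps first-seen order and drops duplicates
function expandWithSynonyms(tags, synonymMap) {
  const result = new Set(tags);
  for (const tag of tags) {
    for (const extra of synonymMap.get(tag) || []) result.add(extra);
  }
  return [...result];
}

const doc = { _id: 123, file: "c43a5f46-kitten.png", description: "My kitten :3 #kittens #cute" };
doc.hashtags = expandWithSynonyms(extractHashtags(doc.description), synonyms);
console.log(doc.hashtags); // [ 'kittens', 'cute', 'cat', 'animals' ]
```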
Storing hashtags in an array is the most natural solution for a document-oriented database:
- Searching documents by hashtag is trivial once you add an index on the array, and inserting, updating, and deleting hashtags on individual documents is also trivial
- Bulk inserts, updates, and deletes are a bit trickier, because you'd probably want to split such operations into multiple "batches", but it's still manageable and not hard to implement
- Complex aggregations can be done with the standard aggregation pipeline or map-reduce
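In MongoDB, indexing an array field (for example `db.images.createIndex({ hashtags: 1 })`, with `images` as a hypothetical collection name) creates a multikey index with one entry per array element, so a query like `db.images.find({ hashtags: "cute" })` can use it. The sketch below only simulates that idea in memory, plus a hypothetical `inBatches` helper for splitting a bulk update; it is not how the server implements either.

```javascript
// In-memory stand-in for a multikey index: one entry per array element
class MultikeyIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // tag -> Set of _id values
  }
  add(doc) {
    for (const value of doc[this.field]) {
      if (!this.entries.has(value)) this.entries.set(value, new Set());
      this.entries.get(value).add(doc._id);
    }
  }
  remove(doc) {
    for (const value of doc[this.field]) {
      const ids = this.entries.get(value);
      if (!ids) continue;
      ids.delete(doc._id);
      if (ids.size === 0) this.entries.delete(value);
    }
  }
  find(value) {
    return [...(this.entries.get(value) || [])];
  }
}

// Split a large list into fixed-size batches (the batch size is an arbitrary choice)
function* inBatches(items, batchSize) {
  for (let i = 0; i < items.length; i += batchSize) {
    yield items.slice(i, i + batchSize);
  }
}

const images = [
  { _id: 123, hashtags: ["kittens", "cute", "cat", "animals"] },
  { _id: 124, hashtags: ["dogs", "cute", "animals"] },
];
const index = new MultikeyIndex("hashtags");
images.forEach((img) => index.add(img));
console.log(index.find("cute")); // [ 123, 124 ]

// Bulk update in batches (size 1 here just to show the split); keep the index in sync
for (const batch of inBatches(images, 1)) {
  for (const img of batch) {
    index.remove(img);
    if (!img.hashtags.includes("photo")) img.hashtags.push("photo");
    index.add(img);
  }
}
console.log(index.find("photo")); // [ 123, 124 ]
```

Against a real server, each batch would typically become one `bulkWrite` (or `updateMany`) call rather than a loop of single-document updates.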
On the other hand, if you go with a relational-style schema, you'll be in big trouble, because you'll end up reinventing SQL JOINs in your application code. This is one of the most common anti-patterns when using MongoDB (and similar databases). Here's some very typical pseudocode:
for (HashTag tag : mongodb.hashtags.find()) {
    // One extra query per hashtag: the JOIN is done by hand in application code
    for (Image img : mongodb.images.find(
            new Document("_id", tag.getImageId()))) {
        // ...
    }
}
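This is inefficient, not scalable, and simply reinvents the wheel. With nested loops like these, you can easily end up with O(N*M) complexity (for example, when the inner query is not backed by an index, so every iteration scans the whole collection), and you always pay one extra round trip to the database per hashtag. If you chose SQL with foreign keys instead, the database could execute the join in something like O(N*log(M)) (an index lookup per row) or even O(N+M) (a hash join).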
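To make the complexity figures concrete, here is a small in-memory simulation (hypothetical data and helper names) that counts the work done by a nested loop scanning the inner collection for every outer row versus a hash join that builds a lookup table once. It only illustrates the arithmetic; a SQL engine chooses its join strategy itself and runs it server-side, without one round trip per row.

```javascript
// Hypothetical data: N hashtag records pointing at M images by imageId
function makeData(n, m) {
  const images = Array.from({ length: m }, (_, i) => ({ _id: i, file: `img${i}.png` }));
  const hashtags = Array.from({ length: n }, (_, i) => ({ tag: `t${i}`, imageId: i % m }));
  return { images, hashtags };
}

// Nested loop with a full scan per outer row: N * M comparisons
function nestedLoopJoin(hashtags, images) {
  let comparisons = 0;
  const rows = [];
  for (const tag of hashtags) {
    for (const img of images) {
      comparisons++;
      if (img._id === tag.imageId) rows.push({ tag: tag.tag, file: img.file });
    }
  }
  return { rows, comparisons };
}

// Hash join: build a Map over images once (M steps), then probe it per hashtag (N steps)
function hashJoin(hashtags, images) {
  let operations = 0;
  const byId = new Map();
  for (const img of images) {
    byId.set(img._id, img);
    operations++;
  }
  const rows = [];
  for (const tag of hashtags) {
    operations++;
    const img = byId.get(tag.imageId);
    if (img) rows.push({ tag: tag.tag, file: img.file });
  }
  return { rows, operations };
}

const { images, hashtags } = makeData(1000, 500);
const naive = nestedLoopJoin(hashtags, images);
const hashed = hashJoin(hashtags, images);
console.log(naive.comparisons); // 500000
console.log(hashed.operations); // 1500
```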
MongoDB has no tables (relations) or foreign keys. Please don't invent them; if you need them, use SQL instead. In fact, I highly recommend using SQL instead of MongoDB unless your data really consists of documents.
Typical examples of documents are configurations, forms, and maybe user sessions. These usually don't fit well into tables because of their irregular ("random") structure.
There is no "right" way to use MongoDB, only trade-offs. Let's take a simple example:
Option 1: Have everything in one document, and $push packets into an array.
Option 2: Make a new document for each packet, and have each one point to its 'parent' object.
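A minimal in-memory sketch of the two layouts, using hypothetical names (`obj1`, `seq`, `payload`). In the mongo shell, Option 1 would be something like `db.objects.updateOne({ _id: "obj1" }, { $push: { packets: { seq: 1, payload: "data1" } } })` and Option 2 `db.packets.insertOne({ parentId: "obj1", seq: 1, payload: "data1" })`, where `objects` and `packets` are hypothetical collection names.

```javascript
// Option 1: one parent document with an embedded array
const parentOption1 = { _id: "obj1", packets: [] };
function pushPacket(parent, packet) {
  parent.packets.push(packet); // roughly what { $push: { packets: packet } } does server-side
}

// Option 2: one document per packet, each referencing its parent
const packetsCollection = [];
function insertPacket(parentId, packet) {
  packetsCollection.push({ _id: packetsCollection.length + 1, parentId, ...packet });
}

for (const seq of [1, 2, 3]) {
  pushPacket(parentOption1, { seq, payload: `data${seq}` });
  insertPacket("obj1", { seq, payload: `data${seq}` });
}

// Searching for packet seq 2: Option 1 matches the whole parent document...
const option1Hit = [parentOption1].find((doc) => doc.packets.some((p) => p.seq === 2));
// ...while Option 2 matches just the packet
const option2Hit = packetsCollection.find((p) => p.seq === 2);

console.log(option1Hit.packets.length); // 3 (the whole array comes back)
console.log(option2Hit); // { _id: 2, parentId: 'obj1', seq: 2, payload: 'data2' }
```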
The trade-offs:
- When searching for a packet, Option 1 will (by default) return the entire document and not tell you where in the array your packet is. Option 2 will return the specific packet.
- Option 1 will break if you have too many packets, because of the 16 MB limit on document size.
- Option 1 is much faster if you always list all packets for a specific object. Option 2 requires many disk seeks to load all the packets, and on spinning hard drives a random seek can be around 100x slower than a sequential read.
- Option 2 takes a little more disk space, because every packet must store a link to its parent, whereas in Option 1 the relationship is implicit.
- Option 2 has roughly constant write time, but Option 1 will have variable write time. In Option 1, you expand an existing document each time you add a packet. Sometimes it no longer fits in its allocated space and must be moved somewhere else (with the older MMAPv1 storage engine), which can slow down the system. But if you're constantly adding packets to one document at a time, it will probably eventually be moved to the end, where it can grow without being moved around, so that's slightly better.
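As a rough back-of-the-envelope sketch of the 16 MB limit (16 MiB, i.e. 16,777,216 bytes): the packet sizes and the 10 packets/second rate below are hypothetical, and the estimate ignores BSON per-element overhead (field names and array index keys), so real capacity is lower.

```javascript
// MongoDB's maximum BSON document size: 16 MiB
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough upper bound on how many packets fit in one Option 1 document.
// packetBytes and overheadBytes are hypothetical inputs; BSON encoding overhead is ignored.
function maxEmbeddedPackets(packetBytes, overheadBytes = 0) {
  return Math.floor((MAX_BSON_BYTES - overheadBytes) / packetBytes);
}

console.log(MAX_BSON_BYTES); // 16777216
console.log(maxEmbeddedPackets(200)); // 83886
console.log(maxEmbeddedPackets(1024)); // 16384

// At a hypothetical 10 packets per second, a 200-byte-packet document fills up in hours
const hoursToFill = maxEmbeddedPackets(200) / 10 / 3600;
console.log(hoursToFill.toFixed(1)); // 2.3
```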
The rule of thumb: always assume you will store everything in one document until you find a reason to break it up, but be aware of the trade-offs either way.