Article {
  "_id" : "A",
  "title" : "Hello World",
  "user_id" : 12345,
  "text" : "My test article",
  "comments" : [
    { "text" : "blah", "user_id" : 654321, "votes" : [987654] },
    { "text" : "foo", "user_id" : 987654, "votes" : [12345, 654321] },
    ...
  ]
}
The basic premise here is that I've nested the Comments inside the Article. The Votes only apply to a Comment, so they're stored as an array within each Comment. In this case, I've just stored the user_id. If you want to store more information (time_created, etc.), then you can make votes an array of objects:
... 'votes' : [ { user_id : 987654, ts : 78946513 } ] ...
How to perform your queries efficiently:
- get Article A, the comments on Article A, and the # of votes per comment
db.articles.find( { _id : 'A' } )
This gets everything with one query. You may have to do some client-side logic to count votes per comment, but this is pretty trivial.
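That client-side counting can be sketched in a few lines of plain JavaScript (the article shape matches the example document at the top; the function name is mine):

```javascript
// Count votes per comment for a fetched article document.
// Assumes the shape shown above: `comments` is an array and
// each comment carries a `votes` array of user_ids.
function voteCounts(article) {
  return article.comments.map(function (comment) {
    return { text: comment.text, votes: comment.votes.length };
  });
}

var article = {
  _id: "A",
  comments: [
    { text: "blah", user_id: 654321, votes: [987654] },
    { text: "foo", user_id: 987654, votes: [12345, 654321] }
  ]
};

voteCounts(article);
// → [ { text: "blah", votes: 1 }, { text: "foo", votes: 2 } ]
```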
- get all comments by User B across all articles
db.articles.ensureIndex( { "comments.user_id" : 1 } )
db.articles.find( { "comments.user_id" : 987654 } ) // returns all document fields
The index will allow for efficiently searching the comments within a document.
There's currently no way to extract only the matching entries from a sub-array. This query will in fact return every article containing comments by that user. If that's potentially way too much data, you can trim the result with a projection:
db.articles.find( { "comments.user_id" : 987654 }, { "title" : 1, "comments.user_id" : 1 })
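If the projection still isn't enough, you can finish the trimming on the client. A minimal sketch (the function name and sample data are mine; the document shape is the one from the example above):

```javascript
// After db.articles.find({ "comments.user_id": 987654 }) returns whole
// articles, keep only that user's comments from each one.
function commentsByUser(articles, userId) {
  return articles.map(function (article) {
    return {
      title: article.title,
      comments: article.comments.filter(function (c) {
        return c.user_id === userId;
      })
    };
  });
}

var fetched = [{
  title: "Hello World",
  comments: [
    { text: "blah", user_id: 654321, votes: [987654] },
    { text: "foo", user_id: 987654, votes: [12345, 654321] }
  ]
}];

commentsByUser(fetched, 987654);
// keeps only the "foo" comment for each article
```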
- get all comments User B voted for
db.articles.ensureIndex( { "comments.votes" : 1 } )
db.articles.find( { "comments.votes" : 987654 } )
Again, this will return whole Articles, not just the matching comments.
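As with query #2, you can extract the voted-for comments on the client once the articles come back. A sketch under the same assumptions (function name and sample data are mine):

```javascript
// Given the articles returned by db.articles.find({ "comments.votes": 987654 }),
// pull out just the comments that user voted for, keeping the article
// title for context.
function commentsVotedFor(articles, userId) {
  var results = [];
  articles.forEach(function (article) {
    article.comments.forEach(function (comment) {
      if (comment.votes.indexOf(userId) !== -1) {
        results.push({ title: article.title, comment: comment.text });
      }
    });
  });
  return results;
}

var fetched = [{
  title: "Hello World",
  comments: [
    { text: "blah", user_id: 654321, votes: [987654] },
    { text: "foo", user_id: 987654, votes: [12345, 654321] }
  ]
}];

commentsVotedFor(fetched, 987654);
// → [ { title: "Hello World", comment: "blah" } ]
```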
There's a trade-off to be made here. Returning the article may seem like we're bringing back too much data. But what are you planning to display to the user when you make query #3?
Getting a list of "comments I've voted for" is not terribly useful without the comment itself. Of course the comment is not very useful without the article itself (or at least just the title).
Most of the time, query #3 devolves into a join from Votes to Comments to Articles. If that's the case, then why not just bring back the Articles to start with?
As of 4.0, MongoDB will have multi-document ACID transactions. The plan is to enable them in replica set deployments first, followed by sharded clusters. Transactions in MongoDB will feel just like the transactions developers are familiar with from relational databases: they'll be multi-statement, with similar semantics and syntax (like start_transaction and commit_transaction). Importantly, the changes to MongoDB that enable transactions do not impact performance for workloads that do not require them.
Having distributed transactions doesn't mean you should model your data as you would in a tabular relational database. Embrace the power of the document model and follow recommended data-modeling practices.
Best Answer
You'll definitely need to optimize for the queries you're doing.
Here's my best guess based on your description.
You'll probably want to know all Credit Cards for each Customer, so keep an array of those within the Customer Object. You'll also probably want to have a Customer reference for each Payment. This will keep the Payment document relatively small.
The Payment object will automatically have its own ID and index. You'll probably want to add an index on the Customer reference as well.
This will allow you to quickly search for Payments by Customer without storing the whole customer object every time.
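Concretely, the two document shapes might look like this (field names are illustrative guesses, not taken from the question):

```javascript
// Hypothetical Customer: credit cards embedded, since they belong to
// the customer and the array stays small.
var customer = {
  _id: "cust123",
  name: "Jane Doe",
  credit_cards: [
    { last4: "4242", expires: "2026-01" }
  ]
};

// Hypothetical Payment: small document holding a reference back to
// the customer rather than embedding the whole customer object.
var payment = {
  _id: "pay789",
  customer_id: "cust123",
  amount: 49.95,
  ts: 1234567890
};

// In the shell, index the reference for fast per-customer lookups:
// db.payments.ensureIndex({ customer_id: 1 })
// db.payments.find({ customer_id: "cust123" })
```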
If you want to answer questions like "What was the average amount all customers paid last month?", you're instead going to want a map/reduce job for any sizeable dataset. You won't get this answer in real time. You'll find that storing a "reference" to the Customer is probably good enough for these map/reduce jobs.
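For intuition, here's the computation that map/reduce job would perform, expressed as a plain-JavaScript reduce over an in-memory list of payment documents (illustration only; the real job runs server-side over the whole collection):

```javascript
// Average payment amount over a list of payment documents.
// `amount` matches the hypothetical Payment shape sketched above.
function averageAmount(payments) {
  var total = payments.reduce(function (sum, p) {
    return sum + p.amount;
  }, 0);
  return payments.length ? total / payments.length : 0;
}

averageAmount([{ amount: 10 }, { amount: 20 }, { amount: 30 }]);
// → 20
```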
So to answer your question directly: Is MongoDB designed to prefer many, many small documents or fewer large documents?
MongoDB is designed to find indexed entries very quickly. MongoDB is very good at finding a few needles in a large haystack. MongoDB is not very good at finding most of the needles in the haystack. So build your data around your most common use cases and write map/reduce jobs for the rarer use cases.