There have been some great answers to this already, but I wanted to add some more recent CouchDB features to the mix of options for working with the original situation described by viatropos.
The key point at which to split up documents is where there might be conflicts (as mentioned earlier). You should never keep massively "tangled" data together in a single document, as you'll get a single revision path for completely unrelated updates (a comment addition adding a revision to the entire site document, for instance). Managing the relationships or connections between various smaller documents can be confusing at first, but CouchDB provides several options for combining disparate pieces into single responses.
The first big one is view collation. When you emit key/value pairs into the results of a map/reduce query, the keys are sorted according to CouchDB's Unicode collation rules ("a" comes before "b"). You can also output complex keys from your map/reduce as JSON arrays: ["a", "b", "c"]. Doing that lets you build a "tree" of sorts out of array keys. Using your example above, we can output the post_id, then the type of thing we're referencing, then its ID (if needed). If we then put the ID of the referenced document into an object in the emitted value, we can use the 'include_docs' query param to include those documents in the map/reduce output:
{"rows":[
{"key":["123412804910820", "author", "Lance1231"], "value":{"_id":"Lance1231"}},
{"key":["123412804910820", "comment", "comment1"], "value":{"_id":"comment1"}},
{"key":["123412804910820", "comment", "comment2"], "value":{"_id":"comment2"}},
{"key":["123412804910820", "post"], "value":null}
]}
Requesting that same view with '?include_docs=true' will add a 'doc' key that will either use the '_id' referenced in the 'value' object or if that isn't present in the 'value' object, it will use the '_id' of the document from which the row was emitted (in this case the 'post' document). Please note, these results would include an 'id' field referencing the source document from which the emit was made. I left it out for space and readability.
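A map function producing rows like those above might look roughly like this. It is only a sketch: the `type`, `author`, and `comments` fields on the post document are assumed names (the document shape isn't spelled out here), and the `emit` stub exists only so the snippet runs outside CouchDB, where `emit` is provided for you.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs under Node.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Hypothetical post shape (an assumption, not from the original post):
// { _id, type: "post", author: "<author doc id>", comments: ["<comment doc id>", ...] }
function mapPost(doc) {
  if (doc.type !== "post") return;
  // No _id in the value: include_docs=true falls back to the emitting (post) document.
  emit([doc._id, "post"], null);
  // An _id in the value tells include_docs=true to load that document instead.
  emit([doc._id, "author", doc.author], { _id: doc.author });
  (doc.comments || []).forEach(function (commentId) {
    emit([doc._id, "comment", commentId], { _id: commentId });
  });
}

mapPost({
  _id: "123412804910820",
  type: "post",
  author: "Lance1231",
  comments: ["comment1", "comment2"]
});
console.log(JSON.stringify(emitted, null, 2));
```

In a design document you would store only the function itself (as an anonymous `function(doc) { ... }`). The order of the `emit` calls doesn't matter, since the view sorts rows by key.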
We can then use the 'start_key' and 'end_key' parameters to filter the results down to a single post's data:
?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]
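In practice the keys have to be sent as JSON, and that JSON needs URL-encoding. A small helper along these lines could build the query string; `viewQuery` is a made-up name for illustration, not part of any CouchDB client library.

```javascript
// Build a view query string from a start key and an end key.
// Each key is serialized as JSON, then URL-encoded by URLSearchParams.
function viewQuery(startKey, endKey, includeDocs) {
  const params = new URLSearchParams();
  params.set("start_key", JSON.stringify(startKey));
  params.set("end_key", JSON.stringify(endKey));
  if (includeDocs) params.set("include_docs", "true");
  return "?" + params.toString();
}

// Everything belonging to one post:
const postQuery = viewQuery(["123412804910820"], ["123412804910820", {}, {}], true);
console.log(postQuery);
```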
Or even specifically extract the list for a certain type:
?start_key=["123412804910820", "comment"]&end_key=["123412804910820", "comment", {}]
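Both ranges depend on how CouchDB orders keys of different types. The comparator below is a simplified sketch of that ordering (null, then false, true, numbers, strings, arrays, objects), not CouchDB's actual implementation: it compares strings by code unit rather than with ICU rules, and it treats all objects as equal. It is enough to see which keys fall inside a range.

```javascript
// Rank of each JSON type in (simplified) view collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last
}

// Returns a negative number, zero, or a positive number, like Array.prototype.sort expects.
function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0; // assumption: plain code-unit comparison
  if (ra === 5) {
    // Arrays compare element by element; a shorter prefix sorts first.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // null, booleans, and (in this sketch only) objects
}

// Keep the keys that fall between start and end, inclusive.
function inRange(keys, start, end) {
  return keys.filter(function (k) {
    return collate(start, k) <= 0 && collate(k, end) <= 0;
  });
}

const keys = [
  ["123412804910820", "author", "Lance1231"],
  ["123412804910820", "comment", "comment1"],
  ["123412804910820", "comment", "comment2"],
  ["123412804910820", "post"],
  ["999", "post"] // a different post, added for illustration
];

const wholePost = inRange(keys, ["123412804910820"], ["123412804910820", {}, {}]);
const onlyComments = inRange(keys, ["123412804910820", "comment"], ["123412804910820", "comment", {}]);
console.log(wholePost.length, onlyComments.length);
```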
These query param combinations are possible because an empty object ("{}") sorts after every null, boolean, number, string, and array in the collation order, while null always sorts at the top (and "" is the lowest of the strings).
The second helpful addition from CouchDB in these situations is the _list function. This lets you run the above results through a templating system of some kind (if you want HTML, XML, CSV, or whatever back), or output a unified JSON structure so that a single request returns an entire post's content (including author and comment data) as one JSON document that matches what your client-side/UI code needs. You could then request the post's unified output document this way:
/db/_design/app/_list/unified/posts?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]&include_docs=true
Your _list function (in this case named "unified") would take the results of the view map/reduce (in this case named "posts") and run them through a JavaScript function that would send back the HTTP response in the content type you need (JSON, HTML, etc).
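As a rough sketch, such a list function could look like the following. `start`, `getRow`, and `send` are the helpers CouchDB exposes inside list functions; here they are replaced by simple stand-ins so the snippet runs under Node. The sample rows and document fields (`name`, `text`, `title`) are invented for illustration.

```javascript
// Stand-ins for CouchDB's list-function helpers (testing outside CouchDB only).
// The rows mimic view output requested with include_docs=true.
const sampleRows = [
  { key: ["123412804910820", "author", "Lance1231"], doc: { _id: "Lance1231", name: "Lance" } },
  { key: ["123412804910820", "comment", "comment1"], doc: { _id: "comment1", text: "First!" } },
  { key: ["123412804910820", "comment", "comment2"], doc: { _id: "comment2", text: "Nice post" } },
  { key: ["123412804910820", "post"], doc: { _id: "123412804910820", title: "Hello" } }
];
let output = "";
function start(response) {}                               // in CouchDB: sets status and headers
function getRow() { return sampleRows.shift() || null; }  // in CouchDB: next view row, or null
function send(chunk) { output += chunk; }                 // in CouchDB: writes to the response body

// The list function itself; in a design document it would live under lists.unified.
function unified(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var result = { post: null, author: null, comments: [] };
  var row;
  while ((row = getRow())) {
    var kind = row.key[1]; // second key element: the type of thing the row refers to
    if (kind === "post") result.post = row.doc;
    else if (kind === "author") result.author = row.doc;
    else if (kind === "comment") result.comments.push(row.doc);
  }
  send(JSON.stringify(result));
}

unified({}, {});
console.log(output);
```

Because the function switches on the second key element, it doesn't care what order the rows arrive in.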
Combining these things, you can split up your documents at whatever level you find useful and "safe" for updates, conflicts, and replication, and then put them back together as needed when they're requested.
Hope that helps.
As suggested in CouchDB: The Definitive Guide, you should put the values you want to be unique in the key, then query the reduce function with group=true.
For example, given that keyfield is the field with "key1" and "key2" and valuefield is the field with the values, your map function could be:
function(doc) {
  // filter to get only the interesting documents: change as needed
  if (doc.keyfield && doc.valuefield) {
    /*
     * This is the important stuff:
     *
     * - by putting both the key and the value in the emitted key,
     *   you can filter out duplicates
     *   (simply group the results on the full key);
     *
     * - as a bonus, by emitting 1 as the value, you get the number
     *   of duplicates by using the `_sum` reduce function.
     */
    emit([doc.keyfield, doc.valuefield], 1);
  }
}
and your reduce function could be:
_sum
Then querying with group=true&startkey=["key2"]&endkey=["key2",{}] gives:
{"rows":[
{"key":["key2","andanother"],"value":2},
{"key":["key2","anotherval"],"value":1}
]}
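To see why this removes duplicates, here is a small local simulation of the map step followed by `_sum` with group=true. It runs under Node rather than inside CouchDB; the sample documents and the `emit` stub are assumptions made for illustration.

```javascript
// Sample documents (invented): "andanother" appears twice under "key2".
const docs = [
  { keyfield: "key1", valuefield: "someval" },
  { keyfield: "key2", valuefield: "anotherval" },
  { keyfield: "key2", valuefield: "andanother" },
  { keyfield: "key2", valuefield: "andanother" }
];

// Stand-in for CouchDB's emit().
const emittedPairs = [];
function emit(key, value) {
  emittedPairs.push({ key: key, value: value });
}

// Same logic as the map function above.
docs.forEach(function (doc) {
  if (doc.keyfield && doc.valuefield) {
    emit([doc.keyfield, doc.valuefield], 1);
  }
});

// group=true with _sum: one output row per distinct full key, values added up.
const totals = new Map();
emittedPairs.forEach(function (pair) {
  const k = JSON.stringify(pair.key);
  totals.set(k, (totals.get(k) || 0) + pair.value);
});

// Rough equivalent of startkey=["key2"]&endkey=["key2",{}]: keep rows whose first element is "key2".
const key2Rows = Array.from(totals, function (entry) {
  return { key: JSON.parse(entry[0]), value: entry[1] };
}).filter(function (row) {
  return row.key[0] === "key2";
});
console.log(JSON.stringify(key2Rows));
```

Each distinct [keyfield, valuefield] pair shows up once, and the summed value tells you how many documents shared it. CouchDB would additionally return the rows sorted by key, which this sketch does not do.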
CouchDB views do not support faceted search, full-text search, or result intersection. The couchdb-lucene plugin lets you do all of these things.
http://github.com/rnewson/couchdb-lucene/tree/master