I have some experience with document-based stores (MongoDB and CouchDB) and I am interested in exploring wide column databases.
From my initial exploration I have a basic understanding of how wide column stores differ, but I do not really understand for which types of operations they are a better fit than an indexed document store.
My initial impression is that column stores are better if the column combinations used in queries are highly dynamic (so no predefined indexed view can cover them) and/or if there is a high write rate (each write triggers an update of the map-reduce indexes in a document store).
Performance-wise, it seems that column stores might be better if I have documents with many properties but only a few of them are needed per query. Document stores seem to assume that the whole document will be retrieved, but I am not sure how much impact this really has. Maybe a document needs to have many properties that are filtered out before it makes a difference?
I also got the impression that column stores "might" perform better for multi-tenant systems with a shared database, where one of the columns holds the tenant id and maybe another one holds the roles.
And I am getting the feeling that wide column stores are very good for the queries issued by data analysis applications, where a large set of data is collected for each entry, only a few fields need to be extracted, and the combination of columns is not known in advance.
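To make the access pattern I have in mind concrete, here is a toy sketch in Python. It is not tied to any real database: the sample records and the helper names (`to_columns`, `total_from_documents`, `total_from_columns`) are made up for illustration, and the purely column-oriented layout is my assumption about how such a store might organize data, not a description of any particular product.

```python
# Toy in-memory model of the two layouts; no real database is involved.
# All field names and values below are hypothetical.
documents = [
    {"id": 1, "tenant": "a", "price": 10.0, "qty": 2, "notes": "x" * 100},
    {"id": 2, "tenant": "b", "price": 20.0, "qty": 1, "notes": "y" * 100},
    {"id": 3, "tenant": "a", "price": 5.5, "qty": 4, "notes": "z" * 100},
]


def to_columns(docs):
    """Pivot a list of documents into one list per field.

    Assumes every document has the same set of fields, so position i in
    each list belongs to the same original document.
    """
    columns = {}
    for doc in docs:
        for key, value in doc.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_from_documents(docs, field):
    # Document layout: every whole document is visited, although only
    # one of its fields is actually used.
    return sum(doc[field] for doc in docs)


def total_from_columns(columns, field):
    # Column layout: only the list for the requested field is visited.
    return sum(columns[field])


columns = to_columns(documents)
doc_total = total_from_documents(documents, "price")
col_total = total_from_columns(columns, "price")
```

Both functions return the same total; the difference I am asking about is how much unrelated data (here the large `notes` field) has to be read along the way in each layout.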
My question: What types of queries are better handled by wide column stores than by document stores?
Best Answer
I can't answer this question for you, and no one else can either, because this is a "Gorilla vs. Shark" question, as noted in the comments above. But I will help anyway.
You have omitted an important preceding question:
That is just as important, if not more so, than the specific queries you want to run. Some useful questions to ask about your data are:
If you are considering this in the abstract and don't have any specific data set in mind, then there is no reasonable answer to your question.
And even with a specific, well-defined data set and answers to all of these questions, you still might not know without doing a bake-off of particular implementations.