How to track record relations in NoSQL

database-relationsnosql

I am trying to figure out the equivalent of foreign keys and indexes in NoSQL KVP or Document databases. Since there are no pivotal tables (to add keys marking a relation between two objects) I am really stumped as to how you would be able to retrieve data in a way that would be useful for normal web pages.

Say I have a user, and this user leaves many comments all over the site. The only way I can think of to keep track of that users comments is to

Embed them in the user object (which seems quite useless)
Create and maintain a user_id:comments value that contains a list of each comment's key [comment:34, comment:197, etc…] so that that I can fetch them as needed.

However, taking the second example you will soon hit a brick wall when you use it for tracking other things like a key called "active_comments" which might contain 30 million ids in it making it cost a TON to query each page just to know some recent active comments. It also would be very prone to race-conditions as many pages might try to update it at the same time.

How can I track relations like the following in a NoSQL database?

All of a user's comments
All active comments
All posts tagged with [keyword]
All students in a club – or all clubs a student is in

Or am I thinking about this incorrectly?

Best Answer

All the answers for how to store many-to-many associations in the "NoSQL way" reduce to the same thing: storing data redundantly.

In NoSQL, you don't design your database based on the relationships between data entities. You design your database based on the queries you will run against it. Use the same criteria you would use to denormalize a relational database: if it's more important for data to have cohesion (think of values in a comma-separated list instead of a normalized table), then do it that way.

But this inevitably optimizes for one type of query (e.g. comments by any user for a given article) at the expense of other types of queries (comments for any article by a given user). If your application has the need for both types of queries to be equally optimized, you should not denormalize. And likewise, you should not use a NoSQL solution if you need to use the data in a relational way.

There is a risk with denormalization and redundancy that redundant sets of data will get out of sync with one another. This is called an anomaly. When you use a normalized relational database, the RDBMS can prevent anomalies. In a denormalized database or in NoSQL, it becomes your responsibility to write application code to prevent anomalies.

One might think that it'd be great for a NoSQL database to do the hard work of preventing anomalies for you. There is a paradigm that can do this -- the relational paradigm.

Related Solutions

Mongodb – How to query for “is not null” in Mongo

This will return all documents with a key called "IMAGE URL", but they may still have a null value.

db.mycollection.find({"IMAGE URL":{$exists:true}});

This will return all documents with both a key called "IMAGE URL" and a non-null value.

db.mycollection.find({"IMAGE URL":{$ne:null}});

Also, according to the docs, $exists currently can't use an index, but $ne can.

Edit: Adding some examples due to interest in this answer

Given these inserts:

db.test.insert({"num":1, "check":"check value"});
db.test.insert({"num":2, "check":null});
db.test.insert({"num":3});

This will return all three documents:

db.test.find();

This will return the first and second documents only:

db.test.find({"check":{$exists:true}});

This will return the first document only:

db.test.find({"check":{$ne:null}});

This will return the second and third documents only:

db.test.find({"check":null})

Mongodb – How does column-oriented NoSQL differ from document-oriented

The main difference is that document stores (e.g. MongoDB and CouchDB) allow arbitrarily complex documents, i.e. subdocuments within subdocuments, lists with documents, etc. whereas column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.

Best Answer

Related Solutions

Mongodb – How to query for “is not null” in Mongo

Mongodb – How does column-oriented NoSQL differ from document-oriented

Related Topic