Database – When should we use MongoDB

database, mongodb, nosql

MongoDB is a NoSQL database which I've found quite easy to use. Recently I had to develop a simple application which needed to collect some data using HTTP requests and store some results after processing the data, and I tried using MongoDB.

From this experience I found it much nicer to use than traditional relational databases and since I'm a developer, and not a DBA, my work was greatly simplified.

Still, sometimes I feel unsure about when I should use MongoDB instead of a traditional relational database, like SQL Server or MySQL.

So, when can we use MongoDB instead of a relational database? Is there some truly big caveat about MongoDB that makes it unsuitable for some situations?

Best Answer

Basically:

  • If you can represent your data in the form of a bunch of documents, MongoDB could be a good choice.

  • If you would rather imagine your data as a bunch of interconnected tables, MongoDB may not be a good choice.

Here are two examples which I find illustrative:

  • A few years ago, I created a blog engine. Its purpose is to host blog articles, and for every article, store the different versions, some metadata, visit statistics, etc.

    This could be stored as a bunch of tables, but when trying to build a model, it quickly grows to a dozen tables, if not more. Some SQL queries could get ugly with a lot of joins, and... well, you get the picture.

    The problem here is that there is a central thing, a blog article, and there is all this stuff around the article, which makes it well suited for a document-based database. With MongoDB, modeling the database was extremely easy: one collection holds the blog articles, and a second tiny collection contains the list of users allowed to write articles. Each document within the first collection contains all the information I need when displaying an article, whether that is the name of the author or the tags.

  • Now imagine a very different project. There are some users who can write stuff, and share the stuff written by other users. On a user's page, you would expect to find both the things this user wrote and the ones she shared. There is one constraint: when somebody edits what they wrote in the past, the change appears everywhere the original text was shared.

    With a document-based approach, it's difficult to decide what the document should be. A user maybe? Well, that's a good start. A user document would contain all the things this user wrote. But what about the things she shared?

    A possible way is to put those things in the same document. The problem with this approach is that if somebody edits an entry, the application should walk through every user document in the database in order to edit every occurrence of the old entry. Not counting the data duplication.

    An alternative would be to keep within the user document just the list of entries this user shared (with the ID of the referred user and entry). But now, a different problem would occur: if a user shared thousands of entries from thousands of users, it would require opening thousands of documents to get those entries.

    Or we can model our collection around the entries themselves, each entry referring to its author and having a list of users who shared it. Here again, performance issues could become noticeable when you need to walk through all the documents in order to show the ones published by a given user.

    Now, how many tables would you need if you were using a relational database? Right, three. It would be straightforward to model, and also straightforward to use.
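To make the second example a bit more concrete, here is a minimal sketch of what a three-table model could look like. The answer only says "three" tables, so the table and column names (`users`, `entries`, `shares`), the sample data, and the helper functions are all assumptions made for illustration, and Python's built-in `sqlite3` module simply stands in for whatever relational database you would actually use.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE entries (
            id        INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES users(id),
            body      TEXT NOT NULL
        );
        -- One row per (user, entry) pair: "this user shared that entry".
        CREATE TABLE shares (
            user_id  INTEGER NOT NULL REFERENCES users(id),
            entry_id INTEGER NOT NULL REFERENCES entries(id),
            PRIMARY KEY (user_id, entry_id)
        );
    """)
    # Made-up sample data: alice wrote entry 1, bob wrote entry 2,
    # and bob shared alice's entry.
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany(
        "INSERT INTO entries (id, author_id, body) VALUES (?, ?, ?)",
        [(1, 1, "first draft"), (2, 2, "a note from bob")])
    conn.execute("INSERT INTO shares (user_id, entry_id) VALUES (?, ?)", (2, 1))
    return conn


def edit_entry(conn, entry_id, new_body):
    """Edit an entry in the single place where its text is stored."""
    conn.execute("UPDATE entries SET body = ? WHERE id = ?",
                 (new_body, entry_id))


def user_page(conn, user_id):
    """Return everything a user wrote or shared, ordered by entry id."""
    return conn.execute("""
        SELECT e.id, e.body, 'wrote' AS kind
          FROM entries e
         WHERE e.author_id = ?
        UNION ALL
        SELECT e.id, e.body, 'shared' AS kind
          FROM entries e
          JOIN shares s ON s.entry_id = e.id
         WHERE s.user_id = ?
         ORDER BY 1
    """, (user_id, user_id)).fetchall()
```

Because the text of an entry lives in exactly one row, `edit_entry` touches a single record and every page that shows the entry as shared picks up the change on the next `user_page` call. No per-user documents need to be walked or rewritten.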
