Database – Is the use of NoSQL Databases impractical for large datasets where you need to search by content

databasedatabase-designnosqlrelational-databasesql

I've been learning about NoSQL Databases for a week now.

I really understand the advantages of NoSQL Databases and the many use cases they are great for.

But often people write their articles as if NoSQL could replace Relational Databases. And there is the point I can't get my head around:

NoSQL Databases are (often) key-value stores.

Of course it is possible to store everything into a key-value store (by encoding the data in JSON, XML, whatever), but the problem I see is that you need to get some amount of data that matches a specific criterion, in many use cases. In a NoSQL database you have only one criterion you can search for effectively – the key. Relational Databases are optimized to search for any value in the data row effectively.

So NoSQL Databases are not really a choice for persisting data that need to be searched by their content. Or have I misunderstood something?

An example:

You need to store user data for a webshop.

In a relational database you store every user as an row in the users table, with an ID, the name, his country, etc.

In a NoSQL Database you would store each user with his ID as key and all his data (encoded in JSON, etc.) as value.

So if you need to get all users from a specific country (for some reason the marketing guys need to know something about them), it's easy to do so in the Relational Database, but not very effective in the NoSQL Database, because you have to get every user, parse all the data and filter.

I don't say it's impossible, but it gets a lot more tricky and I guess not that effective if you want to search in the data of NoSQL entries.

You could create a key for each country that stores the keys of every user who lives in this country, and get the users of a specific country by getting all the keys which are deposited in the key for this country. But I think this techique makes a complex dataset even more complex – it's harder to implement and not as effective as querying an SQL Database. So I think it's not a way you would use in production. Or is it?

I'm not really sure if I misunderstood something or overlooked some concepts or best practices to handle such use cases. Maybe you could correct my statements and answer my questions.

Best Answer

While I agree with your premise that NoSQL is not a panacea for all database woes, I think you misunderstand one key point.

In NoSQL database you have only one criterion you can search for effectively - the key.

This is clearly not true.

For example MongoDB supports indices. (from https://docs.mongodb.org/v3.0/core/indexes-introduction/)

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.

Indexes are special data structures [1] that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.

As does couchbase (from http://docs.couchbase.com/admin/admin/Views/views-intro.html)

Couchbase views enable indexing and querying of data.

A view creates an index on the data according to the defined format and structure. The view consists of specific fields and information extracted from the objects in Couchbase.

In fact anything that calls itself a NoSQL database rather than a key-value store should realy support some kind of indexing schemes.

In fact, it is often the flexibility of these index schemes that makes NoSQL shine. In my opinion, the language used to define the NoSQL indices are often more expressive or natural than SQL, and since they usually live outside the table, you don't need to change your table schemas to support them. (Not to say you cant do similar things in SQL but to me it feels like there is a lot more hoop-jumping involved).

Related Topic