Database – Why Relational Databases Don’t Support Nested Format Queries

databasequeryrdbmssql

Suppose I'm building a blog that I want to have posts and comments. So I create two tables, a 'posts' table with an autoincrementing integer 'id' column, and a 'comments' table that has a foreign key 'post_id'.

Then I want to run what will probably be my most common query, which is to retrieve a post and all of its comments. Being rather new to relational databases, the approach that appears most obvious to me is to write a query that would look something like:

SELECT id, content, (SELECT * FROM comments WHERE post_id = 7) AS comments
FROM posts
WHERE id = 7

Which would give me the id and content of the post that I want, along with all the relevant comment rows packaged neatly in an array (a nested representation like you'd use in JSON). Of course, SQL and relational databases don't work like this, and the closest they can get is to do a join between 'posts' and 'comments' that will return a lot of unnecessary duplication of data (with the same post information repeated in every row), which means processing time is spent both on the database to put it all together and on my ORM to parse and undo it all.

Even if I instruct my ORM to eagerly load the post's comments, the best it'll do is to dispatch one query for the post, and then a second query to retrieve all of the comments, and then put them together client-side, which is also inefficient.

I understand that relational databases are proven technology (hell, they're older than I am), and that there's been a ton of research put into them over the decades, and I'm sure there's a really good reason why they (and the SQL standard) are designed to function the way they do, but I'm not sure why the approach I outlined above isn't possible. It seems to me to be the most simple and obvious way to implement one of the most basic relationships between records. Why don't relational databases offer something like this?

(Disclaimer: I mostly write webapps using Rails and NoSQL datastores, but recently I've been trying out Postgres, and I actually like it a lot. I don't mean to attack relational databases, I'm just perplexed.)

I'm not asking how to optimize a Rails app, or how to hack my way around this problem in a particular database. I'm asking why the SQL standard works this way when it seems counterintuitive and wasteful to me. There must be some historical reason why the original designers of SQL wanted their results to look like this.

Best Answer

C. J. Date goes into detail about this in Chapter 7 and Appendix B of SQL and Relational Theory. You're right, there's nothing in relational theory that prohibits an attribute's data type from being a relation itself, as long as it's the same relation type on every row. Your example would qualify.

But Date says structures like this are "usually--but not invariably--contraindicated" (i.e. a Bad Idea) because hierarchies of relations are asymmetric. For example, a transformation from nested structure to a familiar "flat" structure cannot always be reversed to recreate the nesting.

Queries, constraints, and updates are more complex, harder to write, and harder for the RDBMS to support if you allow relation-valued attributes (RVA's).

It also muddies database design principles, because the best hierarchy of relations isn't so clear. Should we design a relation of Suppliers with a nested RVA for parts supplied by a given Supplier? Or a relation of Parts with a nested RVA for suppliers who supply a given Part? Or store both, to make it easy to run different types of queries?

This is the same dilemma that results from the hierarchical database and the document-oriented database models. Eventually, the complexity and cost of accessing nested data structures drives designers to store data redundantly for easier lookup by different queries. The relational model discourages redundancy, so RVA's can work against the goals of relational modeling.

From what I understand (I have not used them), Rel and Dataphor are RDBMS projects that support relation-valued attributes.


Re comment from @dportas:

Structured types are part of SQL-99, and Oracle supports these. But they don't store multiple tuples in the nested table per row of the base table. The common example is an "address" attribute which appears to be a single column of the base table, but has further sub-columns for street, city, postal code, etc.

Nested tables are also supported by Oracle, and these do allow multiple tuples per row of the base table. But I am not aware that this is part of standard SQL. And keep in mind the conclusion of one blog: "I'll never use a nested table in a CREATE TABLE statement. You spend all of your time UN-NESTING them to make them useful again!"