Database Schema Design for Unlimited Hierarchical Data

database-design

I am trying to set up some sort of collaboration app/forum with unlimited threading; no difference between posts, threads, or forums. Any post can be answered an unlimited amount of time, and the same post can have several children, themselves having several children, and so on. Each post can also have several parents.
The final result would look more like a mind-map than a forum, although usage as a regular forum would also be possible.

How I see people using it: say a team discusses the making of a project website. After a few threads, the conversation splits in parallel conversations about coding & design. Then after a while, the two end nodes get linked. The conversation continues for a while…A third, unrelated conversation about bauhaus-style websites gets linked too, for reference…etc.

I am trying right now to build a proof of concept, and I am at loss at what database design to choose.

As I understand it, there is two possible patterns: adjacency list model, and nested set model. And as I understand it, the nested set is usually preferred. I have built an app based the on nested set model, and I know why it is more useful, in most cases.

Something bothers me here though: since I intend to have a lot of users adding a lot of leafs on a lot of different nodes, and since the N.S.M. needs to move left or right a whole bunch of leaves each time a sibling is added, what would be my best database schema?
I do not intend to release a fully optimized application yet, but I would still like to start on the right foot.

Here are a few of my thoughts, I would like opinions on them:

  • Leave a lot of space between nodes, so parents nodes would be numbered 0, 10000, 20000, etc. This is not an elegant solution, but might work
  • Go back to the adjacency list model, and use a node_node table to link nodes. Less efficient in tree retrieval, but more efficient in adding/deleting nodes
  • Ditch databases completely, and go back to file-based, with a mix of directories/xml to store data. I have never worked with filesystems to retrieve and search large amounts of data, I don't know how that would work out
  • Is there any DB schema more adapted to my case that you know of?

Thanks in advance

Best Answer

If you're willing to go for adjacency lists, then why doesn't a simple foreign key to the parent of each post do the trick for you?

At any rate, stay away from nested sets for situations with many insertions. Anything you do to try to keep it efficient will make things complex and lose any advantage that the elegant trick might have had.

And for God's sake, don't go file-based - you'll just end up reinventing the database yourself, poorly.

Related Topic