If you're willing to go for adjacency lists, then why doesn't a simple foreign key to the parent of each post do the trick for you?
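To make that concrete: the adjacency list really is just one self-referencing column. A minimal sketch in Python/sqlite3 (the table and column names are mine, not from your schema):

```python
import sqlite3

# Adjacency-list sketch: each post points at its parent post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES posts(id),  -- NULL for top-level posts
        body      TEXT NOT NULL
    );
    INSERT INTO posts (id, parent_id, body) VALUES
        (1, NULL, 'top-level post'),
        (2, 1,    'reply'),
        (3, 2,    'reply to the reply');
""")

# Fetching the direct children of a post is a single indexed lookup.
children = conn.execute(
    "SELECT id, body FROM posts WHERE parent_id = ?", (1,)
).fetchall()
print(children)  # [(2, 'reply')]
```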
At any rate, stay away from nested sets for situations with many insertions: every insert forces you to renumber the left/right values of a large part of the tree. Anything you do to try to keep that efficient will make things complex and lose whatever advantage the elegant trick had in the first place.
And for God's sake, don't go file-based - you'll just end up reinventing the database yourself, poorly.
Does using a NoSQL database give a boost to scalability even if you aren't sharding data? Well, let's define scalability. If you mean scalability as database/backend people mean it, where you have vertical and horizontal scaling and horizontal scaling IS sharding data, then this becomes a trivial question: the answer is absolutely no, because the only option you have left is vertical scaling (i.e., getting better hardware). If, however, you are talking about scalability in a broader sense, referring to flexibility of the application, the value of the data, etc., then that is a completely different question with a number of answers. And as you mentioned, it will often come down to what you are doing with the data and how it should be stored.

Let me preface everything here with the statement that in most cases you should still be using an RDBMS; NoSQL should fill niches. What follows is a description of a specific instance where a NoSQL database is more beneficial given specific requirements, and where we can ignore horizontal scaling.
Take, for instance, the idea that you are creating a cloud file storage system similar to Google Drive, Dropbox, or Box, but instead of using an actual file system you decide it would be more beneficial to virtualize the file system. Now you have a problem, because your data model is suddenly a tree structure, which is horribly inefficient to work with in an RDBMS (despite the fact that a tree is exactly how the database indexes its own data). You end up with a three-column table of Name, User, and Parent, where User is a foreign key to a users table and Parent is a self-referencing nullable foreign key (nullable because the root directory has no parent). So what is the primary key? In this instance it is a compound key across all three columns, which suddenly makes Parent our worst enemy: every path lookup turns into one self-join per level of depth.
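Here is that pain sketched in Python/sqlite3, simplified with a surrogate id column so the Parent foreign key has a single column to point at. Note the recursive query you are forced into just to resolve one path:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id      INTEGER PRIMARY KEY,          -- surrogate key, a simplification
        name    TEXT NOT NULL,
        user_id INTEGER NOT NULL,
        parent  INTEGER REFERENCES nodes(id)  -- NULL for the root directory
    );
    INSERT INTO nodes (id, name, user_id, parent) VALUES
        (1, '/',    1, NULL),
        (2, 'docs', 1, 1),
        (3, 'work', 1, 2);
""")

# Resolving '/docs/work' back to the root takes a recursive query:
# one self-join per level of depth. This is what makes Parent painful.
rows = conn.execute("""
    WITH RECURSIVE path(id, name, parent) AS (
        SELECT id, name, parent FROM nodes WHERE id = 3
        UNION ALL
        SELECT n.id, n.name, n.parent
        FROM nodes AS n JOIN path AS p ON n.id = p.parent
    )
    SELECT name FROM path
""").fetchall()
print([r[0] for r in rows])  # ['work', 'docs', '/']
```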
Now instead think about how you would store that in some form of document store. Instead of fighting the data, you are able to work with it and store it as the tree structure it is, which will in turn decrease your development time as well as your maintenance costs. If you are decreasing costs, doesn't that allow for a different kind of scalability? Plus, in this instance you are creating the system correctly from the ground up, which should give more flexibility to the application itself. Currently I am running this on a single server using MongoDB, which, as you explained, gives me an Available, Consistent model that is not much different from running MySQL or Postgres.
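For illustration, here is roughly what that looks like with pymongo; the connection string, collection name, and document shape are my own assumptions, and it needs a running mongod:

```python
from pymongo import MongoClient

# Hypothetical connection and collection names; one possible document
# shape for the tree, not necessarily the schema described above.
client = MongoClient("mongodb://localhost:27017")
fs = client.drive.filesystem

# The whole subtree lives in a single document: no self-joins, no
# recursive queries, and the shape matches the data.
fs.insert_one({
    "user": 1,
    "name": "/",
    "children": [
        {"name": "docs", "children": [
            {"name": "work", "children": []},
        ]},
    ],
})

root = fs.find_one({"user": 1})  # one round trip fetches the whole tree
```

One caveat worth knowing: nesting a whole tree into a single document eventually runs into MongoDB's 16 MB document limit, so a real system might store one document per node with a materialized path instead.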
With MongoDB, at least, you can define how many servers a query must reach before it is considered successful. So yes, you can convert it to a Consistent, Available model if you tell all queries to communicate with all server instances.
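That knob is the write concern (plus its read-side counterpart). A hedged sketch with pymongo, again assuming a hypothetical local replica set:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from pymongo.read_concern import ReadConcern

# Hypothetical replica-set connection; names are mine.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# w="majority" makes a write succeed only once a majority of replica-set
# members acknowledge it; w=<n> demands acknowledgement from n members,
# so setting n to the full member count is the "talk to every server" case.
fs = client.drive.get_collection(
    "filesystem",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
)
fs.insert_one({"user": 1, "name": "/"})
```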
So I think you have the right of it: there is a big benefit in how the data is stored. There are things that don't fit well in a relational model but fit well in other models (as another brief example, Amazon uses some form of graph database for its product recommendation engine).
Did I correctly understand your question?
Edit:
Will more data slow things down? Yes. How much will it slow things down? I honestly don't have enough experience to give an adequate answer.
Key/Value: Essentially a lookup table with large amounts of data associated with the lookup key. This is going to be really, really fast precisely because the key is the only way to look things up (see the sketch after this list).
Column/Family: Essentially a much more structured key/value store. You can only query based on the column, so this should be really fast too.
Document: An aggregation-style schema. Here you will want to aggregate similar data together; denormalization is OK and expected in this kind of database. Depending on whether you do mostly writes or mostly reads, you can organize your data so it gets distributed across multiple shards to spread out the writes or the reads (note that you can create a hybrid approach that is good at both, but generally you need to optimize for one or the other).
Graph: The strength of this one is that it can create and tear down relationships very quickly. If you have data where the relationships between items need to change often (think some form of recommendation engine), this is the one to use.
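As a toy illustration of the Key/Value point above, with a plain Python dict standing in for a real store such as Redis:

```python
# A key/value store is, semantically, just this, persisted and distributed.
store = {}
store["user:42:session"] = {"cart": [101, 205], "ttl": 3600}

# Lookup by key: O(1), and the only access path the model gives you.
session = store["user:42:session"]

# What you cannot do is ask "which sessions contain product 205?"
# without scanning every value yourself:
hits = [k for k, v in store.items() if 205 in v["cart"]]  # full scan
```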
How you store data in any of these databases will influence performance, just as storing data incorrectly in an RDBMS will. So, to hopefully make this clearer: you need to know which database system to use, as well as how to store your data in that system.
Best Answer
I don't know enough about your system, but you need to look at the following:
1-How do you obtain the data, and in what format? Answering this will give you options for how to store it and how to load it initially, if you end up using a database.
2-How do you process this raw data? Answering this will help you figure out the 'active' set size, which in turn helps in deciding how to store and how to load the data. You may find that you don't need the entire input record, only a few of its fields. If most of the fields are not used, you can keep them in separate archived storage.
3-How do you query this data (online or batch, and what criteria are most likely to be used)? Answering this will be the key factor in deciding how to store the data: which parts to keep online and which to keep offline. Oracle, for example, lets you run SQL against text files without loading them first, via external tables; that could be a huge time saver, but of course it depends on your scenario (a sketch follows).
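For the curious, an external table definition looks roughly like this (hypothetical credentials and file name; DATA_DIR must already exist as an Oracle DIRECTORY object pointing at the folder holding the file):

```python
import cx_Oracle  # or the newer 'oracledb' driver

# Hypothetical connection; run as a user with CREATE TABLE and directory access.
conn = cx_Oracle.connect("scott/tiger@localhost/orclpdb")
conn.cursor().execute("""
    CREATE TABLE raw_events_ext (
        user_id NUMBER,
        ts      VARCHAR2(20),
        amount  NUMBER
    )
    ORGANIZATION EXTERNAL (
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY data_dir
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            FIELDS TERMINATED BY ','
        )
        LOCATION ('events.csv')
    )
""")
# The file is now queryable like any table, with no load step:
#   SELECT SUM(amount) FROM raw_events_ext;
```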
As per your point:
I really don't understand how this is possible, and if it is accurate, I am not sure how it will be used. Maybe you need to separate the concept of mere data storage from the concept of which parts of the data will actually be used. If you understand more about how the data will be used, you may be able to cut down the number of rows by aggregation or a similar technique (see the sketch below).
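A sketch of the kind of aggregation I mean, on hypothetical event data (Python/sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts TEXT, amount REAL);
    -- imagine millions of raw rows here
    INSERT INTO events VALUES (1, '2013-05-01', 9.99),
                              (1, '2013-05-01', 4.50),
                              (2, '2013-05-02', 20.00);
""")

# If queries only ever need daily totals, pre-aggregating collapses the
# row count from one-row-per-event to one-row-per-user-per-day.
conn.execute("""
    CREATE TABLE daily_totals AS
    SELECT user_id, ts AS day, SUM(amount) AS total, COUNT(*) AS n
    FROM events
    GROUP BY user_id, ts
""")
print(conn.execute("SELECT * FROM daily_totals").fetchall())
# [(1, '2013-05-01', 14.49, 2), (2, '2013-05-02', 20.0, 1)]
```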
In short, much analysis is required before a solution can be found. The guiding principles are:
1-Know your data well
2-Cut down on row size by keeping only the needed columns, linking to off-line storage when possible
3-Cut down on the total row count by aggregation when possible
4-Use table partitioning and avoid excessive indexing
5-Know how the users need to use this data
6-Consider loading data as it arrives
7-You are probably going to need a star schema (fact and dimension tables) to speed up queries, but we can't tell from the information provided alone; a miniature example follows
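Since a star schema is easier to see than to describe, here is one in miniature (all table and column names are hypothetical; Python/sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: small descriptive tables.
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one narrow row per event, foreign keys to each dimension.
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );

    INSERT INTO dim_date    VALUES (1, '2013-05-01', '2013-05');
    INSERT INTO dim_product VALUES (1, 'widget');
    INSERT INTO fact_sales  VALUES (1, 1, 9.99), (1, 1, 4.50);
""")

# The typical star-schema query: join the narrow fact table to whichever
# dimensions the report needs, then aggregate.
print(conn.execute("""
    SELECT d.month, p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_id = f.date_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY d.month, p.name
""").fetchall())  # [('2013-05', 'widget', 14.49)]
```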