I basically have one big gigantic table (about 1.000.000.000.000 records) in a database with these fields:
id, block_id, record
id is unique, block_id is not unique, it contains about 10k (max) records with the same block_id but with different records
To simplify my job that deals with the DB I have an API similar to this:
Engine e = new Engine(...);
// this method must be thread safe but with fine grained locked (block_id) to improve concurrency
e.add(block_id, "asdf"); // asdf up to 1 Kilobyte max
// this must concatenate all the already added records added block_id, and won't need to be bigger than 10Mb (worst case) average will be <5Mb
String s = e.getConcatenatedRecords(block_id);
If I map each block to a file(haven't done it yet), then each record will be a line in the file and I will still be able to use that API
But I want to know if I will have any peformance gain by using flat files compared to a well tunned postgresql database ? (at least for this specific scenario)
My biggest requirement though is that the getConcatenatedRecords method returns stupidly fast (not so with the add operation). I am considering caching and memory mapping also, I just don't want to complicate myself before asking if there is an already made solution for this kind of scenario ?
Best Answer
It seems like your "storage system" whatever that is has a very simple abstraction interface. It essentially boils down to "here's an id, gimme data".
So you can easily define this interface and build your entire app on top of it. Behind the scenes you can continue using PostgreSQL like you do today. And if you want to experiment with flat file storage, it should take you more than 1 or 2 days to implement something very simple that reads/writes files straight on disk (my recommendation is to have 1-3 tiers of directories based on first portion of the ID, so you don't have too many files in one flat directory).
If you do that you can compare the performance straight up and see if it's good enough for you.
However, as Euphoric pointed out, most NOSQL stores where introduced and became popular for the very purpose that you are trying to accomplish. I'm not going to recommend a specific store as that's something for you to decide but some advantages they provide are:
Another option to consider is that potentially your storage and your indexing does not have to be implemented in the same system. You could use a separate indexing product like Solr or Elasticsearch and store the actual data in a NoSQL DB (or a straight-up file system)