File System with Real-Time Data Deduplication

deduplication files filesystems versioning

Is there a file system that stores files under their hash so that no duplicates are kept? Any operating system is fine. I know Git does this, but I'm looking for something that works in real time.

Best Answer

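ZFS does this, but not at the file level. It deduplicates at the block level, which is finer-grained: identical blocks are stored once even when the files that contain them differ elsewhere. (Byte-level deduplication is finer still, so the order from coarsest to finest is file, block, byte.)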
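To make the block-level idea concrete, here is a minimal in-memory sketch. It is not how ZFS is implemented; the `BlockStore` class, the 4096-byte block size, and the choice of SHA-256 are all assumptions picked for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


class BlockStore:
    """Toy in-memory store that keeps each distinct block only once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 hex digest -> block bytes
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Keep the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its blocks, in order.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        # Physical bytes held, counting each distinct block once.
        return sum(len(block) for block in self.blocks.values())
```

Writing two three-block files that share their first two blocks leaves only four distinct blocks in the store, even though the files themselves are different. File-level deduplication would have saved nothing in that case.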

On Linux, there is SDFS. However, ZFS has some better features, such as the ability to keep the deduplication hash table on a solid-state drive so that it does not eat up enormous amounts of RAM. ZFS does this through its L2ARC, a second-level cache that lives on an SSD.
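For a rough sense of why the hash table gets large, here is a back-of-the-envelope estimate. The function name is made up for this sketch, and the default of 320 bytes per table entry is only an often-quoted approximation for ZFS, so treat it as an assumption and check the figure for your version. The estimate is a worst case in which every block is unique.

```python
def dedup_table_bytes(pool_bytes, avg_block_bytes, entry_bytes=320):
    """Estimate the dedup table size when every block is unique.

    entry_bytes is an assumed per-entry cost, not a measured value.
    """
    blocks = -(-pool_bytes // avg_block_bytes)  # ceiling division
    return blocks * entry_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

# 1 TiB of data with an assumed average block size of 64 KiB
estimate = dedup_table_bytes(1 * TIB, 64 * 1024)
print(f"{estimate / GIB:.1f} GiB")  # prints "5.0 GiB"
```

Under those assumptions, each TiB of unique data costs about 5 GiB of table, and smaller average blocks make it proportionally worse.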

As of the writing of this post, please do not use ZFS on Linux. It needs to stay in the oven for a few more years. Use a BSD for ZFS.