R – Do any common OS file systems use hashes to avoid storing the same content data more than once

filesystemshashoperating system

Many file storage systems use hashes to avoid duplication of the same file content data (among other reasons), e.g., Git and Dropbox both use SHA256. The file names and dates can be different, but as long as the content gets the same hash generated, it never gets stored more than once.

It seems this would be a sensible thing to do in a OS file system in order to save space. Are there any file systems for Windows or *nix that do this, or is there a good reason why none of them do?

This would, for the most part, eliminate the need for duplicate file finder utilities, because at that point the only space you would be saving would be for the file entry in the file system, which for most users is not enough to matter.

Edit: Arguably this could go on serverfault, but I feel developers are more likely to understand the issues and trade-offs involved.

Best Answer

ZFS supports deduplication since last month: http://blogs.oracle.com/bonwick/en_US/entry/zfs_dedup

Though I wouldn't call this a "common" filesystem (afaik, it is currently only supported by *BSD), it is definitely one worth looking at.

Related Topic