What's the best filesystem for managing millions of images?

filesystems, images, reiserfs, storage, zfs

I am designing a system capable of working with 15 million (and growing) image files ranging from 100 KB to 10 MB. I am looking for some opinions on what may be the best filesystem to support the (somewhat) odd requirements below.

Additional Information/Requirements:

  • The directory structure is certainly not optimal [1], but due to the design of the applications pulling this data, it is effectively immutable.
  • The data should be read-optimized, which includes, but may not be limited to: random reads, sequential reads, and directory listings (some directories may contain 30,000 subdirectories or 1,000 images).
  • Additional data will be written to the file structure (new subdirectories, additional files in existing subdirectories, etc.) on a semi-regular basis, but write performance is not much of a concern. Data will be written via SMB or NFS.
  • There is a significant number of identical files (a conservative estimate is 20%); however, due to the design of the application pulling this data, we can't delete the duplicate filenames. Ideally we would like some sort of deduplication (we could certainly hard link, but I am not sure how well millions of hard links would scale).
  • SSDs will be the primary form of storage for this project (unless an argument can be made for spinners instead), so we would like to limit writes to the system where possible.

The hardware we have allocated for this project is as follows:

Dell R720xd w/ 24x 2.5” bays
RAM: 128GB RAM (more can be allocated if needed)
CPU: 2x E5-2620 @ 2.20GHz
Storage:
    8x 2TB SSDs (local storage)
    1x500GB SSD for OS
RAID: H310 (IT Mode)

We were initially considering ZFS for this, but after some additional research it appears:

  • ZFS may thrash the SSDs when writing metadata updates.
  • ZFS has a high RAM requirement for deduplication (roughly 5 GB of RAM per 1 TB of data, which for the ~16 TB of raw SSD here works out to something on the order of 80 GB). This should be doable on our current hardware, though it seems like a lot of overhead.
  • ReiserFS may be better suited for random lookups of small files (though I can't seem to find what qualifies as a "small" file).

Any opinions on an optimal filesystem for this use case as well as any hardware tweaks would be much appreciated.

[1]

Example directory structure (none of the directory names or filenames are normalized in any way, e.g. sequentially numbered)

+ root directory 1
    - sub directory 1
        - image 1
        - image 2
        - image 3
        - ...
        - image n (where n is between 1 and 1,000+)
    - sub directory 2
        - image 1
        - image 2
        - image 3
        - ...
        - image n
    - ...
    - sub directory n (where n is between 1,000 and 30,000)
        - image 1
        - image 2
        - image 3
        - ...
        - image n
+ root directory 2
+ ...
+ root directory 15

Best Answer

Any filesystem (including lowly ext4 and slightly-less-lowly XFS) can meet the requirements you’ve listed, which are basically the ability to store lots of files and reasonable performance across a wide variety of use cases. My knowledge (and the interesting trade-offs in this answer) is mainly about ZFS, so I’ll focus on that.

The additional abilities you would get from ZFS are:

  1. Dedup. As you said, this is not super wonderful in ZFS because it has a heavy RAM requirement, but it does work. To get something similar on non-ZFS, you could hash your files and use the hashes as filenames / directory names, or keep a database of hash -> filename so you can make hard links (a rough sketch of that approach is at the end of this answer). In any of those cases, the files would need to be exactly identical, not just images that look the same.
  2. Compression. Most image formats are already compressed, so this might not buy you much; but if the files are RAW rather than JPEG, it could be a significant savings.
  3. Ability to snapshot / back up. ZFS has great built-in tools for this. You can back up non-ZFS too, although it might be hard to get a consistent snapshot of your data. LVM can do some of this, although arguably not as well.
  4. Volume management is a part of ZFS. You can choose from a set of very flexible RAID configurations to get the optimal balance of data redundancy, space usage, and performance for your particular application. You can get some of this from LVM and other software RAID, but I believe ZFS has one of the best-designed volume management solutions out there, combined with a well-designed system for failure detection and recovery. (A hedged sketch of what such a setup might look like follows this list.)
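For what it’s worth, here is a minimal sketch of what that could look like on the hardware above. The device names, the RAIDZ2 layout, and every property value are assumptions chosen to illustrate the knobs, not recommendations:

    # Pool across the eight 2TB data SSDs; ashift=12 assumes 4K-sector drives.
    zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd sde sdf sdg sdh

    # Dataset holding the image tree.
    zfs create tank/images
    zfs set compression=lz4 tank/images   # cheap, and mostly a no-op on already-compressed JPEGs
    zfs set atime=off tank/images         # skip the metadata write on every read
    zfs set xattr=sa tank/images          # keep SMB/NFS extended attributes with the file metadata
    zfs set recordsize=1M tank/images     # larger records suit whole-image reads (needs the large_blocks feature)
    zfs set dedup=on tank/images          # only if the dedup table fits comfortably in RAM

Whether dedup=on is worth its RAM cost is exactly the trade-off you already identified; leaving it off and hard linking duplicates yourself is the cheaper alternative.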

Two other things you mentioned:

  • Thrashing metadata. I don’t think ZFS would be worse than other filesystems here: it does update a fair amount of metadata during writes, but it is copy-on-write and flushes those updates in batches every 5-10 seconds, which means large contiguous writes rather than small in-place writes that force NAND blocks to be erased and rewritten repeatedly. A traditional filesystem works the other way, doing in-place updates, which is probably slightly worse. At any rate, modern SSDs reserve a lot of extra blocks internally to extend the life of the drive in the presence of wear, and typical SSD lifetimes are now considered comparable to those of spinning disks. I’m not saying it doesn’t matter; I just don’t think you should fixate too much on this aspect, since it’s pretty minor.
  • Hard link scalability. Hard links should scale as well as or better than normal files (in ZFS or not). Either way, a hard link is just another pointer to the same inode as some other file, and you’ll probably get a small cache-efficiency win, since reading the file through one link leaves it cached for accesses through the other links too.
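Since the hash-and-hard-link idea from point 1 came up either way, here is a minimal sketch of it in shell. The root path is a placeholder, it assumes GNU coreutils and filenames without newlines, and it should be run against a copy (or after a snapshot) first:

    #!/usr/bin/env bash
    # Replace later copies of byte-identical files with hard links to the first copy,
    # keeping every duplicate filename in place for the application.
    ROOT=/tank/images            # placeholder path

    declare -A first_seen        # content hash -> first path seen with that content

    find "$ROOT" -type f -print0 | xargs -0 sha256sum |
    while read -r hash path; do
        if [[ -n ${first_seen[$hash]} ]]; then
            # Same content as an earlier file: turn this copy into a hard link to it.
            ln -f "${first_seen[$hash]}" "$path"
        else
            first_seen[$hash]=$path
        fi
    done

As noted in point 1, this only helps for files that are exactly identical; the advantage over ZFS dedup is that it costs a one-time scan rather than a permanently resident dedup table.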