PHP – Best file structure for storing thousands of photos (in different sizes)

apache-2.2, filesystems, PHP, unix

I am developing a site that lets people upload images. We resize each image to 4 sizes. We're expecting a large number of images and are considering how to structure the files for performance, as we don't want a single directory holding tens of thousands of files. Does anyone have suggestions on how to organise them?

The options that seem obvious are:

each user has their own folder, and within that a folder for each size
(each of the four folders could still hold a lot of images)

/user_uploads/user01/
                 |-/size_thumb/
                 |-/size_small/
                 |-/size_medium/
                 |-/size_large/
/user_uploads/user02/
                 |-/size_thumb/
                 |-/size_small/
                 |-/size_medium/
                 |-/size_large/

   etc etc

or

each user's photos stored in one folder per user
(more photos per directory, but fewer directories overall)

/user_uploads/user01/
/user_uploads/user02/
  etc etc

or

each photo stored by size, with lots and lots of photos per directory
(could have further subfolders by date?)

/user_uploads/small/
/user_uploads/medium/
/user_uploads/large/
/user_uploads/thumbs/

Anyone got any ideas? I think we'll probably go with /user_uploads/userID/ unless anyone has a better suggestion.

(Right now everything will be hosted on one computer, so we don't have to worry about files being on different servers)

Best Answer

You might want to try taking an MD5 hash of each image as it is uploaded, and then storing the images in a directory structure like the one below. Assume 3 images which hash to:

  1. 2b00042f7481c7b056c4b410d28f33cf
  2. 84bdbf7c4d48e16642af4c317df428c2
  3. 7b2a7edc6e86224d6ba0f97b717c80ed

And a folder structure that looks like this:

/images/orig/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/orig/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/orig/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg

/images/large/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/large/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/large/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg

/images/small/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/small/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/small/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg
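As a rough sketch of how the paths above could be built in PHP: the function name hashedImagePath() and its parameters are made up for illustration, and the .jpg extension is assumed from the examples.

```php
<?php
// Sketch only: hashedImagePath() and its parameters are illustrative names, not an existing API.
function hashedImagePath(string $hash, string $size, string $ext = 'jpg', int $levels = 3): string
{
    $parts = ['/images', $size];
    // Each level uses a progressively longer prefix of the hash: "2", "2b", "2b0", ...
    for ($i = 1; $i <= $levels; $i++) {
        $parts[] = substr($hash, 0, $i);
    }
    $parts[] = $hash . '.' . $ext;
    return implode('/', $parts);
}

echo hashedImagePath('2b00042f7481c7b056c4b410d28f33cf', 'orig'), "\n";
// prints /images/orig/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
```

Passing a different $size ('large', 'small') gives the matching path for each resized copy, and raising $levels adds deeper prefix directories.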

You can add as many levels following the above pattern as you need to keep directory sizes manageable. If you prefer, you can key the images on a user ID instead and still use a similar structure, e.g. for a user ID of 14: /images/orig/0/00/0014/0014.jpg
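The user ID variant might look something like this. userImagePath() is a hypothetical helper, and the 4-digit zero padding is only inferred from the example path for user 14, so adjust the width to however many users you expect.

```php
<?php
// Sketch only: userImagePath() is a hypothetical helper; the 4-digit padding is inferred from the example.
function userImagePath(int $userId, string $size = 'orig', int $width = 4): string
{
    // 14 becomes "0014", so every ID has enough characters to split into prefix directories.
    $id = str_pad((string) $userId, $width, '0', STR_PAD_LEFT);
    return sprintf('/images/%s/%s/%s/%s/%s.jpg', $size, substr($id, 0, 1), substr($id, 0, 2), $id, $id);
}

echo userImagePath(14), "\n";
// prints /images/orig/0/00/0014/0014.jpg
```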

You can store the user -> image hash mapping in your database, while keeping the images themselves on the filesystem. Although it is possible to store images inside a database, there are reasons you may not want to. Keeping them on the filesystem makes them much easier to move, say to a CDN or into the cloud, as you grow. It also lets you put directories on different disks to increase read performance, if that's your thing.
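A minimal sketch of that user -> image hash mapping, using PDO. The table and column names are hypothetical, and an in-memory SQLite database stands in for whatever database you actually use.

```php
<?php
// Sketch only: table and column names are hypothetical; SQLite in memory stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE user_images (
    user_id    INTEGER  NOT NULL,
    image_hash CHAR(32) NOT NULL,
    PRIMARY KEY (user_id, image_hash)
)');

// The database only records which hashes belong to which user; the files stay on disk.
$insert = $db->prepare('INSERT INTO user_images (user_id, image_hash) VALUES (?, ?)');
$insert->execute([14, '2b00042f7481c7b056c4b410d28f33cf']);
$insert->execute([15, '2b00042f7481c7b056c4b410d28f33cf']);
$insert->execute([14, '84bdbf7c4d48e16642af4c317df428c2']);

// Look up one user's hashes; each hash can then be turned into a path on the filesystem.
$select = $db->prepare('SELECT image_hash FROM user_images WHERE user_id = ? ORDER BY image_hash');
$select->execute([14]);
$hashes = $select->fetchAll(PDO::FETCH_COLUMN);
print_r($hashes);
```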

Because each file is named after the MD5 hash of the original image, if 30 people upload the exact same image you will keep only one copy (in each size) of that image on your filesystem instead of 30 copies.
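Here is one way the de-duplication could work in practice, as a sketch rather than a finished upload handler. storeOriginal() and the $baseDir argument are names invented for this example, the .jpg extension is assumed, and a real upload would more likely use move_uploaded_file() than copy().

```php
<?php
// Sketch only: storeOriginal() and the $baseDir layout are assumptions made for illustration.
function storeOriginal(string $sourcePath, string $baseDir): string
{
    $hash = md5_file($sourcePath);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $sourcePath");
    }
    // 1-, 2- and 3-character prefix directories under orig/.
    $dir = sprintf('%s/orig/%s/%s/%s', $baseDir, substr($hash, 0, 1), substr($hash, 0, 2), substr($hash, 0, 3));
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create $dir");
    }
    $target = "$dir/$hash.jpg";
    // Identical content gives an identical hash, so a repeat upload finds the file already there.
    if (!file_exists($target) && !copy($sourcePath, $target)) {
        throw new RuntimeException("Cannot write $target");
    }
    return $hash;
}
```

The returned hash is what you would record against the user in the database, so 30 users can all point at the same stored file.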