What’s the best way to store thousands of images in a Windows folder structure?

Tags: directory, images, performance

We have hundreds of thousands of JPG images in a Windows folder structure like the one below, but working with them is anything but snappy (listing takes time, copying takes time, etc.). Here's the structure:

images/
  1/
    10001/
      10001-a.jpg
      10001-b.jpg
      ...
      10001-j.jpg (10 images in each XXXXX folder)
    10002/
    10003/
    ...
    19999/
  2/
    20001/
    20002/
    20003/
    ...
    29999/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Now, browsing these images is a bit slow because there are approximately 10,000 folders in each X/ folder, and simply listing them takes time.

Is there a better way to organize the images, with fewer subfolders/items per directory? Would changing the structure to the following have any effect?

images/
  1/
    0/
      0/
        0/
          0/
          1/
          2/
          3/
          4/
          5/
          6/
          7/
          8/
          9/
          10000/ (image folder, same as path)
            10000-a.jpg
            10000-b.jpg
            ...
            10000-j.jpg (10 images in each image folder)
        1/
        2/
        3/
        4/
        5/
        6/
        7/
        8/
        9/
      1/
      2/
      3/
      4/
      5/
      6/
      7/
      8/
      9/
    1/
    2/
    3/
    4/
    5/
    6/
    7/
    8/
    9/
  2/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Thus, locating image 48617-c.jpg would correspond to the path 4/8/6/1/7/48617/48617-c.jpg.

The reason for keeping a separate folder named after the full number, 48617, is to simplify copying a complete 10-image batch (by copying the entire folder).
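
For clarity, the ID-to-path mapping described above could be computed with something like the following sketch (the build_path name is just for illustration):

  import os

  def build_path(image_id: int, suffix: str, root: str = "images") -> str:
      # Map ID 48617 and suffix "c" to images/4/8/6/1/7/48617/48617-c.jpg:
      # one folder per digit of the ID, then a folder named after the full ID.
      digits = str(image_id)                       # "48617"
      digit_dirs = os.path.join(*digits)           # "4/8/6/1/7"
      filename = "%s-%s.jpg" % (digits, suffix)    # "48617-c.jpg"
      return os.path.join(root, digit_dirs, digits, filename)

  print(build_path(48617, "c"))  # images/4/8/6/1/7/48617/48617-c.jpg (backslashes on Windows)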

Now, no folder will have more than 11 immediate subfolders, but there will be lots of extra single-digit folders that exist purely for separation. Would this setup speed up browsing and interaction when multiple users are adding, copying, and deleting images?

Best Answer

Windows is a bit special when it comes to folder layouts with kajillions of files, especially images, since Windows Explorer treats them specially. That said, there are a few guidelines to follow to keep things from getting too out of hand:

  • If you intend to browse the directory structure from Windows Explorer for any reason, keep each directory under 10,000 entries (files and sub-directories combined).
  • If you will be interacting with it solely from CLI utilities or code, the 10K limit is far more flexible.
  • Don't create too many sub-directories; each directory you create is another discrete operation a copy has to perform.
    • If each file brings N directories with it, the number of file-system objects created for that file is 1 + N, which scales your copy times linearly.
    • A short, exponential tree (e.g. three tiers of directories, each with 256 sub-directories) can scale amazingly far before you run into the 10K-per-directory limit (see the sketch after this list).
  • If you're accessing it with code, go for direct opens instead of parsing directory listings before opening. In many cases, a failed fopen() followed by a directory scan is faster than a directory scan followed by a guaranteed fopen() (both patterns are sketched below).
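
To make those last two points concrete, here is a rough sketch, not a drop-in solution, of a three-tier layout with 256 buckets per tier plus the "just try to open it" access pattern. The function names and the choice of MD5 are purely illustrative, and the batch folder from the question is kept as the leaf so whole batches can still be copied in one go:

  import hashlib
  import os

  def batch_dir(batch_id: str, root: str = "images") -> str:
      # Three tiers of 256 buckets (00-ff) taken from a hash of the batch ID,
      # e.g. images/ab/cd/ef/48617/ (first six hex characters of the digest).
      # With 256**3 leaf buckets, no directory comes near the 10K-entry mark.
      digest = hashlib.md5(batch_id.encode("utf-8")).hexdigest()
      return os.path.join(root, digest[0:2], digest[2:4], digest[4:6], batch_id)

  def open_image(batch_id: str, suffix: str, root: str = "images"):
      # Direct open first; only fall back to a full scan if the open fails.
      wanted = "%s-%s.jpg" % (batch_id, suffix)
      try:
          return open(os.path.join(batch_dir(batch_id, root), wanted), "rb")
      except FileNotFoundError:
          # Slow path, only taken on a miss.
          for dirpath, _dirs, files in os.walk(root):
              if wanted in files:
                  return open(os.path.join(dirpath, wanted), "rb")
          raise

For the volumes described in the question, two tiers would probably already be plenty; the point is simply that a few wide levels keep every directory small without creating a pile of directories per file.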

Caveats:

  • File-count is immutable, but directory count is up to you, and the SUM of those two counts determines how long copy operations take (see the rough count after this list).
  • Try not to browse these directories with Windows Explorer unless you absolutely have to. It doesn't deal well with big directories, and there isn't much you can do about that.
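
As a rough illustration of that first caveat, here is a back-of-the-envelope count of the file-system objects a full copy would have to touch under the two layouts from the question, assuming a purely hypothetical 300,000 images:

  # Hypothetical figures: 300,000 images, 10 per batch folder, 5-digit IDs.
  images  = 300_000
  batches = images // 10                    # 30,000 batch folders
  files   = images                          # the file count itself is fixed

  # Current layout (images/X/XXXXX/): 9 top-level digit folders + the batches.
  current_dirs = 9 + batches

  # Proposed layout (images/4/8/6/1/7/48617/): digit folders are shared, but
  # the 5th digit level still needs one folder per ID in use (upper bound).
  proposed_digit_dirs = 9 + 90 + 900 + 9_000 + batches
  proposed_dirs = proposed_digit_dirs + batches   # plus the batch folders

  print(files + current_dirs)    # 330009 objects per full copy
  print(files + proposed_dirs)   # 369999 objects, i.e. noticeably more work

In other words, the deeper digit tree more than doubles the directory count, which is exactly the copy-time cost the guideline above warns about.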