Git internally stores objects (blobs, trees) in the .git/objects/ folder. Each object can be referenced by a SHA1 hash that is computed from the contents of the object.
However, objects are not stored inside the .git/objects/ folder directly. Instead, each object is stored inside a folder named after the first two characters of its SHA1 hash. So an object with the hash b7e23ec29af22b0b4e41da31e868d57226121c84 would be stored at .git/objects/b7/e23ec29af22b0b4e41da31e868d57226121c84.
Why does Git subdivide its object storage this way?
The resources I could find, such as the page on Git's internals on git-scm, only explained how, not why.
Best Answer
It is possible to put all the files in one directory, though sometimes that can become a bit large. Many file systems have a limit. You want to put a git repository on a FAT32 formatted drive on a USB stick? You can only store 65,535 files in a single directory. This means that it is necessary to subdivide the directory structure so that filling a single directory is less likely.
This would even become a problem with other file systems and larger Git repositories. A relatively small Git repo that I've got hanging around (about 360 MiB) has 181,546 objects for 11k files. Pull the Linux repo and you've got 4,374,054 objects. If you were to put these all in one directory, it would be impossible to check out and would crash (for some meaning of 'crash') the file system.
So? You split it up by byte. Similar approaches are used by applications such as Firefox:
Beyond this, it is also a question of performance. Consider NTFS Performance with Numerous Long Filenames:
With files named after SHA1 checksums, this could be a recipe for disaster and abysmal performance.
While the above is from a tech note for Windows NT 3.5 (and NTFS 1.2, commonly used from 1995 to the early 2000s), the same problem can be seen in file systems such as ext3, where directories were implemented as linked lists requiring O(n) lookup. And even with that B-tree change:
Incidentally, this bit on how to improve performance was from 2005, the same year git was released.
As seen with Firefox and many other applications that keep lots of hash-named cache files, splitting up the cache by byte is a common design. It has negligible performance cost, and when used cross-platform with systems that may be a bit on the old side, it could very well be the difference between the program working or not.
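As a rough illustration of what splitting by the first byte buys, the sketch below buckets synthetic hashes by their two-character prefix and reports how full the directories get. The function name bucket_counts is hypothetical, and the hashes are SHA1 digests of counter values rather than real Git objects. The only assumption is that real object hashes spread about as evenly as these do.

```python
import hashlib
from collections import Counter


def bucket_counts(num_objects: int) -> Counter:
    # Count how many synthetic hashes fall under each two-hex-digit prefix.
    counts = Counter()
    for i in range(num_objects):
        digest = hashlib.sha1(str(i).encode("ascii")).hexdigest()
        counts[digest[:2]] += 1
    return counts


# 181,546 is the object count quoted above for the 360 MiB repository.
counts = bucket_counts(181_546)
print("directories used:", len(counts))
print("largest directory:", max(counts.values()), "entries")
print("average per directory:", sum(counts.values()) / len(counts))
```

With 256 possible prefixes, 181,546 objects average about 709 entries per directory, and the 4,374,054 objects of the Linux repo would average about 17,086. Both are well under the 65,535 limit mentioned for FAT32.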