I am working on two projects that will both implement a WebDAV server backed by MongoDB GridFS. In each case, the system may need to store tens of millions of files spread across thousands of hierarchical directories.
I can come up with two different ways of storing the directory structure:
- As a "true" hierarchical file system, with directories containing the IDs (`_id`) of subdirectories and regular files. The paths will be separated by slashes (`/`) as in a POSIX-compliant file system.
  - The path `/a/b/c` will be represented as a directory `a` containing a directory `b` containing a file `c`.
- As a flat file system, where file names include the slashes.
  - The path `/a/b/c` will be stored as a single file with the name `/a/b/c`.
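To make the two layouts concrete, here is a minimal sketch in plain Python. It uses in-memory dicts as stand-ins for MongoDB collections, so no database is involved. The field names `name`, `type`, `children` and `filename`, and the helper functions, are assumptions chosen for illustration and not a fixed schema. The sketch counts how many document lookups each layout needs to resolve `/a/b/c`.

```python
# Hypothetical in-memory stand-ins for MongoDB collections (illustration only).

# Option 1: hierarchical. Each directory document lists the _id values of its children.
nodes = {
    0: {"_id": 0, "name": "", "type": "dir", "children": [1]},    # root "/"
    1: {"_id": 1, "name": "a", "type": "dir", "children": [2]},
    2: {"_id": 2, "name": "b", "type": "dir", "children": [3]},
    3: {"_id": 3, "name": "c", "type": "file", "children": []},
}


def resolve_child_links(path, root_id=0):
    """Walk from the root one path component at a time.

    Returns (document or None, number of document lookups performed).
    """
    node = nodes[root_id]
    lookups = 1
    for part in (p for p in path.split("/") if p):
        match = None
        for child_id in node["children"]:
            child = nodes[child_id]   # one lookup per child inspected
            lookups += 1
            if child["name"] == part:
                match = child
                break
        if match is None:
            return None, lookups
        node = match
    return node, lookups


# Option 2: flat. The full path is the file name, so one keyed lookup is enough.
files_by_path = {
    "/a/b/c": {"_id": 3, "filename": "/a/b/c"},
}


def resolve_flat(path):
    """Return (document or None, number of document lookups performed)."""
    return files_by_path.get(path), 1


if __name__ == "__main__":
    print(resolve_child_links("/a/b/c"))
    print(resolve_flat("/a/b/c"))
```

In this toy data the hierarchical walk touches one document per path component plus the root, while the flat layout resolves the path in a single lookup. The trade-off runs the other way for renames: moving a directory in the flat layout means rewriting the name of every file beneath it.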
What are the advantages and disadvantages of each, with respect to a "real" folder-based file system?
Best Answer
Have you looked at http://www.mongodb.org/display/DOCS/Trees+in+MongoDB ? It looks like you're choosing between "Child Links" and "Materialized Paths". Based on the commentary there, your second idea seems a much better fit for Mongo. Storing each subdirectory's `_id` is a little too relational: it implies linking and joins, which are not things that Mongo excels at.
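As a rough sketch of how directory listings could work with materialized paths, the helpers below build query filters that match on the stored path with an anchored regular expression. The function names are hypothetical, and the sketch assumes the full path is kept in the `filename` field of the GridFS files collection and that the output of Python's `re.escape` is acceptable to MongoDB's regex engine. The demo checks the patterns with Python's own `re` module, so it does not need a running server.

```python
import re


def children_filter(directory):
    """Filter for entries exactly one level below `directory`."""
    prefix = directory.rstrip("/") + "/"
    # Anchored prefix, then one more path component containing no slash.
    return {"filename": {"$regex": "^" + re.escape(prefix) + "[^/]+$"}}


def subtree_filter(directory):
    """Filter for every entry anywhere below `directory`."""
    prefix = directory.rstrip("/") + "/"
    return {"filename": {"$regex": "^" + re.escape(prefix)}}


def matching(names, query):
    """Apply the filter's pattern locally, standing in for a database query."""
    pattern = re.compile(query["filename"]["$regex"])
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    names = ["/a/b/c", "/a/b/d/e", "/a/bc", "/a/b"]
    print(matching(names, children_filter("/a/b")))
    print(matching(names, subtree_filter("/a/b")))
```

With a driver such as PyMongo, a filter like this would be passed to `find()` on the files collection. Because the patterns are case-sensitive and anchored at the start of the string, an index on `filename` can help MongoDB narrow the scan, which matters with tens of millions of files.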