MongoDB – Easy Way to Map Directory Structure to a MongoDB Schema


I'm trying to store a directory structure, including files and their content, in MongoDB. The work is part of a syncing app and uses Node/Mongoose.

Now, I'm new to Mongo, and it's late here: is the idiomatic implementation as easy as it looks, i.e., something like this?

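const mongoose = require('mongoose');
const { Schema } = mongoose;

const FileStoreSchema = new Schema({
  fullPath: {
    type: String,
    index: { unique: true }  // one document per file path
  },
  filename: String
  // ...metadata and other useful fields
});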

Best Answer

It depends largely on which operations you want to support efficiently. Your current approach makes it easy to list all files within a directory and all descendants of a directory: both are prefix queries on fullPath, and a case-sensitive, left-anchored regex on that field can use its index.
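A minimal sketch of both listings, assuming "/"-separated absolute paths with no trailing slash. The helper names are hypothetical; descendantFilter and childFilter build ordinary MongoDB query filters, and matchPaths only mimics what find() would return so the logic can run without a database.

```javascript
// Escape regex metacharacters so a path like "docs/v1.0" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Filter for every file below `dir`, at any depth.
// A left-anchored, case-sensitive regex can use the index on fullPath.
function descendantFilter(dir) {
  return { fullPath: { $regex: "^" + escapeRegex(dir + "/") } };
}

// Filter for files directly inside `dir` (no further "/" after the prefix).
function childFilter(dir) {
  return { fullPath: { $regex: "^" + escapeRegex(dir + "/") + "[^/]+$" } };
}

// In-memory stand-in for collection.find(filter), for illustration only.
function matchPaths(paths, filter) {
  const re = new RegExp(filter.fullPath.$regex);
  return paths.filter((p) => re.test(p));
}

const paths = ["/a/x.txt", "/a/b/y.txt", "/a/b/c/z.txt", "/ab/w.txt"];
console.log(matchPaths(paths, descendantFilter("/a"))); // the three files under /a
console.log(matchPaths(paths, childFilter("/a")));      // only /a/x.txt
```

Appending "/" to the directory before matching keeps "/ab/w.txt" out of the results for "/a".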

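On the other hand, moving files from one directory to another is more painful. The classic update operators such as $set cannot modify part of a string, so you have to pull down all of the affected file entries, rewrite their paths, and push them back as updates. On MongoDB 4.2 and later, updateMany also accepts an aggregation pipeline, which can rebuild the path prefix on the server with $concat and $substrCP; even then, each document is updated atomically, but the move as a whole is not.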
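A sketch of both ways to move a directory, assuming "/"-separated paths. rewritePath is the client-side rewrite applied to each fetched document; buildMoveUpdate only constructs the filter and update pipeline that would be passed to updateMany on MongoDB 4.2+. Both helper names are hypothetical.

```javascript
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Client-side approach: rewrite the prefix of one fetched path.
function rewritePath(fullPath, oldDir, newDir) {
  if (!fullPath.startsWith(oldDir + "/")) return fullPath; // not under oldDir
  return newDir + fullPath.slice(oldDir.length);
}

// Server-side alternative (MongoDB 4.2+): filter plus update pipeline.
// Each document is updated atomically, but the batch as a whole is not.
function buildMoveUpdate(oldDir, newDir) {
  const oldLen = [...oldDir].length; // code points, matching $substrCP
  return {
    filter: { fullPath: { $regex: "^" + escapeRegex(oldDir + "/") } },
    pipeline: [
      {
        $set: {
          fullPath: {
            $concat: [
              { $literal: newDir },
              {
                $substrCP: [
                  "$fullPath",
                  oldLen,
                  { $subtract: [{ $strLenCP: "$fullPath" }, oldLen] },
                ],
              },
            ],
          },
        },
      },
    ],
  };
}

// Usage with the Node driver (not run here):
// const { filter, pipeline } = buildMoveUpdate("/a", "/archive/a");
// await files.updateMany(filter, pipeline);
```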

With this structure it is also relatively difficult to list out the directory structure for traversal. The only way to know every folder is to load every file path and then parse them to see if they contain a new folder.
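The parsing step described above can be sketched like this, assuming "/"-separated absolute paths. foldersFromPaths is a hypothetical helper; in practice its input would come from a query that projects only fullPath.

```javascript
// Derive every folder from a flat list of file paths such as "/a/b/y.txt".
function foldersFromPaths(paths) {
  const folders = new Set();
  for (const path of paths) {
    const parts = path.split("/").slice(1, -1); // drop leading "" and the filename
    let current = "";
    for (const part of parts) {
      current += "/" + part;
      folders.add(current); // every ancestor of the file is a folder
    }
  }
  return [...folders].sort();
}

const paths = ["/a/x.txt", "/a/b/y.txt", "/a/b/c/z.txt", "/d/w.txt"];
console.log(foldersFromPaths(paths));
```

Note that every file path has to be read to do this, which is exactly the cost the answer points out.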

Does any of this matter? Maybe not; it depends entirely on what you are trying to do.

As another possible approach: do you really want to represent one hierarchy, or would many smaller hierarchies be a better representation? If the latter, you may get better results storing each hierarchy as a complete document, representing the folder structure either by nesting documents or via parent-child references.
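A sketch of the nested-document option, assuming one small hierarchy per document. buildTree is hypothetical. It uses arrays of subdocuments rather than objects keyed by filename, because filenames often contain "." and MongoDB has historically restricted dots in field names. Each such document must also stay under MongoDB's 16 MB document size limit, which is one reason this suits many small hierarchies better than one large one.

```javascript
// Build one nested document from the file paths of a single hierarchy.
function buildTree(paths) {
  const root = { name: "", folders: [], files: [] };
  for (const path of paths) {
    const parts = path.split("/").filter(Boolean); // e.g. ["a", "b", "y.txt"]
    const fileName = parts.pop();
    let node = root;
    for (const part of parts) {
      // Reuse an existing subfolder or create it on first sight.
      let next = node.folders.find((f) => f.name === part);
      if (!next) {
        next = { name: part, folders: [], files: [] };
        node.folders.push(next);
      }
      node = next;
    }
    node.files.push({ name: fileName });
  }
  return root;
}

const tree = buildTree(["/a/x.txt", "/a/b/y.txt", "/top.txt"]);
console.log(JSON.stringify(tree, null, 2));
```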

Related Solutions

Implementing Scheduling in Node.js Applications

Obviously I don't want to cycle through the entire collection

It's not obvious to me. How big is this collection?

Anyway, why would you have to cycle through the entire collection? Keep the items in a list ordered by expiration time. Since time only moves forward, you can discard everything that has expired, so you only have to check whether the first element has expired. When it does, discard it and move on to the next one.
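A minimal in-memory sketch of that idea, assuming each item carries a numeric expiresAt timestamp (a hypothetical field name). It keeps a sorted array, so inserts cost O(n); a binary heap would bring that down to O(log n). In MongoDB the equivalent head lookup would be a find sorted ascending on an indexed expiry field with limit(1).

```javascript
// A queue kept sorted by expiration time, so only the head needs checking.
class ExpiryQueue {
  constructor() {
    this.items = []; // ascending by expiresAt
  }

  // Binary search for the insertion point to keep the array sorted.
  add(item) {
    let lo = 0;
    let hi = this.items.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.items[mid].expiresAt <= item.expiresAt) lo = mid + 1;
      else hi = mid;
    }
    this.items.splice(lo, 0, item);
  }

  // Remove and return everything that has expired by `now`.
  popExpired(now) {
    let count = 0;
    while (count < this.items.length && this.items[count].expiresAt <= now) count++;
    return this.items.splice(0, count);
  }

  // Time until the next expiry, e.g. for scheduling a single setTimeout.
  msUntilNext(now) {
    return this.items.length ? Math.max(0, this.items[0].expiresAt - now) : null;
  }
}
```

A single timer set to msUntilNext() replaces any periodic scan of the whole collection.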

How to model hashtags with nodejs and mongodb

Store hashtags in an array within a document.

That's the benefit of having documents: you can simply nest them. And, in this particular case, it's trivial:

{
    "_id": 123,
    "file": "c43a5f46-kitten.png",
    "description": "My kitten :3 #kittens #cute",
    "hashtags": ["kittens", "cute", "cat", "animals"]
}

(I added some "synonymous" tags, this can be done automatically by looking up some other document.)
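A sketch of filling the hashtags array automatically from the description. The SYNONYMS map is a hypothetical stand-in for the "other document" mentioned above. Note that \w only matches ASCII letters, digits, and underscores, so non-English tags would need a different pattern.

```javascript
// Hypothetical synonym lookup; in practice it might be loaded from another document.
const SYNONYMS = new Map([["kittens", ["cat", "animals"]]]);

// Pull "#tag" words out of free text, lowercased.
function extractHashtags(text) {
  const tags = [];
  for (const match of text.matchAll(/#(\w+)/g)) {
    tags.push(match[1].toLowerCase());
  }
  return tags;
}

// Explicit tags first, then their synonyms, without duplicates.
function hashtagsWithSynonyms(text, synonyms = SYNONYMS) {
  const tags = extractHashtags(text);
  const result = new Set(tags);
  for (const tag of tags) {
    for (const syn of synonyms.get(tag) || []) result.add(syn);
  }
  return [...result];
}

const doc = {
  _id: 123,
  file: "c43a5f46-kitten.png",
  description: "My kitten :3 #kittens #cute",
};
doc.hashtags = hashtagsWithSynonyms(doc.description);
console.log(doc.hashtags);
```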

This is the most natural solution for a document-oriented database:

- Searching documents by hashtag is trivial once you add an index, and inserting, updating, and deleting hashtags on individual documents is just as easy.
- Massive inserts, updates, and deletes are a bit trickier, because you'd probably want to split such operations into multiple batches, but they are still manageable and not hard to implement.
- Complex aggregations can be done with the standard aggregation pipeline or map-reduce (the latter is deprecated as of MongoDB 5.0).
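A sketch of the batching mentioned above, assuming you already have the ids of the documents to tag. chunk and buildAddTagBatches are hypothetical helpers, and the batch size of 1000 is an arbitrary choice; each batch is a plain filter/update pair for updateMany, using $addToSet so an existing tag is not duplicated.

```javascript
// Split a large list of document ids into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One updateMany call per batch, adding `tag` to each document's hashtags.
function buildAddTagBatches(ids, tag, batchSize = 1000) {
  return chunk(ids, batchSize).map((batch) => ({
    filter: { _id: { $in: batch } },
    update: { $addToSet: { hashtags: tag } },
  }));
}

// Usage with the Node driver (not run here):
// await images.createIndex({ hashtags: 1 });   // multikey index on the array
// await images.find({ hashtags: "kittens" }).toArray();
// for (const { filter, update } of buildAddTagBatches(ids, "cute")) {
//   await images.updateMany(filter, update);
// }
```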

On the other hand, if you go with a relational style, you'll be in big trouble, because you end up reinventing a SQL JOIN in your application code. This is one of the most common anti-patterns when using MongoDB (and similar databases). Here's some very typical pseudocode:

for (HashTag tag : mongodb.hashtags.find()) {
   // one extra query (and network round trip) per tag
   for (Image img : mongodb.images.find(
           new Document("_id", tag.getImageId()))) {
       // ...
   }
}

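This is inefficient, does not scale, and reinvents the wheel. Each inner lookup by _id can use the index, but you still pay one query and one network round trip per tag; if the lookup is on an unindexed field, every one of those queries scans the collection and the total cost grows to O(N*M). With SQL and foreign keys, a single JOIN does the same work, typically in O(N*log(M)) with an index or O(N+M) with a hash join.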

There are no tables (relations) or foreign keys in MongoDB. Please do not invent them; use SQL instead if you need them. In fact, I highly recommend using SQL instead of MongoDB unless your data really consists of documents.

Typical examples of documents are configurations, forms, and maybe user sessions. These typically don't fit well into tables because of their "random" structure.
