File Systems – Is It a Bad Idea to Sync File System with Remote Server Using HTTP?

Tags: development-approach, file-systems, synchronization

I have started a project that will replicate Dropbox or Google Drive behavior, but using Amazon S3 as a backend.

The idea is very simple: a Node.js server that watches a directory for file changes and PUTs them to S3. It also checks S3 for changes and applies them to the local file system structure. I've uploaded a very early version of my app to GitHub. You can find it here.
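To make the two directions concrete, here is a minimal sketch of the comparison step, assuming both sides can be listed as a map from relative path to `{ mtimeMs, size }`. The `planSync` function and the listing shape are hypothetical names for illustration, not part of the app or of any S3 library. The sketch treats a file missing on one side as new on the other and lets the newer modification time win; deletions and true conflicts would need extra state that is not shown here.

```javascript
// Hypothetical helper: decide what to transfer in each direction.
// `local` and `remote` map a relative path to { mtimeMs, size }.
function planSync(local, remote) {
  const plan = { upload: [], download: [], unchanged: [] };
  const paths = new Set([...Object.keys(local), ...Object.keys(remote)]);

  // Sort so the plan is deterministic regardless of listing order.
  for (const path of [...paths].sort()) {
    const l = local[path];
    const r = remote[path];

    if (!r) {
      plan.upload.push(path);            // only exists locally
    } else if (!l) {
      plan.download.push(path);          // only exists on the remote
    } else if (l.size === r.size && l.mtimeMs === r.mtimeMs) {
      plan.unchanged.push(path);         // assumed identical (no content hash)
    } else if (l.mtimeMs >= r.mtimeMs) {
      plan.upload.push(path);            // local copy is newer (or tied)
    } else {
      plan.download.push(path);          // remote copy is newer
    }
  }
  return plan;
}
```

Keeping the decision logic in a pure function like this means it can be tested without touching the disk or the network; the file watcher and the S3 client would only feed it listings and act on the result.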

Because I am a web developer, I am using web technologies to solve the problem. I'm afraid of my limited mindset and picking wrong tools for the job. There are other solutions to this problem. One is S3FS, which is a FUSE file system for Unix systems. In my opinion, it is very hard to use and limited to one platform. My solution uses Node.js to avoid cross-platform issues. I can package my Node.js app with App.js and make it easy-to-use software.

To clarify, my questions are:

  • Is HTTP/HTTPS good enough for file transfer?
  • Is Node.js good enough for working with File System?
  • Scalability: can this approach fail in large file sizes?

Best Answer

Because I am a web developer, I am using web technologies to solve the problem. I'm afraid of my limited mindset and picking wrong tools for the job.

You're falling into the "when all you have is a hammer, everything looks like a nail" trap. But you're able to recognize it, which is a very good thing.

Is HTTP/HTTPS good enough for file transfer?

Of course. Billions, if not trillions, of files are transferred via HTTP daily. Any file you transfer to S3 is going to go that way anyway, because the S3 API itself is HTTP-based.

Is Node.js good enough for working with File System?

I don't know a great deal about Node.js, but if your goal is just to have a standalone program that runs on a machine and keeps a directory and S3 in sync, it doesn't seem like the right tool for the job.

If you're willing to learn Java, the Apache Commons Virtual File System can talk to a variety of backends (including the local file system), and VFS-S3 is an add-on that makes it work with S3. VFS includes a class called AbstractSyncTask that takes care of 90% of the grunt work, leaving you to just extend the class and implement your transfer rules. Being Java, it will run anywhere Java does.

Scalability: can this approach fail in large file sizes?

That would depend on the limits of Node.js, although I'd be surprised if someone hasn't already bumped into that problem and made sure it can transfer arbitrary-length files.
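On the S3 side, large objects are usually sent as a multipart upload rather than a single PUT. As a rough sketch of the bookkeeping involved, the function below splits a file into byte ranges, assuming the documented S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload. The name `partRanges` and the 8 MiB default are assumptions made for illustration; the actual upload calls are left out.

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024;      // S3 minimum for every part except the last
const MAX_PARTS = 10000;                    // S3 limit on parts per multipart upload
const DEFAULT_PART_SIZE = 8 * 1024 * 1024;  // arbitrary default chosen for this sketch

// Hypothetical helper: split a file of `fileSize` bytes into inclusive byte ranges.
function partRanges(fileSize, partSize = DEFAULT_PART_SIZE) {
  if (!Number.isInteger(fileSize) || fileSize < 0) {
    throw new RangeError('fileSize must be a non-negative integer');
  }
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError('partSize is below the 5 MiB minimum');
  }

  // Grow the part size if the file would otherwise need more than MAX_PARTS parts.
  const size = Math.max(partSize, Math.ceil(fileSize / MAX_PARTS));

  const ranges = [];
  for (let start = 0; start < fileSize; start += size) {
    const end = Math.min(start + size, fileSize) - 1; // inclusive end offset
    ranges.push({ partNumber: ranges.length + 1, start, end });
  }
  return ranges;
}
```

Each range can be handed to `fs.createReadStream(path, { start, end })`, which treats both offsets as inclusive, so only one part at a time needs to be read and memory use stays roughly constant regardless of file size.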