MogileFS/GlusterFS/etc + Amazon EBS + Amazon EC2

amazon-s3 amazon-ebs filesystems glusterfs

I have a web application that serves binary files (images, etc.). Our application runs on Amazon EC2. We were originally going to use Amazon S3 to store and serve these files, but this is no longer an option.

We need to transfer these files over HTTPS using a CNAME. This is obviously impossible with Amazon S3 for many technical reasons. Amazon offers Elastic Block Store (EBS), which allows you to mount a block device of up to 1 TB on a single instance. We will have multiple instances accessing this data in parallel.

What I was thinking of is using a distributed file system like MogileFS/GlusterFS/[insert-more-here] on top of Elastic Block Store (EBS).

So my question is: what are others currently doing to build a scalable (a few hundred TB), redundant file storage system on Amazon EC2 without using Amazon S3? Data will still be backed up to Amazon S3, but all reads would come off the file system.

Thanks in advance. If anyone needs clarification on anything, please feel free to ask.

Best Answer

At Azouk we don't use Amazon EC2, but we do use GlusterFS (1.4.0qa92) to serve all our content (PDFs, user files, thumbnails) and for offline data analysis. IMHO there should be no problem deploying the same architecture on Amazon's cloud, since we already rely heavily on virtualization (OpenVZ in particular). The only potential constraint is mounting GlusterFS via FUSE, which some virtualization setups forbid, but AFAIK it's possible on Amazon.
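For anyone wanting to verify the FUSE constraint on a particular instance, here is a minimal Python sketch of one way to check it. It assumes a Linux guest, and the function name `fuse_available` and its default paths are illustrative choices rather than anything GlusterFS provides. It only looks for the `/dev/fuse` device node and a `fuse` entry in `/proc/filesystems`. A negative result may just mean the `fuse` kernel module has not been loaded yet, so treat it as a hint, not a verdict.

```python
import os


def fuse_available(dev_path="/dev/fuse", fs_list_path="/proc/filesystems"):
    """Return True if the FUSE device node exists and the kernel lists 'fuse'.

    Both paths are parameters so the check can be pointed at other locations
    (hypothetical helper, not part of GlusterFS or any library).
    """
    # Without the device node, a FUSE client cannot talk to the kernel.
    if not os.path.exists(dev_path):
        return False
    try:
        with open(fs_list_path) as fh:
            # Lines look like "nodev\tfuse" or "\text4"; the last field is
            # the filesystem name. Exact match avoids 'fuseblk'/'fusectl'.
            names = {line.split()[-1] for line in fh if line.strip()}
    except OSError:
        return False
    return "fuse" in names


if __name__ == "__main__":
    print("FUSE available:", fuse_available())
```

If this reports `False` inside a container-style guest such as OpenVZ, the host configuration is the likely place to look, since the guest usually cannot load kernel modules itself.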

So, I recommend Gluster and sorry I can't help specifically with Amazon :)