NFS – Replicated shared filesystem

I'm looking into setting up a shared filesystem/file server on AWS (EC2) infrastructure that offers replication and fairly painless failover. This filesystem would host potentially millions of files, each a few megabytes in size. Those files would be accessed (read/write) from several client VMs. If the primary file server fails, I'd want the clients to be able to fail over to the replica file server without losing any files (i.e. I want replication to be real-time). I've looked at a few options:

Use S3 with s3fs. I'm concerned the latency of each request will be problematic when performing operations on thousands of files (e.g. when copying/moving files around). I've also heard some reports that make me question s3fs's stability; I'm not sure if that's still the case.
Set up an NFS server on an EC2 instance, using DRBD to replicate blocks between two instances. Downsides:
1. I've had reliability issues with DRBD in the past, especially over high-latency links.
2. If the primary NFS server goes down, it will take the clients down with it, requiring sysadmin intervention and/or reboots to get them to reconnect to the secondary server. There's no auto-failover.

Are there any better solutions?

Best Answer

Just some updated information. If you are like me and you have wanted this functionality for a VERY, VERY long time, use Amazon Elastic File System (EFS). It is a managed file system that clients mount over NFS, with data replicated across multiple Availability Zones.
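For anyone wondering what "an NFS mount" means in practice here, below is a rough sketch in Python that builds the mount command for an EFS file system using the stock NFS client. The DNS name pattern and the mount options are the ones I remember from the AWS documentation, so double-check them against the current docs. The file system ID, region, and mount point are made-up placeholders, and the helper name efs_mount_command is just something I picked for illustration.

```python
import shlex
from typing import List

# NFS client options recalled from the AWS EFS docs (assumption: verify
# against current documentation before relying on them).
EFS_MOUNT_OPTIONS = (
    "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
)


def efs_mount_command(file_system_id: str, region: str, mount_point: str) -> List[str]:
    """Build the argv for mounting an EFS file system over NFSv4.1.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # EFS exposes a per-file-system DNS name of this form.
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    return ["mount", "-t", "nfs4", "-o", EFS_MOUNT_OPTIONS, f"{dns_name}:/", mount_point]


if __name__ == "__main__":
    # Placeholder ID, region, and mount point; substitute your own values.
    print(shlex.join(efs_mount_command("fs-12345678", "us-east-1", "/mnt/efs")))
```

You would run the printed command as root on each client VM (or put the equivalent entry in /etc/fstab). Every client mounts the same file system name, so there is no separate secondary server to point clients at.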

(Sorry to bump the issue, but the Google rank of this answer is high enough that a few people are probably searching for this solution.)
