Amazon AutoScaling and GlusterFS

amazon-ec2, amazon-web-services, glusterfs

I have set up Elastic Load Balancing with five EC2 instances registered with the load balancer. Users of our website upload their data (images), which we store on network-attached storage (NAS) mounted on all the instances.

We are planning to introduce Amazon Auto Scaling and also to move off the network-attached storage.

  1. Is GlusterFS a good solution for sharing data across all the instances in the Auto Scaling group?

  2. Does Gluster ensure there is no loss of data?

  3. What happens if all the instances in the Auto Scaling group are terminated? Will I lose user data?

  4. What happens if a user uploads an image and the server processing the request goes down?

  5. Is there an impact on IO if clients go down? (What exactly does Gluster do?)

Best Answer

Is GlusterFS a good solution for sharing data across all the instances in the Auto Scaling group?

Possibly. The only way you'll get a definitive answer, however, is with your own tests. In the past, I set up a four-node webserver cluster on Linode instances, using GlusterFS to distribute and share the assets directory of images and so on.
We found two main problems with this approach:

  1. GlusterFS is pretty IO intensive, and works best on hardware with uncontended IO.
  2. Occasionally, a Linode server would get less-than-optimal access to the backend SAN, and IO wait time would rise dramatically. When that happened, Gluster would copy more data between the remaining nodes, which in turn hurt IO performance on those nodes. The upshot was that a minor IO blip, caused by a suboptimal SAN configuration or timesharing, could take out the entire webserver cluster and leave the whole shared filesystem unavailable (see the monitoring sketch below).

Purely anecdotal evidence, but I'd never again run GlusterFS on virtual machines backed by SAN/shared storage.
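If you do try it, it's worth watching iowait on each node so you can catch that spiral early. A minimal sketch, assuming Linux nodes with psutil installed; the 20% threshold is an arbitrary illustration, not a Gluster recommendation:

```python
# Minimal iowait watchdog for a Gluster node (Linux; psutil assumed installed).
# A sustained iowait spike on one node is the early warning that replication
# traffic is about to drag down the rest of the cluster.
import psutil

THRESHOLD = 20.0  # percent iowait; arbitrary illustration

while True:
    cpu = psutil.cpu_times_percent(interval=5)  # blocks for 5s, returns percentages
    iowait = getattr(cpu, "iowait", 0.0)        # iowait only exists on Linux
    if iowait > THRESHOLD:
        print(f"high iowait: {iowait:.1f}% -- check backend storage and Gluster traffic")
```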

Does Gluster ensure there is no loss of data?

It can. Gluster 3.0 has better support for "replication pools", where you define how many copies of the data exist throughout the cluster. Setting a replication level of 2 means there are two copies of everything across the cluster. This effectively halves your usable storage capacity, but gives you greater resilience to node failure.
Importantly, it also means that you have to add nodes in multiples of the replication level; in this case, in pairs.
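To make the arithmetic concrete, here is a toy sketch of the capacity math; the brick size and counts are made up for illustration:

```python
# Toy illustration of replicated-volume capacity math: with replica count R,
# usable capacity is raw capacity / R, and bricks must be added in multiples
# of R (pairs for replica 2). Numbers are hypothetical.
REPLICA = 2
BRICK_SIZE_GB = 500

def usable_capacity_gb(num_bricks: int) -> int:
    if num_bricks % REPLICA != 0:
        raise ValueError("bricks must be added in multiples of the replica count")
    return num_bricks * BRICK_SIZE_GB // REPLICA

print(usable_capacity_gb(4))  # 4 x 500 GB raw = 2000 GB, but only 1000 GB usable
print(usable_capacity_gb(6))  # adding a pair: 3000 GB raw -> 1500 GB usable
```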

What happens if all the instances in the Auto Scaling group are terminated? Will I lose user data?

If the instances are only using ephemeral instance storage, yes. If they're EBS-backed, or have EBS volumes attached, then no.
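As an illustration, here's a hedged boto3 sketch (boto3 postdates the original question; the region, availability zone, and instance ID are hypothetical placeholders) of creating and attaching an EBS volume, which persists independently of any instance that gets terminated:

```python
# Sketch: create an EBS volume and attach it to an instance. A volume created
# and attached this way is not deleted when the instance terminates, which is
# why data kept on EBS survives an Auto Scaling scale-in.
# Region, availability zone, and instance ID are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the instance's AZ
    Size=100,                       # GiB
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # hypothetical
    Device="/dev/sdf",
)
```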

What happens if a user uploads an image and the server processing the request goes down?

That depends heavily on how your application is designed. I strongly suspect the user would lose their upload; in a naively architected solution it's almost certain.
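One common way to avoid that loss is to write the upload to durable storage before acknowledging the request. A minimal sketch using S3 via boto3; the bucket name and key layout are hypothetical:

```python
# Sketch: persist the upload to S3 before confirming the request, so a server
# dying mid-request can never lose an upload the user believes succeeded.
# The bucket name and key layout are hypothetical.
import boto3

s3 = boto3.client("s3")

def handle_upload(user_id: str, filename: str, data: bytes) -> str:
    key = f"uploads/{user_id}/{filename}"
    s3.put_object(Bucket="my-user-images", Key=key, Body=data)
    # Only acknowledge after put_object returns; if the server crashes before
    # this line, the client sees an error and can safely retry.
    return key
```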

Is there an impact on IO if clients go down?

See above. If a client goes down because of backend storage problems, it can easily destroy the performance of the entire cluster.
