On scaling up, the EBS volume and its data will not be "cloned". To get that behavior, you'd need to automate it at boot:
- Grab the latest snapshot of WS-1 EBS volume
- Create and attach the volume
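The selection step above boils down to "newest completed snapshot wins". A minimal sketch in Python, assuming snapshot records shaped like the EC2 DescribeSnapshots response (the function name and sample IDs are illustrative; a real boot script would fetch these via the AWS SDK and then create and attach the volume):

```python
from datetime import datetime, timezone

def find_latest_snapshot(snapshots):
    """Return the newest snapshot dict whose State is 'completed', or None."""
    completed = [s for s in snapshots if s["State"] == "completed"]
    if not completed:
        return None
    # StartTime is a timezone-aware datetime in DescribeSnapshots output
    return max(completed, key=lambda s: s["StartTime"])

# Sample data mimicking DescribeSnapshots output (IDs are made up)
snaps = [
    {"SnapshotId": "snap-aaa", "State": "completed",
     "StartTime": datetime(2012, 5, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-bbb", "State": "completed",
     "StartTime": datetime(2012, 5, 3, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-ccc", "State": "pending",
     "StartTime": datetime(2012, 5, 4, tzinfo=timezone.utc)},
]
print(find_latest_snapshot(snaps)["SnapshotId"])  # snap-bbb
```

Note that the pending snapshot is skipped even though it is newer; restoring from an incomplete snapshot is not what you want at boot.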
Another method, depending on how much data is on the EBS volume, is to pull the data down from S3 at boot.
With security groups, you can allow any server in the app_security_group to access any server in the nfs_server_group. Because the rule references the group rather than fixed IP addresses, membership updates dynamically as instances launch and terminate.
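A group-to-group rule can be sketched as the data structure you'd hand to the EC2 AuthorizeSecurityGroupIngress call. This is a hypothetical helper (the group ID and the NFS port 2049 are illustrative assumptions, not from the original post):

```python
def nfs_ingress_from_group(app_group_id, port=2049):
    """Build an IpPermissions entry allowing NFS traffic from a source group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing the source group instead of CIDR blocks means newly
        # launched app servers are covered without touching this rule.
        "UserIdGroupPairs": [{"GroupId": app_group_id}],
    }

# Hypothetical group ID for app_security_group
rule = nfs_ingress_from_group("sg-app12345")
print(rule["FromPort"], rule["UserIdGroupPairs"][0]["GroupId"])
```

In practice you would pass a list containing this dict as the IpPermissions parameter when authorizing ingress on the nfs_server_group.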
Hope that makes sense.
Is GlusterFS a good solution to share data across all the instances in the Autoscaling group?
Possibly. The only way you'll get a definitive answer, however, is with your own tests. In the past, I set up a four-node webserver cluster on Linode instances, using GlusterFS to distribute and share the assets directory (images and so on).
We found 2 main problems with this approach:
- GlusterFS is pretty IO intensive, and works really well on hardware with uncontended IO
- Occasionally, a Linode server would experience less-than-optimal access to the backend SAN, and IO-wait time would go up dramatically. When this happened, Gluster would copy more data between the remaining nodes, which in turn hurt IO performance on those nodes. The result was that a minor IO blip, caused by a suboptimal SAN configuration or timesharing, could take down the entire webserver cluster and leave the shared filesystem unavailable.
Purely anecdotal evidence, but I'd not run GlusterFS on a virtual machine with SAN/shared storage ever again.
Does Gluster ensure there is no loss of data?
It can. Gluster 3.0 has better support for "replication pools", where you define how many copies of the data exist throughout the cluster. A replication level of 2 means there are two copies across the entire cluster. This effectively halves your storage capacity, but gives you greater resilience to node failure.
Importantly, it also means that you have to add more nodes as multiples of the replication level, in this case, pairs of nodes.
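That "multiples of the replication level" rule can be sketched as a quick check. This is a toy helper, not part of Gluster itself; with replica 2 you grow the volume in pairs of bricks:

```python
def bricks_needed(current_bricks, replica):
    """Smallest number of bricks (>= 1) to add while keeping a valid layout.

    The total brick count of a replicated volume must remain a multiple
    of the replica count, so additions come in batches of `replica`.
    """
    remainder = current_bricks % replica
    # Already aligned: the next valid step is a whole replica set.
    # Misaligned: top up to the next multiple first.
    return replica if remainder == 0 else replica - remainder

print(bricks_needed(4, 2))  # replica 2, 4 bricks -> add 2 (a pair)
print(bricks_needed(7, 3))  # replica 3, 7 bricks -> add 2 to reach 9
```

Note also that usable capacity is total raw capacity divided by the replica count, which is where the "halves your storage" figure for replica 2 comes from.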
What will happen if all the instances in Autoscaling are terminated, will I lose user data?
If the instances are only using ephemeral instance storage, yes. If they're EBS-backed, or using mounted EBS volumes, then no.
What happens if a user uploads an image and the server processing the request goes down?
That greatly depends on how your application is designed. I strongly suspect that the user would lose their data (almost certainly in a naively architected solution).
Is there an impact on IO if clients go down?
See above. If a client goes down because of backend storage problems, it can easily destroy the performance of the entire cluster.
Best Answer
This can be achieved by using the Amazon SDK (I am almost done with it and will put it on GitHub), utilizing the SNS, EC2, and Auto Scaling services.
I have followed the below steps to achieve this:
The script is available here: https://github.com/singhupendra/aws-autoscale