since you're striping data across the volumes, it would stand to reason that you have to put each NEW volume in the same location on the RAID as the volume from which the snapshot was created.
I tested your premise, and logical as it may seem, the observation is otherwise.
Let me detail this:
I have the exact same requirement as you do. However, the RAID0 that I am using has only 2 volumes.
I'm using Ubuntu 10 and have 2 EBS devices forming a RAID0 device formatted with XFS.
The RAID0 device was created using the following command:
sudo mdadm --create /dev/md0 --level 0 --metadata=1.1 --raid-devices 2 /dev/sdg /dev/sdh
I've installed MySQL and a bunch of other software that is configured to use /dev/md0 to store its data files.
Using the same volumes:
Once done, I unmount everything, stop the RAID and reassemble it like so:
sudo mdadm --assemble /dev/md0 /dev/sdh /dev/sdg
The thing is that irrespective of the order of /dev/sdg and /dev/sdh, the RAID reconstitutes itself correctly.
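A likely explanation (this is general mdadm behaviour, not something I verified beyond the test above): with version 1.x metadata, each member device carries a superblock that records the array UUID and that device's role in the array, so mdadm can place each member by its recorded role rather than by its position on the command line. The sketch below shows how one might check this. The device names come from the commands above; the `run` wrapper and the `DRY_RUN` variable are my own illustrative helpers, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# Device names /dev/sdg, /dev/sdh and /dev/md0 are taken from the answer above.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reassemble_any_order() {
    # Show each member's superblock; look for "Array UUID" and "Device Role".
    run sudo mdadm --examine /dev/sdg
    run sudo mdadm --examine /dev/sdh
    # Stop the array, then assemble with the members listed in reverse order.
    run sudo mdadm --stop /dev/md0
    run sudo mdadm --assemble /dev/md0 /dev/sdh /dev/sdg
    # Check that the array came up with both members active.
    run sudo mdadm --detail /dev/md0
}

reassemble_any_order
```

Unmount the file system before stopping the array if you run this for real.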
Using snapshots:
After this, I use ec2-consistent-snapshot to create snapshots of the 2 EBS disks together. I then create volumes from these snapshots, attach them to a new instance (that has already been configured for the software), reassemble the RAID (I've tried interchanging the order of the EBS volumes too), mount it and I'm ready to go.
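The restore side of that workflow could be sketched roughly as follows. This assumes the AWS CLI is used to create and attach the volumes, which is only one way to do it. The snapshot IDs, volume IDs, instance ID, availability zone and mount point are all hypothetical placeholders, and `run`/`DRY_RUN` are illustrative helpers that make the script print the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical placeholders: substitute your own values.
SNAPSHOTS="snap-EXAMPLE1 snap-EXAMPLE2"
AZ="us-east-1a"          # must be the availability zone of the new instance
MOUNT_POINT="/data"

restore_volumes() {
    # One new volume per snapshot of the RAID0 pair.
    for snap in $SNAPSHOTS; do
        run aws ec2 create-volume --snapshot-id "$snap" --availability-zone "$AZ"
    done
}

attach_and_assemble() {
    instance=$1
    vol_a=$2
    vol_b=$3
    # Attach both volumes; which one lands on which device did not matter in my tests.
    run aws ec2 attach-volume --instance-id "$instance" --volume-id "$vol_a" --device /dev/sdg
    run aws ec2 attach-volume --instance-id "$instance" --volume-id "$vol_b" --device /dev/sdh
    # These two commands run on the new instance itself.
    run sudo mdadm --assemble /dev/md0 /dev/sdg /dev/sdh
    run sudo mount /dev/md0 "$MOUNT_POINT"
}

restore_volumes
attach_and_assemble i-EXAMPLE vol-EXAMPLE1 vol-EXAMPLE2
```

The volume IDs passed to `attach_and_assemble` would in practice be the ones returned by the `create-volume` calls.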
Sounds strange, but it works.
An AMI, as you note, is a machine image. It's a total snapshot of a system stored as an image that can be launched as an instance. We'll get back to AMIs in a second.
Let's look at EBS. Your other two items are sub-items of this. EBS is a virtual block device. You can think of it as a hard drive, although it's really a bunch of software magic that links into another kind of storage device but makes it look like a hard drive to an instance.
EBS is just the name for the whole service. Inside of EBS you have what are called volumes. These are the "unit" Amazon is selling you. You create a volume, they allocate you X gigabytes, and you use it like a hard drive that you can plug into any of your running computers (instances). Volumes can either be created blank or from a snapshot copy of a previous volume, which brings us to the next topic.
Snapshots are ... well ... snapshots of volumes: an exact capture of what a volume looked like at a particular moment in time, including all its data. You could have a volume, attach it to your instance, fill it up with stuff, then snapshot it, but keep using it. The volume contents would keep changing as you used it as a file system but the snapshot would be frozen in time. You could create a new volume using this snapshot as a base. The new volume would look exactly like your first disk did when you took the snapshot. You could start using the new volume in place of the old one to roll back your data, or maybe attach the same data set to a second machine. You can keep taking snapshots of volumes at any point in time. It's like a freeze-frame instant backup that can then easily be made into a new live disk (volume) whenever you need it.
So volumes can be based on new blank space or on a snapshot. Got that? Volumes can be attached to and detached from any instance, but only connected to one instance at a time, just like the physical disk that they are a virtual abstraction of.
Now back to AMIs. These are tricky because there are two types. One creates an ephemeral instance where the root file system looks like a drive to the computer but actually sits on temporary storage local to the host (the instance store) and vaporizes the minute the instance stops being used. The other kind is called an EBS backed instance. This means that when your instance loads up, it loads its root file system onto a new EBS volume, basically layering the EC2 virtual machine technology on top of the EBS technology. A regular EBS volume is something that sits next to EC2 and can be attached, but an EBS backed instance also IS a volume itself.
A regular AMI is just a big chunk of data that gets loaded up as a machine. An EBS backed AMI will get loaded up onto an EBS volume, so you can shut it down and it will start back up from where you left off just like a real disk would.
Now put it all together. If an instance is EBS backed, you can also snapshot it. Basically this does exactly what a regular snapshot would ... a freeze frame of the root disk of your computer at a moment in time. In practice, it does two things differently. One is that it shuts down your instance so that you get a copy of the disk as it would look to an OFF computer, not an ON one. This makes it easier to boot up :) So when you snapshot an instance, it shuts it down, takes the disk picture, then starts up again. Secondly, it saves that image as an AMI instead of as a regular disk snapshot. Basically it's a bootable snapshot of a volume.
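To make the contrast concrete, here is a rough sketch using the AWS CLI (my choice for illustration; the console or other tools do the same thing). `snapshot_volume` takes a plain snapshot of a volume, while `make_ami` snapshots an EBS backed instance into a bootable AMI. The IDs and names are hypothetical placeholders, and `run`/`DRY_RUN` are illustrative helpers that print the commands rather than run them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_volume() {
    # Plain snapshot: a frozen copy of one volume, usable as the base for new volumes.
    run aws ec2 create-snapshot --volume-id "$1" --description "$2"
}

make_ami() {
    # Image of an EBS backed instance: by default the instance is shut down,
    # its disk is captured, and it is started again; the result is an AMI.
    run aws ec2 create-image --instance-id "$1" --name "$2"
}

# Hypothetical IDs and names, for illustration only.
snapshot_volume vol-EXAMPLE data-disk-backup
make_ami i-EXAMPLE web-server-image
```

`create-image` also accepts `--no-reboot` to skip the shutdown, at the cost of capturing the disk as it looks to an ON computer.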
Create one.
And then, try using it. Continue using it over a period of hours and days, and note what you observe.
The first answer to your question is that it actually only takes a few seconds.
The problem with that answer is that it doesn't tell the whole story:
However, you have to understand what the term "immediately" means, here. Immediately does not mean the volume is as fast, initially, as it will eventually be. Remember: the difference between microseconds and milliseconds seems intuitively small but it is still a factor of 1,000.
This is my point, above -- creating the volume only requires a matter of seconds, at which point it is usable, but slow.
EBS volumes are logical entities. When a volume is restored from a snapshot, every block on the volume is logically present and logically available as soon as the new volume becomes available, but not necessarily physically present on the volume the first time you try to read it.
The lag in loading the blocks is, overall, a small price to pay for the immediate availability of any specific block anywhere on the volume, but the impact can be significant, with the significance depending in part on how the volume is used.
The link, above, goes on to explain how you can speed up the warm-up process with dd or fio. What the documentation omits is the fact that you can use either of these in a read-only mode with the volume mounted, and get the benefit of immediate availability while prepping the volume for action. This will have a further negative impact on initial random accesses, but the pain will end sooner than if you do nothing at all, and for that reason it is probably going to be your best choice... but you must put your DR scenario through its paces, observe its operation, and adjust your strategy, accordingly.
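A minimal sketch of the read-only warm-up with dd, assuming a Linux instance with GNU dd. The function name `warm_volume` and the device path in the usage note are hypothetical. The function only reads from the device and discards the data, so it is safe to run against a mounted volume.

```shell
#!/bin/sh
# Sketch only: read every block of a device once and throw the data away,
# so that each block gets pulled onto the volume ahead of real use.
warm_volume() {
    device=$1
    # if= is the source to read, of=/dev/null discards the data, bs=1M reads
    # in 1 MiB chunks. Nothing is written to the device.
    dd if="$device" of=/dev/null bs=1M
}

# Warm the device given as the first argument, if there is one.
if [ -n "${1:-}" ]; then
    warm_volume "$1"
fi
```

Saved as, say, warm.sh, this would be run as root against the restored volume, for example `sudo sh warm.sh /dev/xvdf` (check the actual device name with `lsblk`). The fio equivalent would be something along the lines of `sudo fio --filename=/dev/xvdf --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize`, which can keep more reads in flight than dd.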