You're safe to use the volume once you have triggered the snapshot, even if it's still in a pending state according to AWS - see this post.
If you're taking a snapshot of a volume for the first time, it will probably take a while, since the full contents have to be copied to the S3-backed snapshot storage for the region. Subsequent snapshots are incremental, though, so they should complete much faster.
NOTE: You can't create a volume from a snapshot that is still in the "pending" state; if you try, you'll get the error "Snapshot is in invalid state".
So please make sure to wait until the snapshot reaches the "completed" state.
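With the AWS CLI, that wait can be scripted. This is a minimal sketch, assuming the CLI is installed and configured; it only defines a helper function (it is not invoked here), and the volume ID in the usage comment is a hypothetical placeholder:

```shell
# Sketch only: defines a helper, does not call AWS here.
snapshot_and_wait() {
  local volume_id="$1"
  local snap_id
  # Trigger the snapshot; the source volume is safe to keep using immediately.
  snap_id=$(aws ec2 create-snapshot --volume-id "$volume_id" \
            --query SnapshotId --output text)
  # Block until the snapshot leaves "pending", so it's safe to build volumes from.
  aws ec2 wait snapshot-completed --snapshot-ids "$snap_id"
  echo "$snap_id"
}
# Usage (hypothetical ID): snapshot_and_wait vol-0123456789abcdef0
```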
An AMI, as you note, is a machine image. It's a total snapshot of a system stored as an image that can be launched as an instance. We'll get back to AMIs in a second.
Let's look at EBS. Your other two items are sub-items of this. EBS is a virtual block device. You can think of it as a hard drive, although it's really a bunch of software magic linking to another kind of storage device behind the scenes while making it look like a hard drive to an instance.
EBS is just the name for the whole service. Inside of EBS you have what are called volumes. These are the "unit" Amazon is selling you. You create a volume and they allocate you X number of gigabytes, and you use it like a hard drive that you can plug into any of your running computers (instances). Volumes can either be created blank or from a snapshot copy of a previous volume, which brings us to the next topic.
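Both ways of creating a volume can be sketched with the AWS CLI. These are illustrative helper functions only (defined, never invoked here), and the availability zone, size, and snapshot ID in the usage comments are hypothetical placeholders:

```shell
# Sketch only: helpers are defined but not invoked.
create_blank_volume() {
  # A brand-new, empty volume of the given size (GiB) in the given AZ.
  aws ec2 create-volume --availability-zone "$1" --size "$2" \
    --query VolumeId --output text
}
create_volume_from_snapshot() {
  # A volume pre-populated with the contents a snapshot captured.
  aws ec2 create-volume --availability-zone "$1" --snapshot-id "$2" \
    --query VolumeId --output text
}
# Usage: create_blank_volume us-east-1a 100
#        create_volume_from_snapshot us-east-1a snap-0123456789abcdef0
```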
Snapshots are ... well ... snapshots of volumes: an exact capture of what a volume looked like at a particular moment in time, including all its data. You could have a volume, attach it to your instance, fill it up with stuff, then snapshot it, but keep using it. The volume contents would keep changing as you used it as a file system, but the snapshot would be frozen in time. You could create a new volume using this snapshot as a base. The new volume would look exactly like your first disk did when you took the snapshot. You could start using the new volume in place of the old one to roll back your data, or maybe attach the same data set to a second machine. You can keep taking snapshots of volumes at any point in time. It's like a freeze-frame instant backup that can then easily be made into a new live disk (volume) whenever you need it.
So volumes can be based on new blank space or on a snapshot. Got that? Volumes can be attached and detached from any instances, but only connected to one instance at a time, just like the physical disk that they are a virtual abstraction of.
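The roll-back workflow described above (detach the changed volume, build a fresh one from the snapshot, attach it in its place) can be sketched as one function. This is illustrative only, not invoked here; all IDs, the AZ, and the device name in the usage comment are hypothetical:

```shell
# Sketch of a roll-back: swap a modified volume for one rebuilt from a snapshot.
rollback_volume() {
  local instance_id="$1" old_volume="$2" snapshot_id="$3" az="$4" device="$5"
  # Detach the current (modified) volume from the instance.
  aws ec2 detach-volume --volume-id "$old_volume"
  # Build a fresh volume from the frozen-in-time snapshot.
  local new_volume
  new_volume=$(aws ec2 create-volume --availability-zone "$az" \
               --snapshot-id "$snapshot_id" --query VolumeId --output text)
  aws ec2 wait volume-available --volume-ids "$new_volume"
  # Attach the rebuilt volume where the old one used to be.
  aws ec2 attach-volume --volume-id "$new_volume" \
    --instance-id "$instance_id" --device "$device"
}
# Usage: rollback_volume i-0abc... vol-0old... snap-0base... us-east-1a /dev/sdf
```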
Now back to AMIs. These are tricky because there are two types. One creates an ephemeral instance, where the root file system looks like a drive to the computer but actually sits in memory somewhere and vaporizes the minute it stops being used. The other kind is called an EBS-backed instance. This means that when your instance loads up, it loads its root file system onto a new EBS volume, basically layering the EC2 virtual machine technology on top of the EBS technology. A regular EBS volume is something that sits next to EC2 and can be attached, but the root disk of an EBS-backed instance also IS a volume itself.
A regular AMI is just a big chunk of data that gets loaded up as a machine. An EBS backed AMI will get loaded up onto an EBS volume, so you can shut it down and it will start back up from where you left off just like a real disk would.
Now put it all together. If an instance is EBS-backed, you can also snapshot it. Basically this does exactly what a regular snapshot would: a freeze frame of the root disk of your computer at a moment in time. In practice, it does two things differently. First, it shuts down your instance so that you get a copy of the disk as it would look to an OFF computer, not an ON one. This makes it easier to boot up :) So when you snapshot an instance, it shuts down, takes the disk picture, then starts up again. Second, it saves that image as an AMI instead of as a regular disk snapshot. Basically it's a bootable snapshot of a volume.
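Creating such a bootable snapshot (an AMI) from an EBS-backed instance is a single AWS CLI call. A minimal sketch, defined but not invoked here; the instance ID and image name in the usage comment are hypothetical:

```shell
# Sketch only, not invoked: create an AMI from an EBS-backed instance.
create_ami() {
  local instance_id="$1" name="$2"
  # By default this reboots the instance first, capturing the disk in its
  # clean "off" state; adding --no-reboot skips that, trading away
  # filesystem consistency for uptime.
  aws ec2 create-image --instance-id "$instance_id" --name "$name" \
    --query ImageId --output text
}
# Usage: create_ami i-0123456789abcdef0 "my-app-2024-01-01"
```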
Best Answer
EBS snapshots are stored in S3, but they're managed by EBS and in buckets that you aren't able to access.
While this sounds confusing, there is a good explanation.
EBS snapshots are not stored individually. They rely on information provided by the EBS infrastructure so that they only capture blocks that have changed since the previous snapshot. (Take two consecutive snapshots of the same volume, and almost inevitably the second will complete faster than the first, for this reason.) The snapshot subsystem then backs up only those changed blocks, and creates logical links to blocks in the previous snapshots that are needed to restore the entire volume. Later, if those previous snapshots are deleted, only the blocks that are not linked to any later snapshots are purged. This provides two advantages: faster snapshots, and the ability to purge old snapshots without needing to worry about later "incremental" backups that depend on previous backups. EBS manages that aspect, keeping what is needed and purging what is not (and not billing you when unneeded data is purged).
This setup leads to dramatic storage efficiency and cost savings, because you're only paying to store the differences. If you compare the apparent total size of all your snapshots with the number of GB of snapshot storage you are actually billed for, the billed total should be smaller, and the more snapshots you keep of the same volumes, the larger the difference can be.
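A hypothetical back-of-the-envelope comparison makes the savings concrete. The numbers below are made up purely for illustration: a 100 GB volume snapshotted 10 times, with roughly 5 GB of blocks changing between snapshots.

```shell
# Hypothetical numbers, for illustration only.
full=100    # GB: the first snapshot stores the full volume
snaps=10    # total snapshots taken
delta=5     # GB of changed blocks captured by each later snapshot

incremental=$(( full + (snaps - 1) * delta ))  # what EBS actually stores
naive=$(( full * snaps ))                      # if each snapshot were a full copy
echo "incremental: ${incremental} GB vs naive: ${naive} GB"
```

Under these assumptions the incremental scheme stores 145 GB instead of 1000 GB, and you are billed only for the former.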
If the snapshots were stored individually in S3, the cost would be much higher.
However... there is a way to export an EBS snapshot offsite, but it's a manual process.
To do this, you need a spare Linux EC2 instance. The simplified version of the process: create a volume from the snapshot, attach that volume to the instance, and it will show up as a block device such as /dev/xvdf. From here, you can use standard tools like dd or pv to read the raw data stream from the device, and send it where you want it. For example, let's assume you have an off-site SSH server that is accessible from the instance:

```shell
pv /dev/xvdf |
pbzip2 -9 |
ssh user@remote-host \
'cat > /path/to/backup.img.bz2'
```

Line 1 reads from the block device and shows a progress indicator.
Line 2 compresses the raw data using multicore bzip2 at maximum compression.
Line 3 establishes an SSH connection to the off-site server, piping the compressed output.
Line 4 writes the compressed disk image to a file on the remote machine.
Bringing the volume back into AWS would involve creating an empty volume and reversing the process, piping the file back in, decompressing it, and writing it to a block device.
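The return trip can be sketched as one function. This is illustrative only and is not invoked here; it assumes the new blank volume is already attached (e.g. as /dev/xvdf), that pbzip2 is installed, and the remote host and path are hypothetical placeholders:

```shell
# Sketch only, not invoked: stream the image back and write it to the device.
import_image() {
  local remote="$1" image_path="$2" device="$3"
  ssh "$remote" "cat '$image_path'" |   # read the compressed image off-site
    pbzip2 -d |                         # decompress the stream
    sudo dd of="$device" bs=1M          # write raw blocks onto the new volume
}
# Usage: import_image user@remote-host /path/to/backup.img.bz2 /dev/xvdf
```

Once the write finishes, the volume can be detached and snapshotted like any other.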
Note, however, that disk snapshots are not usually the best approach for backups. They are fast and easy, but relying on snapshots is a sign that your recovery strategy should be reconsidered.
If the volume in question contains a database, using logical backup tools for offsite backup is probably a tidier solution. If the volume contains assets, you can use tarballs or rsync. If the volume contains your application code, you really need an infrastructure that allows you to repeatably build working servers from scratch from version-controlled source, through automation. This requires a change of mindset and has a significant up-front investment in time, but will serve you much better over the long haul.
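For the database case, a logical off-site backup can be as simple as a dump piped over SSH. A sketch assuming MySQL; the database name, remote host, and backup path are hypothetical placeholders, and the function is defined but not invoked here:

```shell
# Sketch of a logical off-site backup for a MySQL database.
offsite_db_backup() {
  local db="$1" remote="$2"
  mysqldump --single-transaction "$db" |  # consistent logical dump
    gzip |                                # compress the SQL stream
    ssh "$remote" "cat > /backups/${db}-$(date +%F).sql.gz"
}
# Usage: offsite_db_backup mydb user@remote-host
```

Unlike a raw disk image, the result is portable, human-inspectable SQL that can be restored onto any compatible server.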