Amazon AWS – Export EBS Snapshot to External Storage

Tags: amazon-ec2, amazon-ebs, amazon-web-services, export

On the Amazon AWS platform, is there a way of exporting an EBS volume to an external disk?

I.e. a backup outside of Amazon's infrastructure?

From what I have read so far:

  1. Amazon's "Import/Export (Disk)" service supports exporting data from S3 buckets

  2. EBS snapshots are implicitly stored in some form of opaque S3 bucket, but this bucket is not visible to AWS admins

So it appears there is no way to export EBS snapshots.
Has anyone had luck with this?

Thanks,

EDIT: I have ~2.5TB of mongodb data, of which I need to make a local copy (i.e. a 2.5" external). Downloading that data will cost ~ $220 USD ($0.09/GB), and take ~ 10 days @ 3MB/s (not to mention if there are network issues). That is why I'm trying to go down the Amazon Import/Export process. My mongo instances use LVM/XFS, so I have the ability to generate snapshots.
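As a rough check of those figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and a constant transfer rate; the variable names are only illustrative.

```shell
#!/bin/sh
# Figures taken from the question; decimal units are an assumption.
size_gb=2500        # ~2.5 TB of data
price_per_gb=0.09   # USD per GB transferred out
rate_mb_s=3         # sustained download rate in MB/s

# Cost = size * price per GB
cost=$(LC_ALL=C awk -v s="$size_gb" -v p="$price_per_gb" \
  'BEGIN { printf "%.0f", s * p }')

# Time = size in MB / rate, converted from seconds to days (86400 s per day)
days=$(LC_ALL=C awk -v s="$size_gb" -v r="$rate_mb_s" \
  'BEGIN { printf "%.1f", s * 1000 / r / 86400 }')

echo "transfer cost: ~\$${cost} USD"
echo "transfer time: ~${days} days"
```

This prints roughly $225 and 9.6 days, in line with the ~$220 and ~10 days quoted above.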

Best Answer

EBS snapshots are stored in S3, but they're managed by EBS and in buckets that you aren't able to access.

While this sounds confusing, there is a good explanation.

EBS snapshots are not stored individually. They rely on information provided by the EBS infrastructure so that they only capture blocks that have changed since the previous snapshot. (Take two consecutive snapshots of the same volume, and almost inevitably the second will complete faster than the first, for this reason.) The snapshot subsystem backs up only those changed blocks, and creates logical links to the blocks in previous snapshots that are needed to restore the entire volume. Later, if those previous snapshots are deleted, only the blocks that are not linked to any later snapshot are purged. This provides two advantages: faster snapshots, and the ability to purge old snapshots without needing to worry about later "incremental" backups that depend on them. EBS manages that aspect, keeping what is needed and purging what is not (and no longer billing you for data once it is purged).

This setup leads to dramatic storage efficiency and cost savings, because you're only paying to store the differences. If you compare the combined size of the volumes your snapshots cover with the number of GB of snapshot storage you are billed for, the billed figure should be lower, and the more snapshots you have of the same volumes, the larger that gap can be.

If the snapshots were stored individually in S3, the cost would be much higher.

However... there is a way to export an EBS snapshot offsite, but it's a manual process.

To do this, you need a spare Linux EC2 instance. The simplified version of the process:

  • Boot the instance
  • Create an EBS volume from the snapshot
  • Attach the new volume to the instance, but don't mount it
  • Access the raw data on the volume using the assigned block device file, e.g. /dev/xvdf.
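The volume steps above can be sketched with the AWS CLI. This is a minimal sketch, not a tested procedure: the snapshot ID, instance ID, and availability zone are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set. The device is requested as /dev/sdf; depending on the instance type and kernel, it may appear inside the instance as /dev/xvdf or under a different name.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own IDs and zone.
SNAPSHOT_ID="${SNAPSHOT_ID:-snap-0123456789abcdef0}"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
AZ="${AZ:-us-east-1a}"          # must be the instance's availability zone
DEVICE="${DEVICE:-/dev/sdf}"    # name requested from EC2 for the attachment
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

restore_and_attach() {
  if [ "$DRY_RUN" = "1" ]; then
    volume_id="vol-DRYRUN"      # stand-in, no volume is created
    run aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" \
      --availability-zone "$AZ"
  else
    # Create the volume from the snapshot and capture its ID.
    volume_id=$(aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" \
      --availability-zone "$AZ" --query VolumeId --output text) || return 1
  fi
  # Wait until the volume is usable, then attach it (do not mount it).
  run aws ec2 wait volume-available --volume-ids "$volume_id"
  run aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$INSTANCE_ID" --device "$DEVICE"
}

restore_and_attach
```

Remember to detach and delete the temporary volume once the export has finished, since it is billed like any other EBS volume.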

From here, you can use standard tools like dd or pv to read the raw data stream from the device, and send it where you want it. For example, let's assume you have an off-site SSH server that is accessible from the instance.

$ sudo pv -pterab /dev/xvdf | \
  pbzip2 -9 | \
  ssh user@offsite.example.com \
  'cat > /some/large/disk/my-snapshot.bz2'

Line 1 reads from the block device and shows a progress indicator.

Line 2 compresses the raw data using multicore bzip2 at maximum compression.

Line 3 establishes an SSH connection to the offsite server, piping the compressed output to it.

Line 4 writes the compressed disk image file to a file on the remote machine.
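If pv is not installed, the same read can be done with dd, at the cost of the progress display. A minimal sketch, wrapped in a function so it can be tried on an ordinary file before pointing it at a block device. The function name and the compressor fallback are assumptions of this sketch, not part of the original pipeline.

```shell
#!/bin/sh
# Prefer pbzip2 as in the pipeline above; bzip2 and gzip are fallbacks
# (an assumption, so the sketch also runs where pbzip2 is missing).
CODEC=$(command -v pbzip2 || command -v bzip2 || command -v gzip)

# Stream a compressed image of SRC (block device or plain file) to stdout.
# dd's transfer statistics are discarded to keep stderr quiet.
export_image() {
  dd if="$1" bs=1048576 2>/dev/null | "$CODEC" -9 -c
}

# On the EC2 instance, as root (host and path as in the example above):
#   export_image /dev/xvdf | \
#     ssh user@offsite.example.com 'cat > /some/large/disk/my-snapshot.bz2'
```

Running the function against a small test file first, then decompressing the result and comparing it with the input, is a cheap way to check the pipeline before starting a multi-hour transfer.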

Bringing the volume back into AWS would involve creating an empty volume and reversing the process, piping the file back in, decompressing it, and writing it to a block device.
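A minimal sketch of that reverse direction, assuming the empty volume is at least as large as the original and is attached but not mounted. The function name is hypothetical. A .bz2 image needs pbzip2 or bzip2 to decompress; the gzip fallback is only there so the sketch can be tried on a machine without either.

```shell
#!/bin/sh
# Must match the tool family that produced the image.
CODEC=$(command -v pbzip2 || command -v bzip2 || command -v gzip)

# Read a compressed image on stdin and write the raw bytes to TARGET
# (a block device such as /dev/xvdf, or a plain file for testing).
restore_image() {
  "$CODEC" -dc | dd of="$1" bs=1048576 2>/dev/null
}

# On the EC2 instance, as root (host and path as in the example above):
#   ssh user@offsite.example.com 'cat /some/large/disk/my-snapshot.bz2' | \
#     restore_image /dev/xvdf
```

Writing to the wrong device destroys its contents, so it is worth confirming the device name (for example with lsblk) before running the restore.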

Note, however, that disk snapshots are not usually the best approach for backups. They are fast and easy, but relying on snapshots is a sign that your recovery strategy should be reconsidered.

If the volume in question contains a database, using logical backup tools for offsite backup is probably a tidier solution. If the volume contains assets, you can use tarballs or rsync. If the volume contains your application code, you really need an infrastructure that allows you to repeatably build working servers from scratch from version-controlled source, through automation. This requires a change of mindset and has a significant up-front investment in time, but will serve you much better over the long haul.
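For the database and asset cases, a sketch might look like the following. It assumes MongoDB (as in the question) with mongodump available, and it reuses the off-site host from the earlier example; the dump file and asset directory are hypothetical paths. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical paths; the remote target reuses the host from the example above.
DUMP_FILE="${DUMP_FILE:-/var/backups/mongo.archive.gz}"
ASSET_DIR="${ASSET_DIR:-/srv/assets}"
REMOTE="${REMOTE:-user@offsite.example.com:/some/large/disk/}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

offsite_backup() {
  # Logical backup of the database into a single gzip-compressed archive.
  run mongodump --gzip --archive="$DUMP_FILE"
  # Copy the archive and the asset directory off-site; --partial keeps
  # partially transferred files so an interrupted run can resume.
  run rsync -az --partial "$DUMP_FILE" "$ASSET_DIR" "$REMOTE"
}

offsite_backup
```

Because rsync only sends what has changed, repeated runs over the asset directory are much cheaper than re-exporting a full disk image each time.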
