Method for offsite backup of EC2 servers

amazon-web-services backup disaster-recovery

We run a dozen or so production Ubuntu Linux web server instances in an Amazon VPC. The instances are bootstrapped and managed via Puppet. Most management is done via the AWS Console.

Our AWS credentials are pretty secure. The master account is hardly ever needed, and has a strong password and two-factor auth. A few trusted admins have access to most services via their own IAM accounts, also with strong passwords and two-factor auth. A few IAM accounts have very limited access for specific purposes, such as writing files to S3. Access by other employees to any high-level credentials is very limited. Overall, the chance of someone gaining access to the Console or APIs seems low.

The recent Code Spaces debacle, in which someone gained high-level access to their AWS Console and deleted instances, volumes, and EBS snapshots, effectively making it impossible for Code Spaces to recover their business, prompted me to investigate methods for backing up our data offline/offsite (i.e. out of reach of our main AWS account).

How can I ensure our customer data is safe from being wiped out by someone who gains access to our AWS credentials, or by some disaster at AWS? The solution should be automatic, stable, and reasonably priced.

After searching for a few hours, I can't seem to find an 'easy' way. Copying EBS snapshots to another AWS account doesn't seem possible, and I can't export EBS snapshots to S3 objects. I could rsync all important data by pulling from a third-party server, but I'd need to script it to handle things like varying numbers of servers, retention, error handling, etc. That seems like a lot of work, and I found no ready-to-go software for it.
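A rough sketch of what that pull script might look like, in Python (hostnames, paths, and the retention count below are hypothetical placeholders; assumes key-based SSH from the backup host):

```python
#!/usr/bin/env python3
"""Pull rsync backups from a list of servers, keeping N dated copies.

Hostnames, paths, and retention count are placeholders. Assumes
key-based SSH access from this third-party backup host.
"""
import datetime
import shutil
import subprocess
from pathlib import Path

SERVERS = ["web1.example.com", "web2.example.com"]  # hypothetical hosts
REMOTE_PATH = "/var/www/"                           # data to back up
BACKUP_ROOT = Path("/backups")
RETENTION = 14                                      # dated copies to keep

today = datetime.date.today().isoformat()

for host in SERVERS:
    dest = BACKUP_ROOT / host / today
    dest.mkdir(parents=True, exist_ok=True)

    # --link-dest hard-links unchanged files against the previous copy,
    # so each dated directory only costs the space of what changed
    previous = sorted((BACKUP_ROOT / host).iterdir())[-2:-1]
    cmd = ["rsync", "-a", "--delete"]
    if previous:
        cmd.append(f"--link-dest={previous[0]}")
    cmd += [f"{host}:{REMOTE_PATH}", str(dest)]

    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"WARNING: rsync from {host} exited {result.returncode}")

    # Retention: drop the oldest dated directories beyond RETENTION
    copies = sorted((BACKUP_ROOT / host).iterdir())
    for old in copies[:-RETENTION]:
        shutil.rmtree(old)
```

Even this minimal version would need hardening (locking, alerting, handling of partial transfers) before I'd trust it in production, which is exactly the work I was hoping to avoid.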

Our current backup strategy consists of nightly automated EBS snapshots of all volumes, as well as uploading compressed mysqldump output to S3. All source code and Puppet code is deployed from external version control, but our customers' files and MySQL databases are stored only on the EBS volumes and their snapshots, i.e. inside the AWS ecosystem.
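For reference, the dump-and-upload part of that is a small script along these lines (bucket and database names are placeholders; assumes boto3 is installed and mysqldump can authenticate locally, e.g. via ~/.my.cnf):

```python
#!/usr/bin/env python3
"""Nightly MySQL dump pushed to S3: a minimal sketch.

Bucket, database name, and paths are hypothetical. Assumes boto3 and
that AWS credentials come from the usual mechanisms (instance role,
environment, or ~/.aws/credentials).
"""
import datetime
import subprocess

import boto3

BUCKET = "example-backups"   # hypothetical bucket
DATABASE = "appdb"           # hypothetical database
DUMP_FILE = "/tmp/appdb.sql.gz"

# Dump and compress in one pipeline
with open(DUMP_FILE, "wb") as out:
    dump = subprocess.Popen(["mysqldump", DATABASE], stdout=subprocess.PIPE)
    subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
    dump.stdout.close()
    if dump.wait() != 0:
        raise RuntimeError("mysqldump failed")

# Upload under a dated key so older dumps aren't overwritten
key = f"mysql/{DATABASE}-{datetime.date.today().isoformat()}.sql.gz"
boto3.client("s3").upload_file(DUMP_FILE, BUCKET, key)
```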

Best Answer

A lot of people tend to overthink this. Just think of these servers as if they were deployed in a colo or in a corporate datacenter. In that case, how would you back them up?

Likely it would be via a "legacy" backup product (NetBackup, Amanda, Bareos, etc.) connected to a tape library or virtual tape library (VTL).

This is something you should consider doing for your AWS infrastructure as well. Set up a backup server and tape library outside of Amazon somewhere and use that as your "doomsday" restoration method.

Tape is one of the most reliable data storage mechanisms and, unlike cloud backup systems, is not vulnerable to the type of thing that happened to Code Spaces. Your backup data is truly offline, and you can keep the tapes in as secure a location as you choose, anywhere from a fire safe in the office to a rented safe deposit box at your local bank. Getting that kind of protection from a cloud storage provider is impossible.

You already have configuration management in place (yay!), so in the event of a disaster you'll be able to rebuild your servers reasonably quickly. The tape backup (or VTL) will mostly be for your data: databases, uploaded files, and other things that aren't covered by your Puppet manifests.

If this isn't an option, the next best thing would be to create a completely separate AWS account for backup purposes. Within that account, create IAM credentials for S3 that have upload-only permissions, and use those from your production environment to push backups. Ensure these credentials are kept in a completely separate location from your production credentials, to limit the possibility that both get compromised at the same time.
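As a sketch of what "upload-only" could look like, here is a hypothetical boto3 snippet attaching such a policy to a backup user in the separate account (user and bucket names are made up). Note that s3:PutObject can still overwrite existing keys, so enabling versioning on the backup bucket is a sensible extra safeguard:

```python
#!/usr/bin/env python3
"""Attach an upload-only S3 policy to a backup user: a minimal sketch.

User and bucket names are hypothetical. Run this with credentials for
the *separate* backup account, not the production one.
"""
import json

import boto3

BUCKET = "example-offsite-backups"  # hypothetical bucket in the backup account
USER = "backup-uploader"            # hypothetical IAM user

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",  # upload only
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
        # No Get/List/Delete: a compromised production host can add
        # backups but cannot read or destroy what is already there.
    ],
}

boto3.client("iam").put_user_policy(
    UserName=USER,
    PolicyName="upload-only-backups",
    PolicyDocument=json.dumps(policy),
)
```

The production side then uses only that user's access key to push objects; deleting or tampering with existing backups would require credentials from the backup account itself.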