I am in the process of moving my web app's project storage to S3, and I am wondering whether S3 versioning is really a good way to handle backups of the data, or whether there is a better way. If the files were all deleted or corrupted somehow, is it possible and easy to restore an entire bucket using versioning? If so, how? If not, what would be a better backup option for S3?
Is Amazon S3 versioning really a reasonable backup option?
amazon-s3 backup
Related Solutions
What is your general strategy to backup S3 buckets?
Depending on what data you are storing, you may not be interested in backing up data from S3 at all. For instance, if you have general website assets that you already keep in a repository elsewhere, you probably don't need to back up the copies that live in S3.
Sometimes you may use S3 to store user uploads. These might have originated on an EC2 instance, or they may have gone straight to S3. It makes sense to use Object Versioning so you can recover from script errors, or from users deleting files and then changing their minds. http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html
As far as I understand, versioning is done at the object level, so if you wanted to "revert to how your bucket looked 3 days ago" you would need to build a script that checks all the versions and dates and requests the right version of each object. This is possible to do; it just requires a little effort at the application level first.
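For illustration, here is a minimal sketch of the version-picking logic such a script would need. The dict shape loosely mirrors what boto3's `list_object_versions` paginator returns once you merge its `Versions` and `DeleteMarkers` lists for a single key; the helper name and data are my own.

```python
from datetime import datetime, timezone

def version_at(versions, cutoff):
    """Given every version of ONE key (dicts with 'VersionId',
    'LastModified', 'IsDeleteMarker'), return the VersionId that was
    current at `cutoff`, or None if the key did not exist then."""
    past = [v for v in versions if v["LastModified"] <= cutoff]
    if not past:
        return None  # the key was created after the cutoff
    newest = max(past, key=lambda v: v["LastModified"])
    # If the newest pre-cutoff entry is a delete marker, the key was
    # deleted at that point in time.
    return None if newest["IsDeleteMarker"] else newest["VersionId"]

# To restore, you would then (per key) call copy_object with the chosen
# VersionId as the copy source, making it the current version again.
```

Running this over every key in the bucket gives you the "bucket as of 3 days ago" view the answer describes.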
You could look at other methods, such as syncing all the S3 bucket objects to another location (a third-party server, or an EBS-backed EC2 instance). This could be your daily or weekly snapshot. This method adds extra cost, maintenance, and effort, so it might not be the best solution, particularly for 5 TB of data.
"How do you backup your entire cloud infrastructure? What is your disaster recovery plan?" How to backup Route53? CloudFront settings?
Depending on how far you want to go, all this sort of information should be scripted and in configuration files. Those configuration files should be backed up. This touches on DEVOPS and the concept of infrastructure as code.
How much time will it take to recover from a script error or from losing access to the root console?
This question is difficult to answer. What sort of script error? The first question touches on one example (a script deleting a file that lives on S3), but there are plenty more.
You can look into SimianArmy https://github.com/Netflix/SimianArmy
The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.
As for access to the "root console": if you're talking about access to your OS or your EC2 instances, all of that should be scripted via Puppet, Chef, or similar, and therefore your machines are "throwaway". There is nothing special about them; they contain no individual user data, and you can bring one up or down without affecting your system.
If you're talking about access to the AWS console, you would need to email or call AWS to regain access, and there may be outages that you need to account for.
I have read about the versioning feature for S3 buckets, but I cannot seem to find whether recovery is possible for files with no modification history. See the AWS docs here on versioning:
I've just tried this. Yes, you can restore the original version. When you delete a file, S3 adds a delete marker, and you can restore the version before that, i.e. the single, only revision.
Then, we thought we might just back up the S3 files to Glacier using object lifecycle management:

But it seems this will not work for us, as the file object is not copied to Glacier but moved to Glacier (more accurately, it seems an object attribute is changed, but anyway...).
Glacier is really meant for long term storage, which is very infrequently accessed. It can also get very expensive to retrieve a large portion of your data in one go, as it's not meant for point-in-time restoration of lots of data (percentage wise).
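For reference, a lifecycle transition rule looks roughly like this (the rule ID, prefix, and bucket name are made up). Note that, as the question observed, the rule changes the object's storage class in place rather than creating a second copy:

```python
# Hypothetical lifecycle rule: move objects under "uploads/" to the
# GLACIER storage class 30 days after creation. This is a transition,
# not a copy -- the object in S3 *becomes* the Glacier-stored object.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-uploads",
            "Filter": {"Prefix": "uploads/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}

# With boto3 you would apply it via:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```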
Finally, we thought we would create a new bucket every month to serve as a monthly full backup, and copy the original bucket's data to the new one on Day 1. Then, using something like duplicity (http://duplicity.nongnu.org/), we would synchronize the backup bucket every night.
Don't do this: you can only have 100 buckets per account, so in three years you will have used up a third of your bucket allowance with backups alone.
So, I guess there are a couple of questions here. First, does S3 versioning allow recovery of files that were never modified?
Yes
Is there some way to "copy" files from S3 to Glacier that I have missed?
Not that I know of.
Best Answer
Versioning is a great feature, and it should absolutely be used if possible. Having versioning enabled (and using appropriately-provisioned access keys) can save you from all manner of issues.
But.
Versioning won't protect you from everything, though: for example, it won't help if the bucket itself is deleted, or if someone with sufficient access removes the old versions along with the current ones.
You need to have backups of your data outside of S3, even if that's just an external hard drive that you run

$ aws s3 sync

against a couple of times a day. Having a backup of last resort is very simple to set up, and very inexpensive.