The AWS CLI now supports the --query parameter, which takes a JMESPath expression. This means you can sum the size values given by list-objects using sum(Contents[].Size) and count the objects with length(Contents[]).
This can be run using the official AWS CLI as below; the --query parameter was introduced in Feb 2014:
aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
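If you would rather do the same aggregation in a script, here is a minimal Python sketch of what the JMESPath expression computes. It assumes the response has already been loaded as a dict in the shape list-objects returns (a Contents list whose entries carry a Size field). The function name summarize_listing and the sample data are hypothetical. It treats a missing Contents key as an empty bucket, which is my assumption about how that case should be handled.

```python
def summarize_listing(response):
    """Return (total_bytes, object_count) for a list-objects style response."""
    # An empty bucket has no "Contents" key, so fall back to an empty list.
    contents = response.get("Contents") or []
    total_bytes = sum(obj["Size"] for obj in contents)
    return total_bytes, len(contents)


# Hypothetical sample shaped like `aws s3api list-objects --output json`.
sample = {
    "Contents": [
        {"Key": "a.txt", "Size": 1024},
        {"Key": "b.txt", "Size": 2048},
    ]
}

total, count = summarize_listing(sample)
print(total, count)
```

The fallback to an empty list matters because sum() in the CLI expression has nothing to work on when the bucket is empty.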
> I have read about the versioning feature for S3 buckets, but I cannot seem to find if recovery is possible for files with no modification history. See the AWS docs here on versioning:
I've just tried this. Yes, you can restore from the original version. When you delete the file, S3 creates a delete marker, and you can restore the version before that, i.e. the single, only revision.
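To make the restore step concrete, here is a rough Python sketch. It assumes a dict shaped like the output of list-object-versions, where delete markers appear in a DeleteMarkers list with Key, VersionId, and IsLatest fields. The helper name and the sample data are hypothetical.

```python
def latest_delete_marker(response, key):
    """Return the VersionId of the current delete marker for key, or None."""
    for marker in response.get("DeleteMarkers") or []:
        # Only the marker flagged IsLatest is hiding the object right now.
        if marker["Key"] == key and marker.get("IsLatest"):
            return marker["VersionId"]
    return None


# Hypothetical listing: one original version, then a delete marker on top.
sample = {
    "Versions": [
        {"Key": "report.csv", "VersionId": "v1", "IsLatest": False},
    ],
    "DeleteMarkers": [
        {"Key": "report.csv", "VersionId": "dm1", "IsLatest": True},
    ],
}

marker_id = latest_delete_marker(sample, "report.csv")
print(marker_id)
```

Deleting that specific marker version (for example with delete-object and its version ID) should make the previous version, here the only one, current again.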
> Then, we thought we might just back up the S3 files to Glacier using object lifecycle management:
> But it seems this will not work for us, as the file object is not copied to Glacier but moved to Glacier (more accurately, it seems it is an object attribute that is changed, but anyway...).
Glacier is really meant for long-term storage that is very infrequently accessed. It can also get very expensive to retrieve a large portion of your data in one go, as it's not meant for point-in-time restoration of a large percentage of your data.
> Finally, we thought we would create a new bucket every month to serve as a monthly full backup, and copy the original bucket's data to the new one on Day 1. Then using something like duplicity (http://duplicity.nongnu.org/) we would synchronize the backup bucket every night.
Don't do this. You can only have 100 buckets per account, so in 3 years you'll have taken up over a third of your bucket allowance (36 buckets) with just backups.
> So, I guess there are a couple of questions here. First, does S3 versioning allow recovery of files that were never modified?
Yes.
> Is there some way to "copy" files from S3 to Glacier that I have missed?
Not that I know of.
In this case the transition took 1 day or less.
I applied the Lifecycle rule "Transition current versions of objects between storage classes" from S3 Standard to S3 Glacier, set to 1 day.
Bucket metrics: size - 73 TB; total number of objects - 6.5 M.
The rule was applied on Oct 6 at 9:30 am.
I checked the bucket on Oct 7 at 7:00 am, and all objects showed the "Glacier" storage class.
Before that I had another case:
The same Lifecycle rule as mentioned above.
Bucket metrics: size - 11 TB; total number of objects - 2.1 M.
The transition took about 3 days (I didn't note the exact time).
So we can't predict the transition time based only on bucket size and object count.
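For reference, the Lifecycle rule used in both cases can be sketched as a lifecycle configuration. This Python snippet only builds the dict. It assumes the shape accepted by the S3 PutBucketLifecycleConfiguration API (for example through boto3's put_bucket_lifecycle_configuration). The rule ID and the function name are made up for illustration, and the empty prefix is my assumption that the rule covered the whole bucket.

```python
def glacier_transition_rule(days=1, prefix=""):
    """Build a lifecycle configuration moving current versions to Glacier."""
    return {
        "Rules": [
            {
                "ID": "transition-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                # An empty prefix applies the rule to every object (assumed).
                "Filter": {"Prefix": prefix},
                # "Transitions" targets current versions of objects.
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = glacier_transition_rule()
print(config["Rules"][0]["Transitions"])
```

Note that Days only sets when objects become eligible. As the two cases show, the actual move can finish anywhere from under a day to several days later.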