How to get the size of an Amazon S3 bucket

amazon s3amazon-web-servicesaws-clidisk-space-utilization

I'd like to graph the size (in bytes, and # of items) of an Amazon S3 bucket and am looking for an efficient way to get the data.

The s3cmd tools provide a way to get the total file size using s3cmd du s3://bucket_name, but I'm worried about its ability to scale since it looks like it fetches data about every file and calculates its own sum. Since Amazon charges users in GB-Months it seems odd that they don't expose this value directly.

Although Amazon's REST API returns the number of items in a bucket, s3cmd doesn't seem to expose it. I could do s3cmd ls -r s3://bucket_name | wc -l but that seems like a hack.

The Ruby AWS::S3 library looked promising, but only provides the # of bucket items, not the total bucket size.

Is anyone aware of any other command line tools or libraries (prefer Perl, PHP, Python, or Ruby) which provide ways of getting this data?

Best Answer

The AWS CLI now supports the --query parameter which takes a JMESPath expressions.

This means you can sum the size values given by list-objects using sum(Contents[].Size) and count like length(Contents[]).

This can be be run using the official AWS CLI as below and was introduced in Feb 2014

 aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"