AWS S3 – Why Has S3 Bucket Size Exploded?

amazon-s3 amazon-cloudwatch

Something happened recently with one of our S3 buckets:

I started looking for where all this extra stuff was coming from, but the metrics I gathered don't seem to match what is going on in CloudWatch (or our bill).

The bucket has a handful of different key prefixes ('folders'), so the first thing I did was to try and work out if any of them was contributing significantly to this number, like so:

aws s3 ls --summarize --human-readable --recursive s3://my-bucket/prefix

However, none of the prefixes seemed to contain a huge amount of data: nothing more than a few GB each.

I finally tried running

aws s3 ls --summarize --human-readable --recursive s3://my-bucket

…and I got a total size of ~25GB. Am I doing the wrong thing to try and find the 'size of a folder', or misunderstanding something? How can I find where all this extra storage is being used (and find out what process is running amok)?

Best Answer

It was incomplete multipart uploads. By default, S3 keeps every uploaded part of every failed multipart upload indefinitely! These parts do not show up in object listings, which is why aws s3 ls reported so much less than CloudWatch. A process had been failing and retrying multipart uploads without explicitly cleaning up the failed transfers.
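One way to confirm this is to list the uploads that were started but never completed. What follows is a minimal sketch, assuming a configured AWS CLI. The function names are hypothetical, and keys containing whitespace would need JSON output rather than text.

```shell
# List every in-progress (incomplete) multipart upload in a bucket,
# one per line: key, upload ID, initiation time (tab-separated).
# Prints "None" when the bucket has no incomplete uploads.
list_incomplete_uploads() {
  aws s3api list-multipart-uploads --bucket "$1" \
    --query 'Uploads[].[Key, UploadId, Initiated]' --output text
}

# Sum the size in bytes of the parts already stored for one upload.
# Arguments: bucket, key, upload ID.
upload_size_bytes() {
  aws s3api list-parts --bucket "$1" --key "$2" --upload-id "$3" \
    --query 'Parts[].Size' --output text |
    tr '\t' '\n' |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, run list_incomplete_uploads my-bucket, then upload_size_bytes my-bucket KEY UPLOAD_ID for any entry that looks suspicious. A single upload can be discarded with aws s3api abort-multipart-upload, using the same --bucket, --key and --upload-id arguments.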

We remedied this by temporarily enabling versioning and setting a lifecycle rule to remove the parts of incomplete multipart uploads after 1 day. We then waited a day and suspended versioning again once the leftover parts had been cleared.
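A lifecycle rule of this kind can be sketched with the AWS CLI as follows. This is an illustration rather than the exact rule we used: the rule ID and function names are made up, and the empty prefix is an assumption that the rule should cover the whole bucket.

```shell
# Print a lifecycle configuration with a single rule that aborts incomplete
# multipart uploads N days after they were initiated (default: 1 day).
lifecycle_json() {
  days="${1:-1}"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": $days }
    }
  ]
}
EOF
}

# Apply the rule to a bucket. Arguments: bucket, days (optional).
# Caution: this call replaces any lifecycle configuration already on the bucket.
apply_abort_rule() {
  aws s3api put-bucket-lifecycle-configuration --bucket "$1" \
    --lifecycle-configuration "$(lifecycle_json "${2:-1}")"
}
```

Running apply_abort_rule my-bucket 1 would set the 1-day rule. If the bucket already has lifecycle rules, fetch them first with aws s3api get-bucket-lifecycle-configuration and merge the new rule in, since the put call overwrites the whole configuration.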

Related Solutions

Amazon S3 – How to Get the Size of an Amazon S3 Bucket

The AWS CLI now supports the --query parameter, which takes a JMESPath expression.

This means you can sum the size values given by list-objects using sum(Contents[].Size) and count the objects using length(Contents[]).

This can be run using the official AWS CLI as below. The feature was introduced in Feb 2014.

 aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
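Listing every object can be slow on large buckets, and it only counts current objects. As an alternative, the daily storage metric that CloudWatch reports can be queried directly. As far as I understand the S3 documentation, that metric also counts noncurrent versions and the parts of incomplete multipart uploads, so it can be much larger than the listing total. A rough sketch follows. The function name is hypothetical, and the StandardStorage dimension is an assumption that has to match the storage class actually in use.

```shell
# Print the daily average of the BucketSizeBytes metric between two timestamps,
# one datapoint per line: timestamp, size in bytes.
# Arguments: bucket name, start time, end time (ISO 8601, e.g. 2024-01-01T00:00:00Z).
bucket_size_metric() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name BucketSizeBytes \
    --dimensions "Name=BucketName,Value=$1" "Name=StorageType,Value=StandardStorage" \
    --start-time "$2" --end-time "$3" \
    --period 86400 --statistics Average \
    --query 'Datapoints[].[Timestamp, Average]' --output text
}
```

For example: bucket_size_metric BUCKETNAME 2024-01-01T00:00:00Z 2024-01-08T00:00:00Z. The datapoints are not guaranteed to come back in chronological order, so pipe the output through sort if order matters.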

How long before an s3 bucket can be created with same name after deletion

The S3 docs used to say:

When you delete a bucket, there may be a delay of up to one hour before the bucket name is available for reuse in a new region or by a new bucket owner. If you re-create the bucket in the same region or with the same bucket owner, there is no delay.

But now they just say:

... it might take some time before the name can be reused ...
