AWS S3 – Understanding ‘Increased Request Rate Performance’ Announcement

amazon-s3 · amazon-web-services · performance

On 17 July 2018 there was an official AWS announcement explaining that there is no longer any need to randomize the first characters of every S3 object key to achieve maximum performance: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

Amazon S3 Announces Increased Request Rate Performance

Posted On: Jul 17, 2018

Amazon S3 now provides increased performance to support at least 3,500
requests per second to add data and 5,500 requests per second to
retrieve data, which can save significant processing time for no
additional charge. Each S3 prefix can support these request rates,
making it simple to increase performance significantly.

Applications running on Amazon S3 today will enjoy this performance
improvement with no changes, and customers building new applications
on S3 do not have to make any application customizations to achieve
this performance. Amazon S3’s support for parallel requests means you
can scale your S3 performance by the factor of your compute cluster,
without making any customizations to your application. Performance
scales per prefix, so you can use as many prefixes as you need in
parallel to achieve the required throughput. There are no limits to
the number of prefixes.

This S3 request rate performance increase removes any previous
guidance to randomize object prefixes to achieve faster performance.
That means you can now use logical or sequential naming patterns in S3
object naming without any performance implications. This improvement
is now available in all AWS Regions. For more information, visit the
Amazon S3 Developer Guide.

That's great, but it's also confusing. It says "Each S3 prefix can support these request rates, making it simple to increase performance significantly."

But prefixes and delimiters are just arguments to the GET Bucket (List Objects) API when listing the contents of a bucket, so how can it make sense to talk about object retrieval performance "per prefix"? Every call to GET Bucket (List Objects) can choose whatever prefix and delimiter it wants, so prefixes are not a predefined entity.

For example, if my bucket has these objects:

a1/b-2
a1/c-3

Then I may choose to use "/" or "-" as my delimiter whenever I list the bucket contents, so I might consider my prefixes to be either

a1/ 

or

a1/b-
a1/c-

But since the GET Object API uses the whole key, the concept of a particular prefix or delimiter does not exist for object retrieval. So can I expect 5,500 req/sec on a1/, or alternatively 5,500 req/sec on a1/b- and another 5,500 on a1/c-?
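
To make the ambiguity concrete, here is a minimal sketch using boto3 (the bucket name is hypothetical, assumed to contain only the two keys above) showing that the same keys produce different "prefixes" depending purely on the delimiter passed at list time:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket containing a1/b-2 and a1/c-3

# With "/" as the delimiter, both keys roll up under one common prefix.
resp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
# ['a1/']

# With "-" as the delimiter, the very same keys roll up under two prefixes.
resp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="-")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
# ['a1/b-', 'a1/c-']
```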

So can someone explain what the announcement means when it promises a particular level of performance (e.g. 5,500 requests per second to retrieve data) for "each S3 prefix"?

Best Answer

What's actually being referred to here as a "prefix" appears to be an oversimplification: it really means each partition of the bucket index. The index is lexical, so splits occur based on the leading characters in the object key. Hence, it's referred to as the prefix.

S3 manages the index partitions automatically and transparently, so the precise definition of a "prefix" here is actually somewhat imprecise: it's "whatever S3 decides is needed to support your bucket's workload." S3 splits the index partitions in response to workload, so two objects that might have the same "prefix" today could have different prefixes tomorrow, all done in the background.

Right now, a1/a-... and a1/b-... and a1/c-... may be all a single prefix. But throw enough traffic at the bucket, and S3 may decide the partition should be split, so that tomorrow, a1/a- and a1/b- may be in one prefix, while a1/c- may be in its own prefix. (That is, keys < a1/c- are in one partition, while keys >= a1/c- are now in a different partition).

Exactly where, when, and at what threshold the split occurs isn't documented, but it appears to be related only to the volume of requests, and not to the number or size of the objects. Previously, these partitions were limited to a few hundred requests per second each, and that limit has been significantly increased.
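
The practical consequence is the one the announcement hints at: aggregate throughput scales with however many partitions S3 carves out, so spreading keys across distinct leading prefixes and requesting them in parallel is how you go beyond a single partition's limit. A minimal sketch, assuming a hypothetical bucket and key layout:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical

# Keys spread across several distinct leading prefixes. Under sustained
# traffic, S3 may split the index so that each prefix ends up in its own
# partition, each supporting ~5,500 GETs/sec on its own.
keys = [f"shard-{i}/object-{j}" for i in range(4) for j in range(100)]

def fetch(key):
    # GetObject uses the whole key; the "prefix" only determines which
    # internal index partition happens to serve the request.
    return s3.get_object(Bucket=BUCKET, Key=key)["ContentLength"]

# Parallel requests from the client side; aggregate throughput can then
# scale with the number of partitions rather than being capped at one.
with ThreadPoolExecutor(max_workers=32) as pool:
    sizes = list(pool.map(fetch, keys))
```

Note that because the splits are workload-driven and done in the background, this scaling is something S3 grows into under sustained traffic, not something you configure up front.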