Why would an image hosting website, such as Imgur, use AWS EC2 instances over S3 buckets for uploads?

amazon-ec2, amazon-s3, amazon-web-services, images

I was reading a Q&A with the creator of Imgur, and he went into detail explaining the server infrastructure that Imgur runs on. Here's a small quote from what he had to say:

Most of the clusters use c1.xlarge instances. The upload cluster handles all uploads and image processing requests, like thumbnails and resizing, and each instance is a huge cluster instance, cc1.4xlarge.

I understand images don't take up much space, but why even go this route, especially considering the significant cost difference?

If you want to read the whole Q&A, you can check it out here. I found it quite interesting.

Best Answer

S3 mainly offers exceptionally high durability and very low administration overhead. The service itself is not really that cheap (especially when it comes to serving requests), but at most scales the labour cost of managing the alternatives blows any savings out of the water. At very large scales, however, the savings start to outweigh the management overhead.

For example:

GET requests on S3 cost $0.004 per 10,000 requests.

A t2.micro can do about 180 Mbit/s and costs $0.013/h. Assuming a 500 kB image size (4000 kbit), that's about 46 images/s. Assuming you can saturate that instance (which a large-scale image sharing service presumably could), that's about 165k requests/hour.

So a t2.micro would cost you $0.013/h, versus $0.066/h for the same number of requests on S3. In practice you might hit other bottlenecks on a t2.micro, so S3 would probably end up slightly ahead at this scale.

However, a c4.8xlarge (with 10 Gbit networking) costs $1.763/h. With that you could serve about 2620 images/s, or around 9.4 million per hour. Serving the same number of requests from S3 would cost you $3.76/hour. Add reserved instance discounts, etc. and the difference gets even bigger.
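To make the arithmetic above easy to rerun with your own numbers, here is a rough Python sketch. The function names are mine, the prices are the ones quoted in this answer (they will have changed since), and it assumes binary unit conversion (1 Mbit = 1024 kbit), which is what reproduces the 46 and 2620 images/s figures. It only counts S3 GET request charges against the EC2 hourly price, exactly as the comparison above does; data transfer is ignored on both sides.

```python
# Figures quoted in the answer above; treat them as illustrative, not current pricing.
S3_GET_PRICE_PER_10K = 0.004  # USD per 10,000 GET requests
IMAGE_SIZE_KB = 500           # assumed average image size


def images_per_second(bandwidth_mbit, image_kb=IMAGE_SIZE_KB):
    """Images served per second if the network link is fully saturated."""
    image_kbit = image_kb * 8                 # 500 kB -> 4000 kbit
    return bandwidth_mbit * 1024 / image_kbit  # assumes 1 Mbit = 1024 kbit


def s3_cost_per_hour(bandwidth_mbit, image_kb=IMAGE_SIZE_KB):
    """What the same hourly request volume would cost in S3 GET charges."""
    requests_per_hour = images_per_second(bandwidth_mbit, image_kb) * 3600
    return requests_per_hour / 10_000 * S3_GET_PRICE_PER_10K


if __name__ == "__main__":
    # (instance name, bandwidth in Mbit/s, EC2 on-demand price in USD/h)
    instances = [("t2.micro", 180, 0.013), ("c4.8xlarge", 10 * 1024, 1.763)]
    for name, mbit, ec2_hourly in instances:
        print(f"{name}: {images_per_second(mbit):.0f} images/s, "
              f"EC2 ${ec2_hourly:.3f}/h vs S3 requests ${s3_cost_per_hour(mbit):.3f}/h")
```

The results differ from the figures above by a cent or so because the answer rounds intermediate values.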

On top of that, you can't offload processes such as resizing images to S3 and you may also want to run a WAF or DDoS protection layer to reduce bandwidth costs due to attacks.

Having said that, a common architecture is to store the originals in S3 (where they will rarely be accessed, but where durability is important) and to cache resized versions on the front end servers. I believe Netflix did or does use this technique (except they stored the cached files on their own colo hardware). It wouldn't surprise me if Imgur did that too.
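As a very rough illustration of that pattern (not how Imgur or Netflix actually implement it), a front-end server could keep a small read-through cache of resized images and only go back to S3 on a miss. Everything here is hypothetical: `ResizedImageCache`, `fetch_original` and `resize` are names I made up. In a real setup `fetch_original` would be an S3 GET and `resize` would call an image library; this sketch keeps the cache in memory, whereas a real one would more likely live on local disk.

```python
from collections import OrderedDict


class ResizedImageCache:
    """Read-through LRU cache of resized images (hypothetical sketch)."""

    def __init__(self, fetch_original, resize, max_entries=1024):
        self._fetch_original = fetch_original  # e.g. wraps an S3 GET (assumption)
        self._resize = resize                  # e.g. wraps an image library (assumption)
        self._max_entries = max_entries
        self._cache = OrderedDict()            # (key, width) -> resized bytes

    def get(self, key, width):
        cache_key = (key, width)
        if cache_key in self._cache:
            # Cache hit: mark as most recently used, no request to the origin.
            self._cache.move_to_end(cache_key)
            return self._cache[cache_key]

        # Cache miss: fetch the durable original once, resize, and keep the result.
        resized = self._resize(self._fetch_original(key), width)
        self._cache[cache_key] = resized
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)    # evict the least recently used entry
        return resized
```

With this shape, the S3 request cost is only paid on cache misses, while the originals still get S3's durability.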
