I have to copy 400GB of files from an Elastic Block Store (EBS) volume to an S3 bucket… that's about 300k files of ~1MB each.
I've tried s3cmd and s3fuse; both are really, really slow. s3cmd ran for a full day, said it had finished copying, and when I checked the bucket, nothing was there (I suppose something went wrong, but at least s3cmd never complained about anything).
s3fuse has been running for another full day and has copied less than 10% of the files…
Is there a better solution for this?
I'm running Linux (Ubuntu 12.04), of course.
Best Answer
There are several key factors that determine throughput from EC2 to S3: the size of the objects being uploaded, the number of parallel upload threads, and the instance size (which determines the available network bandwidth).
When transferring large amounts of data, it may be economically practical to use a cluster compute instance, as the effective gain in throughput (>10x) is more than the difference in cost (2-3x).
While the above ideas are fairly logical (although the per-thread cap may not be), it is quite easy to find benchmarks backing them up. One particularly detailed one can be found here.
Using between 64 and 128 parallel (simultaneous) uploads of 1MB objects should saturate the 1Gbps uplink that an m1.xlarge has and should even saturate the 10Gbps uplink of a cluster compute (cc1.4xlarge) instance.
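As a rough illustration of that level of concurrency, here is a minimal sketch that walks a directory and uploads each file with a pool of worker threads. It assumes boto3 is installed and AWS credentials are configured; the source directory, bucket name, and worker count below are placeholders, not part of the original answer.

    #!/usr/bin/env python
    """Upload every file under SRC_DIR to an S3 bucket using parallel threads."""
    import os
    import concurrent.futures

    import boto3

    SRC_DIR = "/mnt/data"        # hypothetical mount point of the EBS volume
    BUCKET = "my-target-bucket"  # hypothetical bucket name
    WORKERS = 64                 # throughput tends to peak around 64-128 threads

    s3 = boto3.client("s3")      # boto3 clients can be shared across threads


    def iter_files(root):
        """Yield the full path of every file under root."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)


    def upload(path):
        """Upload one file; the S3 key mirrors the path relative to SRC_DIR."""
        key = os.path.relpath(path, SRC_DIR)
        s3.upload_file(path, BUCKET, key)
        return key


    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
            for key in pool.map(upload, iter_files(SRC_DIR)):
                print("uploaded", key)

Since the workload is network-bound (many small objects), threads are sufficient here; the worker count is the knob to tune against the instance's uplink.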
While it is fairly easy to change instance size, the other two factors may be harder to manage.