Linux – What’s the fastest way to copy millions of files (hundreds of GBs) between Amazon EC2 servers?

amazon-ec2 linux

I am running Linux on Amazon EC2 servers. I need to copy millions of files, totaling hundreds of gigabytes, between two EC2 systems in the same availability zone. I don't need to sync directories; I just need to copy all the files in one directory over to an empty directory on the other machine.

What is the fastest way to do this? Has anyone seen or run performance tests?

rsync? scp? Should I zip them first? Should I detach the drive they are on, re-attach it to the machine I'm copying to, and copy them from there? Does transferring over EC2's private IP addresses speed things up?
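For concreteness, the network-based options I'm weighing look roughly like this (hostnames and paths are placeholders; I would run these over the private IP):

    # Option A: rsync over ssh, preserving permissions and timestamps
    rsync -a /data/source/ ec2-user@10.0.0.2:/data/dest/

    # Option B: tar streamed over ssh, which avoids rsync's per-file
    # bookkeeping (the destination directory must already exist)
    tar -cf - -C /data/source . | ssh ec2-user@10.0.0.2 'tar -xf - -C /data/dest'

    # Option C: the same pipe with gzip, in case compression helps
    tar -czf - -C /data/source . | ssh ec2-user@10.0.0.2 'tar -xzf - -C /data/dest'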

Any thoughts would be appreciated.

NOTE: Sorry this was unclear, but I'm copying data between two EC2 systems, both in the same AWS availability zone.

Best Answer

If the files are already on an EBS volume (and if you care about them, why aren't they?):

  1. Create a snapshot of the EBS volume containing the files on the first instance.

  2. Create an EBS volume from that snapshot.

  3. Attach the EBS volume to the second instance.

The new EBS volume may be somewhat slow at first, while it lazily loads blocks from the snapshot, but it will be usable right away. A command-line sketch of these steps follows.
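Here is a minimal sketch of those three steps, assuming the AWS CLI is installed and using placeholder volume, snapshot, and instance IDs:

    # 1. Snapshot the source volume (vol-11111111 is a placeholder ID)
    aws ec2 create-snapshot --volume-id vol-11111111 --description "bulk file copy"

    # Wait for the snapshot to complete before creating a volume from it
    aws ec2 wait snapshot-completed --snapshot-ids snap-22222222

    # 2. Create a new volume from the snapshot, in the second
    #    instance's availability zone
    aws ec2 create-volume --snapshot-id snap-22222222 --availability-zone us-east-1a

    # 3. Attach the new volume to the second instance, then mount it there
    aws ec2 attach-volume --volume-id vol-33333333 --instance-id i-44444444 --device /dev/sdf

    # On the second instance (the device name may differ, e.g. /dev/xvdf)
    sudo mkdir -p /mnt/copied-files
    sudo mount /dev/xvdf /mnt/copied-files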

ALTERNATIVE (If the files are not already on an EBS volume):

  1. Attach a new EBS volume to the first instance.

  2. Copy the files from the instance's other disks to the new EBS volume.

  3. Move the EBS volume to the second instance: detach it from the first instance, then attach it to the second, as sketched below.
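The move in step 3 is just a detach followed by an attach; a minimal sketch with the same placeholder IDs:

    # On the first instance: unmount before detaching
    sudo umount /mnt/staging

    # Detach the volume, wait until it is free, then attach it
    # to the second instance
    aws ec2 detach-volume --volume-id vol-33333333
    aws ec2 wait volume-available --volume-ids vol-33333333
    aws ec2 attach-volume --volume-id vol-33333333 --instance-id i-55555555 --device /dev/sdf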