Linux – Running multiple scp threads simultaneously

Tags: centos, linux, multi-threading, scp


Background:

I often find myself mirroring a set of server files, and these sets include thousands of small 1-3 KB files. All the servers are connected to 1 Gbps ports and are generally spread across a variety of data centers.

Problem:

scp transfers these little files one by one, and it takes ages; I feel like I'm wasting the beautiful network resources I have.

Solution?:

I had an idea: create a script that divides the files into equal batches and starts up 5-6 scp processes, which in theory would finish 5-6 times faster, no? But I don't have any Linux scripting experience!
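For illustration, here is a minimal sketch of that idea, assuming the files sit in a single flat directory and that user@dest.server and the paths are placeholders:

# send up to 50 files per scp invocation, keeping 6 invocations running in parallel
find /manyfiles -maxdepth 1 -type f -print0 | xargs -0 -n 50 -P 6 sh -c 'scp -q "$@" user@dest.server:/manyfiles/' _

Each scp call still pays its own connection setup cost, which is part of why the accepted answer below takes a different approach.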

Question(s):

  • Is there a better solution to the problem described above?
  • Does something like this already exist?
  • If not, could someone give me a start, or help me out?
  • If not to 2 or 3, where would be a good place to start learning Linux scripting, like bash or something else?

Best Answer

I would do it like this:
tar -cf - -C /manyfiles . | ssh dest.server 'tar -xf - -C /manyfiles'

This sends everything as a single stream over one ssh connection, avoiding scp's per-file overhead; the -C /manyfiles . form archives the directory's contents so they extract directly into /manyfiles on the destination instead of into a nested manyfiles/manyfiles.

Depending on the files you are transferring, it can make sense to enable compression in the tar commands; text compresses well, while already-compressed data (images, archives) will gain little:
tar -czf - -C /manyfiles . | ssh dest.server 'tar -xzf - -C /manyfiles'

It may also make sense to choose a CPU-friendlier cipher for the ssh command (like arcfour; note that arcfour has been removed from recent OpenSSH releases, so on modern systems you would pick a current fast cipher instead):
tar -cf - -C /manyfiles . | ssh -c arcfour dest.server 'tar -xf - -C /manyfiles'

Or combine both of them, but it really depends on what your bottleneck is.
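Combined, it would look something like this (substituting aes128-gcm@openssh.com as an assumed modern replacement for arcfour):

tar -czf - -C /manyfiles . | ssh -c aes128-gcm@openssh.com dest.server 'tar -xzf - -C /manyfiles'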
Obviously rsync will be a lot faster if you are doing incremental syncs.
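For instance, a typical incremental mirror might look like this (the flags and paths are illustrative; --delete removes destination files that no longer exist on the source):

rsync -az --delete /manyfiles/ user@dest.server:/manyfiles/

The trailing slash on the source matters: it tells rsync to copy the contents of /manyfiles rather than the directory itself.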