Running multiple scp threads simultaneously:
Background:
I often find myself mirroring a set of server files, and among them are thousands of small files of 1 KB to 3 KB each. All the servers are connected to 1 Gbps ports and are generally spread across a variety of data centers.
Problem:
SCP transfers these little files one by one, which takes ages, and I feel like I'm wasting the beautiful network resources I have.
Solution?:
I had an idea: create a script that divides the files into equal batches and starts 5-6 scp threads, which in theory would finish 5-6 times faster, no? But I don't have any Linux scripting experience!
Question(s):
- Is there a better solution to the mentioned problem?
- Is there something like this that exists already?
- If not, is there someone who would give me a start, or help me out?
- If the answer to 2 or 3 is no, where would be a good place to start learning Linux scripting, such as bash or something else?
Best Answer
I would do it like this:
tar -cf - /manyfiles | ssh dest.server 'tar -xf - -C /'
tar strips the leading / from member names, so extracting relative to / recreates /manyfiles on the destination.
Depending on the files you are transferring, it can make sense to enable compression in the tar commands:
tar -czf - /manyfiles | ssh dest.server 'tar -xzf - -C /'
It may also make sense to choose a more CPU-friendly cipher for the ssh command (like arcfour, if your SSH version still offers it; OpenSSH 7.6 and later have removed it):
tar -cf - /manyfiles | ssh -c arcfour dest.server 'tar -xf - -C /'
Or combine both of them, but it really depends on what your bottleneck is.
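As a rough sketch of combining both options, the pipeline can be wrapped in a small shell function. The name stream_tree and its arguments are made up for illustration. The receiving command is passed through untouched, so the cipher choice stays on the ssh command line; running ssh -Q cipher lists the ciphers your client supports.

```shell
# Hypothetical helper, not part of tar or ssh: pack a tree with gzip
# compression and pipe the archive into whatever command follows.
#
# Example (assumes aes128-ctr is in the output of `ssh -Q cipher`):
#   stream_tree / manyfiles ssh -c aes128-ctr dest.server 'tar -xzf - -C /'
stream_tree() {
    parent=$1    # directory that contains the tree, e.g. /
    name=$2      # tree to send, relative to $parent, e.g. manyfiles
    shift 2      # the remaining arguments are the receiving command
    tar -czf - -C "$parent" "$name" | "$@"
}
```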
Obviously, rsync will be a lot faster if you are doing incremental syncs.
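For the incremental case, a minimal sketch might look like the following. The function name sync_tree is hypothetical; -a (archive mode) and -z (compression) are standard rsync options, and the trailing slashes make rsync copy the contents of the source directory rather than the directory itself.

```shell
# Hypothetical wrapper: mirror the contents of $1 into $2.
# On repeat runs rsync skips files whose size and modification time
# are unchanged, so only new or modified files are sent.
#
# Example: sync_tree /manyfiles dest.server:/manyfiles
sync_tree() {
    rsync -az "$1/" "$2/"
}
```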