I mount a remote server over ssh (using sshfs). I want to copy a large number of files from the remote server to local:
cp -rnv /mounted_path/source/* /local_path/destination
The command runs a recursive copy that doesn't overwrite existing files. But the copying process is rather slow, and I notice that it does not copy files in order. So my question is: can I speed up the copying by opening multiple terminals and running the same command in each? Is the copying process smart enough not to overwrite files already copied by the other processes?
Best Answer
…to answer the original question as stated…
There are two things to discuss here.
Using SSHFS
SSHFS uses the SFTP "subsystem" of the SSH protocol to make a remote filesystem appear as if it were mounted locally.
A crucial thing to note here is that SSHFS translates low-level syscalls into relatively high-level SFTP commands; the SFTP server translates those back into syscalls it executes on the server, and the results are then sent back to the client and translated in reverse.
There are several sources of slowness with this process:
- For each file being copied, `cp` first `stat(2)`-s the file, then `open(2)`-s it, then reads its data by executing several `read(2)` calls in a row, and finally `close(2)`-s it. All of those syscalls have to be translated into SFTP commands, sent to the server and processed there, with the results sent back to the client and translated back.
- Even though SSHFS implements certain clever hacks such as "read ahead" (it speculatively reads more data than the client requested), each syscall still results in a round-trip to the server and back: we send a request, wait for the server to respond, then process its response. IIUC, SFTP does not implement "pipelining" (a mode of operation where new commands are sent before the previous ones have completed), so basically each syscall incurs its own full round-trip. While such processing is technically possible to a certain degree, sshfs does not appear to implement it.

IOW, each syscall `cp` makes on your client machine is translated into a request to the server, followed by waiting for its response and then receiving it.

Multiple `cp -n` processes run in parallel

The answer to the question of whether it's OK to employ multiple `cp -n` processes copying files in parallel depends on several considerations.

First, if they all run over the same SSHFS mount, there will obviously be no speedup: all the syscalls issued by the multiple `cp` processes eventually hit the same SFTP client connection and are serialized by it, for the reasons explained above.

Second, running several instances of `cp -n` over distinct SSHFS mount points may be worthwhile, up to the limits set by the network throughput and the I/O throughput of the medium/media under the target filesystem. In this case it's crucial to understand that, since SSHFS does not use any locking on the server, the different instances of `cp -n` must operate on distinct directory hierarchies, simply so they do not step on each other's toes.
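As a purely local illustration of that scheme (the directories below are made up for the demo; in real use each subtree would sit under its own SSHFS mount), two `cp -n` instances each copying a distinct subtree can run side by side without interfering:

```shell
# Local sketch: two cp -n processes, each owning a distinct subtree.
# In real use, $src/a and $src/b would live under separate SSHFS mounts.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/a" "$src/b" "$dst/a" "$dst/b"
echo one > "$src/a/1.txt"
echo two > "$src/b/2.txt"

cp -rn "$src/a/." "$dst/a/" &   # worker 1: subtree a only
cp -rn "$src/b/." "$dst/b/" &   # worker 2: subtree b only
wait                            # no shared files, so no races
```

Because the two workers never touch the same path, the lack of server-side locking does not matter here.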
Different / more sensible approaches

First, piping a data stream created by `tar`, `cpio`, or another streaming archiver, and processing it remotely, has the advantage that all the round-trips for individual filesystem operations are avoided: the local archiver creates the stream as fast as the I/O throughput of the source filesystem allows and sends it as fast as the network allows; the remote archiver extracts data from the stream and updates its filesystem as fast as it allows. No round-trips to execute elementary "commands" are involved: you just go as fast as the slowest I/O point in this pipeline allows; it's simply impossible to go faster.

Second, another answer suggested using `rsync`, and you rejected that suggestion on the grounds that `rsync` would checksum every file and hence be slow. This is simply wrong. Per the `rsync` manual page, by default it uses a "quick check" algorithm that looks for files that have changed in size or in last-modified time, and it reads file contents for comparison only when explicitly told to (the `--checksum` option). That is, `rsync` does not hash a file's contents to see whether the file has changed. It also offers the `--ignore-existing` option, which mimics `cp -n`: skip updating a file if it merely exists on the receiver.
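To make the streaming approach concrete, here is a sketch. The real command runs the producing `tar` on the remote side over `ssh` (`user@host` and both paths are placeholders for your setup); the runnable demonstration below wires the same producer-consumer pipeline together locally instead:

```shell
# Real-world shape of the pipeline (placeholders, pulling remote -> local):
#   ssh user@host 'tar -C /path/to/source -cf - .' \
#       | tar -C /local/destination -xf -
#
# Local demonstration of the same pipeline:
src=$(mktemp -d); dst=$(mktemp -d)
echo hello > "$src/file.txt"
tar -C "$src" -cf - . | tar -C "$dst" -xf -   # one byte stream, no per-file round-trips
```

Since the two `tar` processes talk through a single byte stream, the per-file round-trips disappear entirely; on slow links, compressing the stream (`tar -z`, or `ssh -C`) may help further if the data is compressible.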