Rsync over SSHFS hangs

rackspace-cloud-sites rsync sshfs

I have sshfs set up to connect to another system (Rackspace Cloud Sites) that I don't have ssh access to (but for some reason I can use sshfs? Go figure). I'm attempting to rsync files from the sshfs mount to my local disk. It's several thousand small files (1k-200k). Sometimes rsync will just pause and hang for a while on files that are very small. It will pause on, say, a 10k text file for about 5 minutes, then continue.

Is there anywhere I can look on my machine to determine why rsync would be hanging like this? Or is there a good chance it's simply a problem on the other end that I can't do anything about?

My rsync options are simply -avrP.

Best Answer

Alright, I'm going to take a stab at this, because I think my idea makes sense.

You are dealing with multiple caches in this case, and that's what is tripping you up, I think.

The first thing that rsync does is determine which files it needs to transfer. It usually does this by spawning an rsync instance on the remote side, which reads the metadata for each file in the source directory while the local instance reads the metadata for the local files; the two sets of metadata are then compared. Anything newer (or different, depending on the rsync options) gets transferred.
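To make that comparison step concrete, here is a rough Python sketch of the idea. It assumes rsync's default "quick check" behaviour, where a file counts as unchanged when its size and modification time match; the function name needs_transfer is made up for illustration, and real rsync does a good deal more than this.

```python
import os

def needs_transfer(src_path, dst_path):
    """Return True if dst is missing or differs from src in size or mtime.

    Illustrative only: this mimics the idea of a size + mtime comparison,
    it is not rsync's actual implementation.
    """
    src = os.stat(src_path)
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # nothing at the destination yet
    # Compare whole-second mtimes so sub-second precision differences
    # between filesystems do not count as a change.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)
```

Note that even this tiny check costs one stat call per file on each side, which is the part that matters for the rest of this answer.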

You don't have a "remote side", according to rsync. You're working "locally", so it will iterate over both directories, the source and the destination.

This is very disk intensive, particularly with a ton of small files - the more files, the more discrete disk operations. This causes a lot of disk thrashing, plus it fills the cache with the metadata from those files.
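If you want to see whether the metadata reads themselves are what stalls, a small script along these lines might help. It is only a sketch: the function name slow_stats and the one-second threshold are my own choices, and it times each lstat call while walking the tree, which is roughly the per-file work rsync has to do on the mount.

```python
import os
import time

def slow_stats(root, threshold=1.0):
    """Walk root and return (path, seconds) for every lstat slower than threshold."""
    slow = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.monotonic()
            try:
                os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            elapsed = time.monotonic() - start
            if elapsed >= threshold:
                slow.append((path, elapsed))
    return slow
```

Point it at the sshfs mount point (for example slow_stats("/mnt/cloudsites"), where that path is a placeholder for wherever you mounted it). The directory listing done by os.walk is not timed here, and a second run may look faster simply because the caches are already warm.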

Notice that this happens all the way down the stack. Your local machine caches metadata from the FUSE filesystem you've got mounted over ssh AND from the local directory. The remote machine caches metadata from its own local disk. And the VM host that your remote machine is running on is almost certainly overcommitted and giving you ballooned memory.

I suspect that you're crossing memory thresholds when it freezes, and everything has to catch up and either drop cache or swap.

I would be very interested to see if this happens when you do rsync over ssh without the disk mount.