Rsync daemon: is it really useful?

rsync

Are there any practical benefits in using rsyncd compared to rsync over ssh? Does it really increase speed, stability, anything?

Best Answer

I think the big difference is that if you're using rsyncd on the server end, instead of rsync over ssh, the server already knows what it has, so building the file lists to determine what needs to be transferred is much simpler. It won't make a difference if you're just pushing around a few files, but if you're making, for example, CPAN available over rsync, you don't want to have to build the file list on the source side every time.
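For readers unsure what the two setups look like in practice, here is a minimal sketch that builds the two command lines being compared. The helper names, the host `mirror.example.org`, the module name `CPAN`, the user `alice`, and the paths are all hypothetical; only the `rsync://host/module/` URL form (daemon, default TCP port 873) and the `-e ssh` with `user@host:path` form are standard rsync syntax.

```python
import shlex
from typing import List


def daemon_pull(host: str, module: str, dest: str) -> List[str]:
    """Pull from an rsync daemon: the rsync:// URL names a module, not a path."""
    return ["rsync", "-a", f"rsync://{host}/{module}/", dest]


def ssh_pull(user: str, host: str, path: str, dest: str) -> List[str]:
    """Pull over ssh: rsync logs in and starts a remote rsync for this one run."""
    return ["rsync", "-a", "-e", "ssh", f"{user}@{host}:{path}/", dest]


if __name__ == "__main__":
    # Hypothetical host, module, user, and paths, for illustration only.
    print(shlex.join(daemon_pull("mirror.example.org", "CPAN", "/srv/cpan/")))
    print(shlex.join(ssh_pull("alice", "mirror.example.org", "/srv/CPAN", "/srv/cpan/")))
```

The argument lists could be passed to `subprocess.run` as they are. The daemon form needs no shell account on the server, which is part of why it suits public mirrors.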

Related Solutions

Ssh – Is it possible to use rsync over sftp (without an ssh shell)

Unfortunately, not directly. When run over SSH, rsync needs a clean link to a shell so that it can start the remote copy of rsync.

If you have some way of running long-lived listening processes on the host, you could try starting rsync manually, listening for connections on a non-privileged port. Most techniques for doing that would themselves require proper shell access via SSH, though, and the approach relies on the host's firewall letting connections in on the port you chose (and on the host having rsync installed in the first place). Running rsync as a publicly addressable service (rather than indirectly via SSH or similar) is generally not recommended for non-public data.

If your host allows scripting in PHP or similar, and has not locked it down so that user scripts cannot exec extra processes, you could try starting rsync in listening mode that way. If your end is reachable (you are running SSH accessible to the outside world), you could try this in reverse: have a script run rsync on the server, but instead of listening for incoming connections have it contact your local service and sync that way. This still relies on rsync being installed on the host, which is not a given, or on your being able to upload a working copy, but it does not have the security implications of running a publicly addressable rsync daemon and talking to it over an unencrypted channel.

Messing around as described above may be against the host's policies, though, even if it works at all, and could get you kicked off. You are better off asking whether a full shell can be enabled for that account and, if they will not do that, either abandoning rsync for that host or abandoning that host and moving elsewhere.

Rsync difference between --checksum and --ignore-times options

Normally, rsync skips files when the files have identical sizes and times on the source and destination sides. This is a heuristic which is usually a good idea, as it prevents rsync from having to examine the contents of files that are very likely identical on the source and destination sides.

--ignore-times tells rsync to turn off the file-times-and-sizes heuristic, and thus unconditionally transfer ALL files from source to destination. rsync will then proceed to read every file on the source side, since it will need to either use its delta-transfer algorithm, or simply send every file in its entirety, depending on whether the --whole-file option was specified.

--checksum also modifies the file-times-and-sizes heuristic, but here it ignores times and examines only sizes. Files on the source and destination sides that differ in size are transferred, since they are obviously different. Files with the same size are checksummed (with MD5 in rsync version 3.0.0+, or with MD4 in earlier versions), and those found to have differing sums are also transferred.
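The three behaviours described so far can be modelled in a few lines. This is only a sketch of the selection rule as explained above, not rsync's actual implementation: the `FileInfo` class, the `needs_transfer` function, and the mode names are invented for illustration, and MD5 is used because the text names it for rsync 3.0.0+.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for one file as seen on one side of the transfer."""
    size: int
    mtime: int
    content: bytes


def needs_transfer(src: FileInfo, dst: FileInfo, mode: str = "default") -> bool:
    """Return True if the file would be selected for transfer under `mode`.

    `mode` is "default", "ignore-times", or "checksum" (names chosen for
    this sketch to mirror the options discussed above).
    """
    if mode == "ignore-times":
        # Heuristic switched off: every file is selected.
        return True
    if src.size != dst.size:
        # Differing sizes mean differing contents in both remaining modes.
        return True
    if mode == "checksum":
        # Same size: times are ignored and whole-file checksums decide.
        return hashlib.md5(src.content).digest() != hashlib.md5(dst.content).digest()
    if mode == "default":
        # Same size: the file is skipped only if the times also match.
        return src.mtime != dst.mtime
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Same size and mtime, different bytes: only the default rule misses it.
    good = FileInfo(size=5, mtime=1000, content=b"hello")
    damaged = FileInfo(size=5, mtime=1000, content=b"hellp")
    for mode in ("default", "checksum", "ignore-times"):
        print(mode, needs_transfer(good, damaged, mode))
```

In this model, a file whose bytes changed while its size and modification time stayed the same is skipped by the default rule and selected by the other two. The difference is that the "checksum" mode reads and hashes both copies to decide, while "ignore-times" decides nothing and selects everything.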

In cases where the source and destination sides are mostly the same, --checksum will result in most files being checksummed on both sides. This could take a long time, but the upshot is that the barest minimum of data will actually be transferred over the wire, especially if the delta-transfer algorithm is used. Of course, this is only a win if you have a very slow network and/or a very fast CPU.

--ignore-times, on the other hand, will send more data over the network, and it will cause all source files to be read, but at least it will not impose the additional burden of computing many cryptographically-strong hashsums on the source and destination CPUs. I would expect this option to perform better than --checksum when your network is fast and/or your CPU is relatively slow.

I think I would only ever use --checksum or --ignore-times if I were transferring files to a destination where it was suspected that the contents of some files were corrupted, but whose modification times were not changed. I can't really think of any other good reason to use either option, although there are probably other use-cases.
