Ubuntu – Intermittent rsync over ssh errors via cron w/ Ubuntu 10.04: unexplained & protocol data stream

I have a number of rsync clients that routinely connect to an rsync server, and they're intermittently failing with one of two error messages.

Either:

rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]

Or:

rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [Receiver=3.0.7]

The current version of the rsync command I'm using is:

rsync --rsync-path="/usr/bin/rsync" --stats --compress --times --links \
    --log-file=/home/ubuntu/rsynclog.txt --exclude thatfile --recursive \
    xxx.xx.xxx.xx:/home/ubuntu/utility_scripts/ /home/ubuntu/utility_scripts &

I previously had --verbose and --progress but removed them after reading on another forum that someone had resolved some latency issues by removing those options. I've also tried running this command from a shell script, thinking the issue might be that my rsync client was attempting to reuse an expired ssh connection. That made no difference: it fails seemingly at random whether using rsh or ssh. It also fails periodically with or without --del or --delete, with or without --compress, and with or without --rsync-path.

I cannot get the command to fail from the command line, but when it runs from cron every minute, it fails 5–15 times an hour, depending on the directory being rsync'ed. The permissions and ownership all appear to be correct, and I'm not relying on any environment variables whose absence would cause the cron job to fail. All of the relevant software packages (bash, rsync, ssh, Linux) are up to date, all key ports are open, and the clients do not all fail simultaneously, suggesting this is not a server-side problem.

tl;dr: rsync sometimes fails when running as a cron task. I have ruled out most RTFM issues, yet the problem persists.

Update (9/20/10): I updated the EC2 AMI on both the client and the server and ran a 3-box test with 2 clients downloading from 1 server over 24 hours. When the test completed, the logs had zero errors, so I began replacing other instances with the updated AMI instances. After a weekend of running roughly 35-40 clients, the logs are once again filled with:

2010/09/20 16:27:01 [18581] rsync error: error in rsync protocol data stream (code 12) at io.c(601) [Receiver=3.0.7]
2010/09/20 16:30:01 [18627] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2010/09/20 16:36:01 [18739] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2010/09/20 16:40:02 [18810] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2010/09/20 16:50:01 [18972] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2010/09/20 17:00:01 [19139] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2010/09/20 17:10:01 [19328] rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]

Is it unreasonable to have 35 clients connect to an rsync server simultaneously? Is this possibly a load issue?
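One way to quantify the failures is to tally the error lines in the rsync log by exit code. The sketch below assumes the log format shown above (as written by --log-file). The function name count_rsync_errors is made up for illustration.

```shell
# Hypothetical helper: count "rsync error" lines per exit code
# in a log written by rsync's --log-file option.
count_rsync_errors() {
    log_file="$1"
    # Keep only error lines, extract the number from "(code N)",
    # then tally identical codes and print them as "code N: count".
    grep 'rsync error:' "$log_file" \
        | sed -n 's/.*(code \([0-9][0-9]*\)).*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ print "code " $2 ": " $1 }'
}
```

Called as count_rsync_errors /home/ubuntu/rsynclog.txt, it would report one code 12 error and six code 255 errors for the seven log lines above.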

Best Answer

Thanks to Izidor Jerebic who posted the solution to the rsync mailing list:

this might be a problem with maximal number of concurrent ssh connections or connection requests. Ssh daemon has two configuration settings where you can define what is the maximal number of clients which can connect concurrently. This number is by default not very high, so you are probably bumping against that limit.

MaxSessions

Specifies the maximum number of open sessions permitted per network connection. The default is 10.

MaxStartups

Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10.

After raising these values, everything has been running just fine. Not to hide from the fact that this was an RTFM issue, but neither MaxStartups nor MaxSessions is defined in the ssh man page. And while MaxStartups at least appears in the sshd_config file, MaxSessions seems to show up only in an OpenSSH changelog (http://www.openssh.org/txt/release-5.1).
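To see whether a server sets these limits explicitly, a small check along these lines may help. It is only a sketch: the function name show_sshd_limits is hypothetical, the default path /etc/ssh/sshd_config is an assumption about the server's layout, and it only looks at plain whitespace-separated "Keyword value" lines, ignoring Match blocks.

```shell
# Hypothetical helper: report explicit MaxStartups / MaxSessions settings
# found in an sshd_config-style file (default path is an assumption).
show_sshd_limits() {
    config_file="${1:-/etc/ssh/sshd_config}"
    for keyword in MaxStartups MaxSessions; do
        # sshd keywords are case-insensitive and the first occurrence wins.
        # Commented-out lines do not match because their first field
        # starts with "#".
        value=$(awk -v key="$keyword" \
            'tolower($1) == tolower(key) { print $2; exit }' "$config_file")
        if [ -n "$value" ]; then
            echo "$keyword: $value"
        else
            echo "$keyword: not set (built-in default applies)"
        fi
    done
}
```

If a keyword is reported as not set, adding a line such as MaxStartups 50 to sshd_config and reloading sshd is the kind of change described above. The value 50 is purely illustrative, since the values actually used are not stated here.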
