Ways to sync many (small) files over a high-latency network connection

cwrsync, rsync, svn, synchronization

We typically deploy our software applications to our clients using Subversion (svn update on the clients; one-way only). We're currently having problems with one client because of high latency: they are in China and our server is in Canada (download speeds for large files are good). Subversion simply times out with an error after a very long time.

Our application has lots of small files (.aspx, .config, etc.) and a few larger files (.dll, .jpg), for a total of about 100 to 200 MB.

I am currently considering doing the following:

  1. Do a local svn checkout on the server.
  2. Zip the result.
  3. FTP or rsync the large zip file to the foreign machine.
  4. Unzip the file into a temporary folder.
  5. Do a local rsync from that temporary folder to our usual installation folder.
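Steps 2, 4 and 5 of that plan can be sketched with Python's standard library; steps 1 and 3 (the svn checkout and the FTP/rsync transfer) are left to the existing tools. This is a minimal sketch under stated assumptions, not a replacement for rsync: the function names `pack_tree` and `unpack_and_sync` are hypothetical, files are compared byte-for-byte, and, like rsync without `--delete`, it never removes anything from the installation folder.

```python
import filecmp
import os
import shutil
import tempfile
import zipfile


def pack_tree(src_dir: str, zip_path: str) -> None:
    """Step 2 (hypothetical helper): zip the checked-out tree, keeping relative paths."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))


def unpack_and_sync(zip_path: str, install_dir: str) -> list[str]:
    """Steps 4-5 (hypothetical helper): unzip into a temporary folder, then copy
    only new or changed files into install_dir. Returns the copied relative paths."""
    copied = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        for root, _dirs, files in os.walk(tmp):
            for name in files:
                src = os.path.join(root, name)
                rel = os.path.relpath(src, tmp)
                dst = os.path.join(install_dir, rel)
                # Compare contents, not timestamps: zipfile does not restore
                # modification times, so every extracted file looks "new".
                if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
                    continue
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied.append(rel)
    return sorted(copied)
```

Run a second time with an unchanged zip, `unpack_and_sync` copies nothing, which is the behavior step 5 relies on to avoid touching files that did not change.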

Are there any better solutions?

  • Setting up a Subversion mirror closer to the destination? (I would only need it running a few hours a month, but one might be hard to find.)
  • Using another version control system? (Is Git any better over high-latency connections?)
  • Are there ways to package Subversion patches (including binary files) to be reapplied at the destination instead of sending all of the data?
  • Would using Dropbox (which uses Amazon S3) to transfer files into the temporary folder be any better?
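On the third bullet: setting Subversion's own tooling aside, one generic way to ship only what changed, binaries included, is to keep a manifest of content hashes for the last release the client received and zip only the files whose hash differs. The sketch below assumes the server stores that manifest between deployments (for example as JSON); `manifest` and `build_patch` are hypothetical names, and this is a file-level delta, coarser than rsync's block-level one.

```python
import hashlib
import os
import zipfile


def manifest(root: str) -> dict[str, str]:
    """Map each relative path (with '/' separators) under root to its SHA-256 hex digest."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Hash in chunks so large .dll/.jpg files are not read into memory at once.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            result[os.path.relpath(full, root).replace(os.sep, "/")] = h.hexdigest()
    return result


def build_patch(new_root: str, old_manifest: dict[str, str],
                zip_path: str) -> tuple[list[str], list[str]]:
    """Zip only files that are new or changed relative to old_manifest.
    Returns (changed, deleted); deletions must be applied separately at the destination."""
    new_manifest = manifest(new_root)
    changed = sorted(p for p, digest in new_manifest.items() if old_manifest.get(p) != digest)
    deleted = sorted(p for p in old_manifest if p not in new_manifest)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in changed:
            zf.write(os.path.join(new_root, *rel.split("/")), arcname=rel)
    return changed, deleted
```

After a successful deployment, the server would save `manifest(new_root)` as the baseline for the next patch.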

Best Answer

Don't knock rsync for the whole tree of small files until you've given it a shot. It doesn't do a round trip for every single file; it's pipelined, so it should be about as fast as anything else on the whole dataset (that is, as fast as TCP can deliver data in order over your high-latency link).

Check out "How Rsync Works" for an explanation of how it avoids round trips.
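To see why avoiding per-file round trips matters on this kind of link, here is a crude back-of-envelope model. It charges each file some number of full round trips (what a protocol that waits for a reply before moving to the next file would pay) plus the raw transfer time. The function name, the two setup round trips, and every number in the example are hypothetical placeholders, not measurements of this client's connection.

```python
def estimated_seconds(n_files: int, total_bytes: int, rtt_s: float,
                      bandwidth_bytes_per_s: float, round_trips_per_file: float,
                      setup_round_trips: int = 2) -> float:
    """Crude model: latency cost plus raw transfer time.

    round_trips_per_file=0 approximates a fully pipelined transfer, which only
    pays a few fixed round trips (setup_round_trips, an assumed value) up front.
    """
    latency = (setup_round_trips + n_files * round_trips_per_file) * rtt_s
    transfer = total_bytes / bandwidth_bytes_per_s
    return latency + transfer


# Hypothetical figures: 5,000 files, 150 MB, 300 ms round trip, 1 MB/s throughput.
per_file = estimated_seconds(5000, 150_000_000, 0.3, 1_000_000, round_trips_per_file=1)
pipelined = estimated_seconds(5000, 150_000_000, 0.3, 1_000_000, round_trips_per_file=0)
print(f"one round trip per file: {per_file:.0f} s, pipelined: {pipelined:.0f} s")
```

With those placeholder figures, one round trip per file costs about 1,650 s, while the pipelined case is about 151 s, almost all of it raw transfer time. The gap grows linearly with both the file count and the latency.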
