SSH – Possibility of WAN Optimization for SSH traffic

optimization rsync ssh wide-area-network

While I understand that SSH by itself is a low-bandwidth protocol, it sometimes chokes our bandwidth during peak hours in our office environment. Is it possible at all to reduce or optimize SSH (rsync) traffic over the WAN?

I realized that Riverbed cannot do it. What other designs (proxies) could I consider, ignoring security issues like MiTM, DPI, etc.? Can TrafficSqueezer, WANProxy, or OpenNOP be of any use here?

Also, please suggest any other ways to back up data (if there are any) besides rsync. Is it even possible to decrypt SSH with a proxy server before it reaches the Riverbed and then transport it over the WAN to the other end?

Sender (RSYNC) Server --> Proxy (Decrypt SSH) --> Sending Riverbed --> Receiving Riverbed --> Receiver Server

Current Topology:

100's of Users (rsync) --> Source Riverbed --> (passed through traffic/unoptimized)  -->  Destination Riverbed --> Remote Machine

Best Answer

I was initially going to be a naysayer to the idea of trying to "WAN optimize" rsync traffic, but the more I thought about it the more I decided that it was possible.

A TCP-based dictionary compressor (which I believe the Riverbed Steelhead appliances provide) would probably benefit an unencrypted rsync stream. Presumably the Riverbed devices can encrypt the "optimized" traffic, so running rsync unencrypted shouldn't compromise the integrity or confidentiality of traffic on the WAN. (Between the source server and the Riverbed device may be a different story.)

You don't have to run rsync over SSH. It will run perfectly fine over TCP or any other reliable stream transport.
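To make the difference concrete, here is a minimal sketch that only builds the two command lines and doesn't run anything. The host name `backup.example.com`, the paths, and the daemon module name `backups` are hypothetical placeholders, and the second form assumes an rsync daemon is already listening on the remote end (TCP port 873 by default).

```python
import shlex

def rsync_over_ssh(src, host, dest):
    """rsync tunnelled through SSH: the WAN only sees ciphertext."""
    return ["rsync", "-a", "-e", "ssh", src, f"{host}:{dest}"]

def rsync_over_daemon(src, host, module, port=873):
    """rsync speaking its own protocol over plain TCP to an rsync daemon."""
    return ["rsync", "-a", f"--port={port}", src, f"rsync://{host}/{module}/"]

# Hypothetical host, paths, and module name, for illustration only.
ssh_cmd = rsync_over_ssh("/srv/data/", "backup.example.com", "/srv/backup/")
tcp_cmd = rsync_over_daemon("/srv/data/", "backup.example.com", "backups")

print(shlex.join(ssh_cmd))
print(shlex.join(tcp_cmd))
```

The second form puts the rsync protocol on the wire in the clear, which is what a WAN optimizer would need in order to find any redundancy. It also means the path between the server and the optimizer is unprotected.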

It seems like a good WAN acceleration architecture would run somewhat counter to a good security architecture, since encrypted traffic is high-entropy and low-redundancy and not at all conducive to compression. I think these are concerns you'd have to balance. I haven't kept up with Riverbed in a number of years, but this actually seems like a place where man-in-the-middle decryption of encrypted protocols might make good sense (although that turns the WAN accelerator into a huge target for attacks).
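A quick way to see why encrypted traffic defeats compression is to compare how well a general-purpose compressor does on redundant data versus random bytes. This is only a sketch: `os.urandom` output stands in for ciphertext (an assumption, since good ciphertext is statistically close to random), and zlib stands in for whatever compressor an appliance actually uses.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly redundant payload, standing in for plaintext file data.
redundant = b"backup record: user=alice status=ok\n" * 4096

# Random bytes of the same length, standing in for SSH ciphertext.
ciphertext_like = os.urandom(len(redundant))

print(f"redundant payload ratio: {compression_ratio(redundant):.3f}")
print(f"random payload ratio:    {compression_ratio(ciphertext_like):.3f}")
```

The redundant payload shrinks to a tiny fraction of its size, while the random payload comes out no smaller than it went in.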

Edit:

I'm coming back to this answer a few hours later because, frankly, it's keeping me up at night.

I want to clarify some assumptions that I'm making. I am assuming:

  • You're working with WAN links that are significantly slower than a LAN: 100 Mbps or less.

  • You're performing backups across these WAN links that you'd like to speed up, in terms of wall-clock time.

  • The servers hosting the source and destination files have sufficient CPU and network connectivity to completely saturate the WAN link and the WAN really is the bottleneck.

  • You're using operating systems with TCP implementations that can reasonably scale the receive window to accommodate the bandwidth-delay product of your WAN link.
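For the last assumption, the bandwidth-delay product is just link bandwidth multiplied by round-trip time. Here is a rough worked example; the 100 Mbps figure comes from the first assumption above, while the 80 ms round-trip time is a made-up value for illustration.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_seconds / 8

def max_throughput_bits_per_s(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on a single TCP connection's throughput for a given window."""
    return window_bytes * 8 / rtt_seconds

LINK_BPS = 100e6   # 100 Mbps WAN link
RTT_S = 0.080      # 80 ms round trip (hypothetical)

bdp = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
# Without window scaling, the 16-bit TCP window field caps the window at 65535 bytes.
unscaled = max_throughput_bits_per_s(65535, RTT_S)

print(f"bandwidth-delay product: {bdp:,.0f} bytes")
print(f"max throughput with an unscaled window: {unscaled / 1e6:.2f} Mbps")
```

With these numbers the receive window has to reach roughly 1 MB to fill the link. A connection stuck at an unscaled 64 KB window would top out at a small fraction of 100 Mbps no matter what sits in the middle.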

If the servers can't saturate the network link then your bottleneck is somewhere else. Basically, I'm assuming that you've got a small pipe that your servers can saturate when running backups. If you're bottlenecking on CPU or I/O in the servers then no amount of network-related "magic" is going to help you.

Speaking rather bluntly, I feel a bit silly speaking positively about WAN acceleration appliances. I've been less than impressed with them in the past (mainly from an ROI and cost perspective) and wary of them after hearing numerous horror stories about application and operating system "strangeness" that would disappear when the WAN accelerators were disabled. I've been suspicious of them as a "technology" and generally have felt like they are a symptom of using poorly-engineered protocols, poor server-placement decisions, or poor network architecture.

I've spent the better part of two hours reading up on dictionary-based protocol compressors and playing around with rsync. I think that, depending on the amount of redundancy in the changes you're synchronizing with rsync, there's actually a potential for seeing some minor performance improvement using a dictionary-based WAN accelerator. It's going to depend a lot on what your data looks like.
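The general idea behind dictionary-based compression can be sketched with zlib's preset-dictionary support. To be clear, this is not how a Steelhead (or any particular appliance) works internally; it only illustrates the principle that data resembling something both ends have already seen can be sent as short references. The payloads are synthetic: lines of SHA-256 hex digests, with every twentieth line changed in the "new" transfer.

```python
import hashlib
import zlib

def make_lines(count, changed=()):
    """Build synthetic text; lines whose index is in `changed` get different content."""
    lines = []
    for i in range(count):
        seed = f"v2-{i}" if i in changed else f"v1-{i}"
        lines.append(hashlib.sha256(seed.encode()).hexdigest())
    return ("\n".join(lines) + "\n").encode()

def deflate(payload, dictionary=None):
    """Compress, optionally priming the compressor with a shared dictionary."""
    if dictionary is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(payload) + comp.flush()

def inflate(blob, dictionary=None):
    """Decompress; the receiver must hold the same dictionary as the sender."""
    if dictionary is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

previously_seen = make_lines(200)                              # data both ends already hold
new_payload = make_lines(200, changed=set(range(0, 200, 20)))  # mostly the same data

plain = deflate(new_payload)
primed = deflate(new_payload, previously_seen)

print(f"original: {len(new_payload)} bytes")
print(f"deflate without dictionary: {len(plain)} bytes")
print(f"deflate with shared dictionary: {len(primed)} bytes")
```

How much this helps depends entirely on how much of the new data overlaps with what has already crossed the link, which is the same caveat as above.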

I don't have any numbers using an actual WAN accelerator to back that up. Nor do I have any personal experience with WAN accelerators in production use, let alone "accelerating" the rsync protocol. I certainly wouldn't go out and buy something based on what I'm saying here, but if you already have something in place I'd consider running some unencrypted rsync traffic through it to see what it does.