The best way to transfer a single large file over a high-speed, high-latency WAN link

bandwidthfile-transfernetworkingtcp

This looks related to this one, but it's somewhat different.

There is this WAN link between two company sites, and we need to transfer a single very large file (Oracle dump, ~160 GB).

We've got full 100 Mbps bandwidth (tested), but looks like a single TCP connection just can't max it out due to how TCP works (ACKs, etc.). We tested the link with iperf, and results change dramatically when increasing the TCP Window Size: with base settings we get ~5 Mbps throughput, with a bigger WS we can get up to ~45 Mbps, but not any more than that. The network latency is around 10 ms.

Out of curiosity, we ran iperf using more than a single connections, and we found that, when running four of them, they would indeed achieve a speed of ~25 Mbps each, filling up all the available bandwidth; so the key looks to be in running multiple simultaneous transfers.

With FTP, things get worse: even with optimized TCP settings (high Window Size, max MTU, etc.) we can't get more than 20 Mbps on a single transfer. We tried FTPing some big files at the same time, and indeed things got a lot better than when transferring a single one; but then the culprit became disk I/O, because reading and writing four big files from the same disk bottlenecks very soon; also, we don't seem to be able to split that single large file into smaller ones and then merge it back, at least not in acceptable times (obviously we can't spend splicing/merging back the file a time comparable to that of transferring it).

The ideal solution here would be a multithreaded tool that could transfer various chunks of the file at the same time; sort of like peer-to-peer programs like eMule or BitTorrent already do, but from a single source to a single destination. Ideally, the tool would allow us to choose how many parallel connections to use, and of course optimize disk I/O to not jump (too) madly between various sections of the file.

Does anyone know of such a tool?

Or, can anyone suggest a better solution and/or something we already didn't try?

P.S. We already thought of backing that up to tape/disk and physically sending it to destination; that would be our extreme measure if WAN just doesn't cut it, but, as A.S. Tanenbaum said, "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."

Best Answer

Searching for "high latency file transfer" brings up a lot of interesting hits. Clearly, this is a problem that both the CompSci community and the commercial community has put thougth into.

A few commercial offerings that appear to fit the bill:

  • FileCatalyst has products that can stream data over high-latency networks either using UDP or multiple TCP streams. They've got a lot of other features, too (on-the-fly compression, delta transfers, etc).

  • The fasp file transfer "technology" from Aspera appears to fit the bill for what you're looking for, as well.

In the open-source world, the uftp project looks promising. You don't particularly need its multicast capabilities, but the basic idea of blasting out a file to receivers, receiving NAKs for missed blocks at the end of the transfer, and then blasting out the NAK'd blocks (lather, rinse, repeat) sounds like it would do what you need, since there's no ACK'ing (or NAK'ing) from the receiver until after the file transfer has completed once. Assuming the network is just latent, and not lossy, this might do what you need, too.