Distributed File System Replication (DFSR) between Server 2008 & Server 2008 R2 over Gateway to Gateway VPN causing extreme network latency

dfs-r latency vpn

The company I work for is in the process of setting up a branch office for one of our clients. Each location has a 1.5 Mbps (up/down) T1 and a Cisco RV042 router, and the two routers host a gateway-to-gateway VPN tunnel between the sites. We have an SBS 2008 server at the main office and a Server 2008 R2 Standard server at the branch office. DFSR is set up to replicate specific shares in both directions between the two offices. We are not using namespaces, just DFSR.

Our issue is that when the DFSR service is running on the branch server, network latency increases 100x to 200x. I measured latency with a continuous ping from my laptop to an external site. With DFSR off, average latency is about 11.5 ms; with DFSR on, it varies between about 1100 and 2500 ms. The DFSR schedule is set for no replication from 6 am to 6 pm, Monday through Friday, and full replication at all other times. The latency increase occurs even at times when the schedule says no replication should occur.

As a test, I switched the replication schedule from UTC to local time. I had assumed that the UTC option would look up the UTC offset from the local time source and apply it. (Thinking about it now, I have no idea why I assumed that.) I saw no immediate improvement, but I then read through several DFSR-related posts here before writing up this question, and now, several minutes later, latency has dropped. Ping is now reporting between 300 and 400 ms, and the Speakeasy.net Speedtest is giving "Okay" results.
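One way a UTC-based schedule can let replication run during business hours: the 6 am to 6 pm blocked window is interpreted in UTC rather than local time. The Python sketch below converts that window to local wall-clock time. It assumes, hypothetically, a fixed UTC-5 offset (US Eastern Standard Time), since the question does not say which time zone the offices are in, and `utc_window_in_local` is an illustrative helper, not part of any DFSR tool.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical: assume the offices sit at a fixed UTC-5 offset (US Eastern
# Standard Time). The question does not say which time zone is involved.
LOCAL_TZ = timezone(timedelta(hours=-5))


def utc_window_in_local(day, start, end, tz=LOCAL_TZ):
    """Convert a schedule window written in UTC into local wall-clock times."""
    start_utc = datetime.combine(day, start, tzinfo=timezone.utc)
    end_utc = datetime.combine(day, end, tzinfo=timezone.utc)
    return start_utc.astimezone(tz).time(), end_utc.astimezone(tz).time()


# The intended "no replication" window was 6 am to 6 pm on a weekday.
blocked_from, blocked_until = utc_window_in_local(date(2011, 1, 10), time(6, 0), time(18, 0))
print(f"Blocked locally from {blocked_from:%H:%M} to {blocked_until:%H:%M}")
```

Under that assumption the blocked window would run from 01:00 to 13:00 local time, so replication would be allowed from 1 pm onward on weekdays, in the middle of the business day.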

So I guess this has become a two-part question. Is this the kind of latency increase I should expect to see with DFSR? If not, what can I do to further tweak, tune or debug it?

Thanks for reading. If anything is unclear or you would like further information, please let me know.

Best Answer

You shouldn't see any latency increase during the hours when replication is blocked, since no replication takes place then. Chances are, the schedule was simply configured wrong (which you seem to have figured out with your UTC comment). Also, changes are not applied immediately: DFSR configuration is stored in Active Directory, so a change has to replicate to all members before it takes effect, which can take hours depending on your AD topology.


So I guess this has become a two-part question. Is this the kind of latency increase I should expect to see with DFSR?

During times when replication is allowed, absolutely. 1.5 Mbps is very little bandwidth, and your servers can easily saturate it if there is a substantial amount of data to replicate.
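To put 1.5 Mbps in perspective, the back-of-the-envelope Python sketch below estimates how long a backlog takes to cross the link and how much delay a full queue adds to a packet waiting behind it. The 1 GB backlog and the 256 KB queue are hypothetical figures for illustration, and the math ignores protocol overhead, VPN encapsulation, and DFSR's Remote Differential Compression, so treat the results as rough estimates only.

```python
# Back-of-the-envelope math for a saturated 1.5 Mbps link.
LINK_BPS = 1_500_000  # 1.5 Mbps, the T1 rate from the question


def transfer_seconds(num_bytes, link_bps=LINK_BPS):
    """Time to send num_bytes at full line rate (no overhead, no compression)."""
    return num_bytes * 8 / link_bps


def queueing_delay_ms(queued_bytes, link_bps=LINK_BPS):
    """Extra delay for a packet that arrives behind queued_bytes already waiting to be sent."""
    return queued_bytes * 8 / link_bps * 1000


# Hypothetical inputs: a 1 GB replication backlog and 256 KB sitting in a router queue.
print(f"1 GB backlog: {transfer_seconds(1_000_000_000) / 3600:.1f} hours")
print(f"256 KB queue: {queueing_delay_ms(256_000):.0f} ms added to a packet behind it")
```

A few hundred kilobytes queued ahead of a ping is enough to push its round trip above a second, the same order of magnitude as the delays described in the question.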

If not, what can I do to further tweak, tune or debug it?

Double-check your configuration, and use tools like dfsrdiag (for example, dfsrdiag backlog) to see whether there is a replication backlog or other issues.


Side note: a continuous ping is not a reliable way to troubleshoot this. You should be monitoring latency on the routers/switches at each site. 1.5 Mbps isn't very much nowadays, so depending on how many people are in your office, performance may be rather poor all day long. Take a baseline measurement on the routers over the course of a normal day, and then compare against it.
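The baseline comparison suggested above can be as simple as comparing medians. The Python sketch below does that for two lists of latency samples in milliseconds, however you collect them (ideally from the routers). The sample values and the 2x alert threshold are hypothetical; the values are only loosely modeled on the figures quoted in the question, and `compare_to_baseline` is an illustrative helper.

```python
from statistics import median


def compare_to_baseline(baseline_ms, current_ms, threshold=2.0):
    """Return (ratio of current to baseline median latency, whether ratio >= threshold)."""
    ratio = median(current_ms) / median(baseline_ms)
    return ratio, ratio >= threshold


# Hypothetical samples: a quiet-day baseline and a period with replication running.
baseline = [11.0, 12.0, 11.5, 12.0, 11.0]
current = [1100.0, 1800.0, 2500.0]

ratio, degraded = compare_to_baseline(baseline, current)
print(f"current median is {ratio:.0f}x the baseline; degraded={degraded}")
```

The median is used rather than the mean so that a handful of outlier samples does not dominate the comparison.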
