DFS keeps constantly replicating almost all files

dfs-r windows-server-2003-r2 windows-server-2008-r2

We have always had problems with DFS, but recently it has gotten worse with no apparent cause, and it is becoming harmful. We have one master server with DFS connections to four other servers. The four servers don't modify any files, so all replication always propagates from the master to the other four. The replicated directory has about 900,000 files. In recent weeks, every time we check, the DFS backlogs contain hundreds of thousands of files. For instance, at the moment the master server is replicating about 700,000 files to three of the four servers, while the fourth one is fine. Sometimes only one is off, sometimes two, and this time three. Also, it is never the same set of servers. It is inconceivable that something periodically touches all 900,000 files. The biggest change that happens is a scheduled update of several thousand files every six hours.

Does anybody have the same problem? Is it a known issue?

Update: (This is also an answer to some of the questions raised by Jeff Miles.) The problem happened again a few hours ago. I set up some probes in the morning and monitored the servers during the day. At a seemingly random time, three backlogs ballooned to 3 million changes (more than the total number of files) within a minute. There was nothing interesting in the DFS event log, not even a "started initial replication" entry. There were only a couple of "DFS connection lost or unresponsive" errors, but they happened about 10 minutes after the fact, most likely because something choked on the huge backlogs. More importantly, the fourth server is fine. This indicates that the 3 million changes are most likely bogus. Also, I can't imagine anything changing that many files within such a short interval. Regarding the technical setup: it is a combination of Win2003R2 and Win2008R2. Could that be the problem?

Best Answer

First, verify your topology. Carefully review the replication connections under the "Connections" tab in your replication set properties:

  • The hub should have one outbound connection from itself to each of the remotes
  • Each of the remotes should have only one outbound connection, from itself back to the hub

I have seen full mesh topologies accidentally added that result in problems like you are seeing.
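If you can write the connection list down as (sender, receiver) pairs, the check above can be done mechanically. The following is only a minimal sketch: the function name `check_hub_and_spoke` and the server names are made up for illustration, and it assumes you have already transcribed the pairs by hand from the "Connections" tab. It does not query DFS-R itself.

```python
from collections import Counter


def check_hub_and_spoke(hub, spokes, connections):
    """Compare (sender, receiver) pairs against a strict hub-and-spoke layout.

    Returns a list of human-readable problems; an empty list means the
    connections match the expected topology exactly.
    """
    # Expected: one connection hub -> spoke and one spoke -> hub per spoke.
    expected = {(hub, s) for s in spokes} | {(s, hub) for s in spokes}
    counts = Counter(connections)

    problems = []
    for (sender, receiver), n in counts.items():
        if (sender, receiver) not in expected:
            # For example, a spoke-to-spoke link left over from a full mesh.
            problems.append(f"unexpected connection {sender} -> {receiver}")
        elif n > 1:
            problems.append(f"duplicate connection {sender} -> {receiver}")
    for sender, receiver in sorted(expected - set(counts)):
        problems.append(f"missing connection {sender} -> {receiver}")
    return problems


# Hypothetical server names; replace with the pairs from your own replication set.
connections = [
    ("MASTER", "S1"), ("S1", "MASTER"),
    ("MASTER", "S2"), ("S2", "MASTER"),
    ("S1", "S2"),  # stray spoke-to-spoke connection
]
for problem in check_hub_and_spoke("MASTER", ["S1", "S2"], connections):
    print(problem)
```

Any "unexpected connection" between two remotes is the kind of leftover mesh link worth removing first.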

Other possible culprits:

  • Antivirus scanning or file indexing on one or more of the servers or one of their clients. (Opening a file updates its access time, which must then be replicated to all peers.)
  • One or more very large files jamming up replication. This should show in your DFS-R logs.

Finally, do you need DFS-R, or could a regular robocopy be used to keep the folders in sync?
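Since your changes only ever flow from the master to the other four servers, a scheduled one-way mirror might be enough. Here is a rough sketch of what that could look like as a small Python wrapper around robocopy. The paths in the comment are placeholders, the retry and wait values are arbitrary, and it assumes robocopy is available on the machine that runs it. Note that `/MIR` deletes files at the destination that no longer exist at the source, so try it on a test folder first.

```python
import subprocess


def build_robocopy_command(source, destination, log_file=None):
    """Build a one-way mirror command line for robocopy as an argument list."""
    cmd = [
        "robocopy", source, destination,
        "/MIR",   # mirror the tree, including deletions at the destination
        "/R:2",   # retry a failed copy twice (arbitrary choice)
        "/W:5",   # wait 5 seconds between retries (arbitrary choice)
        "/NP",    # no per-file progress percentage in the output
    ]
    if log_file:
        cmd.append(f"/LOG:{log_file}")
    return cmd


def mirror(source, destination, log_file=None):
    """Run the mirror and report whether robocopy finished without failures."""
    result = subprocess.run(build_robocopy_command(source, destination, log_file))
    # robocopy exit codes below 8 mean success (possibly with files copied);
    # 8 and above mean at least one failure.
    return result.returncode < 8


# Hypothetical usage, once per remote server:
#   mirror(r"D:\Data", r"\\SERVER1\Data", log_file=r"C:\Logs\server1.log")
```

You would run this from the master on a schedule, for example right after your six-hourly update.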