DFS Replication Error 9098 (tombstoned content)


3 servers, DFS replication was working for 2 years. Recently 1 of the member servers started reporting this:

The DFS Replication service stopped replication on replicated folder projects at local path Z:\projects due to Error ID: 9098 (A tombstoned content set deletion has been scheduled). Event ID: 4004

No matter what I do, this server continues to report the same tombstoned error. Nothing is being replicated to/from this problem server.

I even created a new share on each of the 3 member servers, then created a new namespace and enabled DFS replication. 2 servers replicate without issue, but the 1 problem server still reports the tombstoned error.

I tried removing/re-installing DFS replication role, still happens.

I'm at a total loss here, any ideas? Pings from problem server to others are fine. "Verify topology" checks out fine in DFS Manager.

Best Answer

Try the following:

  1. Look in Event Viewer and identify all the replication groups/folders that are giving the tombstone error. Once you have them identified, go into the DFS Management GUI and completely delete the replication group associated with each of those folders. You do not need to delete the DFS Namespace for the folder, just the replication functionality of that namespace folder. If you have other replication groups in your DFS-R setup that do not get the 9098 errors, you do not have to do this for those folders.

  2. Stop DFSR services (you may need to kill the service using the taskkill command if it hangs when it tries to stop).

  3. Give yourself permissions to the hidden System Volume Information folder. If your account is in the Domain Admins group, you can simply add that security group. This folder exists on every server that is a member of the replication group. In my situation, 2 of the 3 servers didn't show this folder as existing even after I enabled showing hidden folders. If this happens to you, the server is lying to you that it's not there. It is there. Don't listen to it. My suggestion is to download and use the 7-zip file manager. It will see the folder and will help you set the permissions on it, as well as delete files whose paths are longer than 256 characters (which is an issue if you do the next step from the command line). Note: after you set the permissions, it might tell you that you still don't have access to that folder. Just close 7-zip and open it back up. It should then let you into that folder as well as its subfolders.

  4. Once you have access to that folder, go ahead and delete the DFSR folder that resides underneath it. You will want to do this on every server that has the DFSR role installed and is a member of any replication group. You can use the command-line command "rmdir", but it fails to delete files/folders whose paths are longer than 256 characters. This is why the 7-zip file manager is a better option for deleting the DFSR folder under System Volume Information. However, there are instances where 7-zip is unable to delete a file or folder. If you run into that scenario, use the rmdir command in an elevated command prompt. Essentially, a combination of these two will eventually clear out everything you need to clear out.

  5. Turn DFSR services back on. This will begin the process of recreating the DFSR hash and virtual tree that you had just deleted.

  6. Recreate the replication group that you want.

  7. On the replication groups that you did not delete, you may get the warning: "The DFS Replication service initialized the replicated folder at local path and is waiting to perform initial replication. The replicated folder will remain in this state until it has received replicated data, directly or indirectly, from the designated primary member." If you do, run the command that sets one of the DFSR servers as the primary member for that replication group. Once that is set - this is important - go into the DFS Management GUI, click on the replication group with the associated warning, select the Connections tab, then right-click the sending member that you just made primary and choose "Replicate now..." This kicks off the initial replication; you only have to do it this once, and it will replicate on its own from then on. You will need to choose the "Replicate now..." option for each receiving member that the sending/primary member is connected to in that replication group.

  8. Wait about 5-10 minutes, then run the dfsrdiag backlog command on each replication group and see if a backlog for replication/sync gets created. Run this command every 5 to 10 minutes to see if the backlog file count value decreases. If it does, it's syncing/replicating.
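
If you would rather script steps 7 and 8 than retype the commands, here is a minimal Python sketch, not a tested tool. It only builds the dfsradmin and dfsrdiag command lines and reads the backlog count out of the dfsrdiag output. The function names are my own, and the output wording it looks for ("Backlog File Count: N" and "No Backlog") is an assumption you should check against what your servers actually print.

```python
import re
import subprocess

# ASSUMPTION: dfsrdiag prints "... Backlog File Count: N" when files are queued
# and a line containing "No Backlog" when the two members are in sync.
_COUNT_RE = re.compile(r"Backlog File Count:\s*(\d+)", re.IGNORECASE)
_IN_SYNC_RE = re.compile(r"No Backlog", re.IGNORECASE)


def set_primary_command(rg_name, rf_name, member):
    """Step 7: command line that marks `member` as primary for one replicated folder."""
    return ["dfsradmin", "membership", "set",
            f"/rgname:{rg_name}", f"/rfname:{rf_name}",
            f"/memname:{member}", "/isprimary:true"]


def backlog_command(rg_name, rf_name, sending, receiving):
    """Step 8: command line that reports the backlog from `sending` to `receiving`."""
    return ["dfsrdiag", "backlog",
            f"/rgname:{rg_name}", f"/rfname:{rf_name}",
            f"/sendingmember:{sending}", f"/receivingmember:{receiving}"]


def parse_backlog_count(output):
    """Return the backlog file count, 0 if in sync, or None if unrecognized."""
    match = _COUNT_RE.search(output)
    if match:
        return int(match.group(1))
    if _IN_SYNC_RE.search(output):
        return 0
    return None


def is_draining(samples):
    """True if the last usable sample is lower than the first usable one."""
    counts = [c for c in samples if c is not None]
    return len(counts) >= 2 and counts[-1] < counts[0]


def sample_backlog(rg_name, rf_name, sending, receiving):
    """Run dfsrdiag once (elevated prompt, on a DFSR member) and parse the result."""
    result = subprocess.run(backlog_command(rg_name, rf_name, sending, receiving),
                            capture_output=True, text=True, check=False)
    return parse_backlog_count(result.stdout)
```

To use it, call sample_backlog every 5 to 10 minutes with your own replication group, replicated folder and server names, collect the results in a list, and pass the list to is_draining. A True result means the count is going down, which is the sign from step 8 that replication is working.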

P.S. If you are using DFS-R only for resiliency, it is not the best way to achieve it. Look at the highly available File Server role in a Failover Cluster instead.
