Azure – Investigating Sysvol replication issues when promoting domain controllers with DSC in Azure

active-directory, azure, dfs-r, domain-name-system, dsc

We're using PowerShell DSC to automate the deployment of a number of small, self-contained environments. In each environment we deploy two domain controllers and use DSC to set up the domain, etc. This all works fine, except that at some point after deployment the SYSVOL replication between the two DCs stops working (or it never started working). We see this error in the log:

The DFS Replication service initialized SYSVOL at local path
F:\SYSVOL\domain and is waiting to perform initial replication. The
replicated folder will remain in the initial synchronization state
until it has replicated with its partner .
If the server was in the process of being promoted to a domain
controller, the domain controller will not advertise and function as a
domain controller until this issue is resolved. This can occur if the
specified partner is also in the initial synchronization state, or if
sharing violations are encountered on this server or the sync partner.
If this event occurred during the migration of SYSVOL from File
Replication service (FRS) to DFS Replication, changes will not
replicate out until this issue is resolved. This can cause the SYSVOL
folder on this server to become out of sync with other domain
controllers.

Now, I know how to fix this using ADSIEdit; that's not the issue. We're automating the deployment of these environments because we need to deploy lots of them and configure them identically, so I don't really want to have to go into each environment after deployment to fix this. We see this issue in every environment we deploy this way, so obviously something is amiss in how it's getting configured. What I am really asking is whether anyone has any ideas about what could cause this, or where to start looking for the root cause.

The AD deployment is pretty straightforward: we configure DC1 first, add some DNS entries, some group policy items, and some users, groups and OUs, and then add the second DC. The second DC does get all these objects, so the initial copy of the domain does work, but after that nothing in SYSVOL gets replicated.

Edit

We also see a single instance of the error below, ID 1202, at deployment time, which is odd given that the DC promotion succeeds and the new DC is able to get the initial copy of the domain:

The DFS Replication service failed to contact domain controller to
access configuration information. Replication is stopped. The service
will try again during the next configuration polling cycle, which will
occur in 60 minutes. This event can be caused by TCP/IP connectivity,
firewall, Active Directory Domain Services, or DNS issues.
Additional Information: Error: 1355 (The specified domain either does
not exist or could not be contacted.)

Best Answer

I think this is a DNS issue. You should not use 127.0.0.1 as the primary DNS server on these machines; instead, use the machine's real IP address and set the IP of the replica DC as the secondary DNS server. This seems to be the configuration that causes the fewest problems. The question has been debated for years with various opinions, and even Microsoft gives no clear answer, see this: link

Question

What is Microsoft’s best practice for where and how many DNS servers exist? What about for configuring DNS client settings on DC’s and members?

Answer

It depends on who you ask. We in MS have been arguing this amongst ourselves for 11 years now.
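
Since the environments here are built with DSC, the DNS client ordering suggested in this answer could be applied in the same configuration rather than by hand. The following is a minimal sketch, not the asker's actual configuration. It assumes the NetworkingDsc module (its `DnsServerAddress` resource) is available on the node, and the configuration name, interface alias and parameter names are hypothetical placeholders.

```powershell
# Sketch only: assumes the NetworkingDsc module is installed on the node.
# The configuration name, interface alias and parameters are hypothetical.
Configuration DcDnsClientSettings
{
    param
    (
        # Static IP address of the DC this configuration is applied to.
        [Parameter(Mandatory)]
        [string] $OwnAddress,

        # IP address of the other (replica) domain controller.
        [Parameter(Mandatory)]
        [string] $PartnerAddress,

        # Name of the network adapter; 'Ethernet' is an assumption.
        [string] $InterfaceAlias = 'Ethernet'
    )

    Import-DscResource -ModuleName NetworkingDsc

    Node 'localhost'
    {
        DnsServerAddress DcDns
        {
            InterfaceAlias = $InterfaceAlias
            AddressFamily  = 'IPv4'
            # Real IP first (instead of 127.0.0.1), replica DC second.
            Address        = $OwnAddress, $PartnerAddress
        }
    }
}
```

Compiling it with something like `DcDnsClientSettings -OwnAddress '10.0.0.4' -PartnerAddress '10.0.0.5'` (example addresses only) produces a MOF that can be applied with `Start-DscConfiguration`. Afterwards, `Get-DnsClientServerAddress` on each DC shows whether the order took effect.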