“The RPC Server is unavailable” when replicating domain controllers

Tags: active-directory, domain-controller

I have two domain controllers:

DC1: Win2k3 R2
EGDC1: Win2k8 R2

When I try to replicate between these two (in Active Directory Sites and Services, under NTDS Settings) by selecting Replicate Now, I get the error message “The RPC Server is unavailable.” It doesn't matter whether I try this while remoted into DC1 or EGDC1.

According to this TechNet article, this error means one of the machines is down. However, both domain controllers can ping each other just fine, so I don't think there is a DNS or connectivity issue. Both are on the same LAN and even on the same subnet, so VPN, Wi-Fi, firewall, or similar quirks shouldn't be a factor.

Additionally, I verified that the RPC service is running on both boxes.
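Ping uses ICMP, and a service showing as running only proves it is up locally; neither shows that the other DC can open the TCP connections replication needs. The sketch below probes a few TCP ports from one DC to the other. The port list is an assumption based on common AD defaults (53 DNS, 88 Kerberos, 135 RPC endpoint mapper, 389 LDAP, 445 SMB), and the helper names are hypothetical. It does not test the dynamic RPC port that the endpoint mapper hands out for the actual replication traffic.

```python
import socket

# Assumed common AD ports; adjust for your environment.
# Replication itself then uses a dynamic RPC port assigned by the
# endpoint mapper on 135, which this sketch does not probe.
AD_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable hosts,
        # and name-resolution failures (socket.gaierror).
        return False


def check_dc(host: str, ports=AD_PORTS, timeout: float = 2.0) -> dict:
    """Probe each port on host and return a {port: reachable} mapping."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Running `check_dc("DC1")` from EGDC1 (and the reverse from DC1) would show whether, for example, LDAP on 389 is reachable, which is relevant given that dcdiag below reports an LDAP search failure against DC1.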

What could the problem be and how would I fix it?

dcdiag results:

Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = EGDC1
   * Identified AD Forest.
   Ldap search capabality attribute search failed on server DC1, return value =
   81
   Got error while checking if the DC is using FRS or DFSR. Error:
   Win32 Error 81The VerifyReferences, FrsEvent and DfsrEvent tests might fail
   because of this error.
   Done gathering initial info.

Doing initial required tests

   Testing server: INF\EGDC1
      Starting test: Connectivity
         ......................... EGDC1 passed test Connectivity

Doing primary tests

   Testing server: INF\EGDC1
      Starting test: Advertising
         ......................... EGDC1 passed test Advertising
      Starting test: FrsEvent
         ......................... EGDC1 passed test FrsEvent
      Starting test: DFSREvent
         ......................... EGDC1 passed test DFSREvent
      Starting test: SysVolCheck
         ......................... EGDC1 passed test SysVolCheck
      Starting test: KccEvent
         ......................... EGDC1 passed test KccEvent
      Starting test: KnowsOfRoleHolders
         [DC1] DsBindWithSpnEx() failed with error 1722,
         The RPC server is unavailable..
         Warning: DC1 is the Schema Owner, but is not responding to DS RPC
         Bind.
         Warning: DC1 is the Schema Owner, but is not responding to LDAP Bind.
         Warning: DC1 is the Domain Owner, but is not responding to DS RPC
         Bind.
         Warning: DC1 is the Domain Owner, but is not responding to LDAP Bind.
         Warning: DC1 is the PDC Owner, but is not responding to DS RPC Bind.
         Warning: DC1 is the PDC Owner, but is not responding to LDAP Bind.
         Warning: DC1 is the Rid Owner, but is not responding to DS RPC Bind.
         Warning: DC1 is the Rid Owner, but is not responding to LDAP Bind.
         Warning: DC1 is the Infrastructure Update Owner, but is not responding
         to DS RPC Bind.
         Warning: DC1 is the Infrastructure Update Owner, but is not responding
         to LDAP Bind.
         ......................... EGDC1 failed test KnowsOfRoleHolders
      Starting test: MachineAccount
         ......................... EGDC1 passed test MachineAccount
      Starting test: NCSecDesc
         Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
            Replicating Directory Changes In Filtered Set
         access rights for the naming context:
         DC=ForestDnsZones,DC=eg,DC=local
         Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
            Replicating Directory Changes In Filtered Set
         access rights for the naming context:
         DC=DomainDnsZones,DC=eg,DC=local
         ......................... EGDC1 failed test NCSecDesc
      Starting test: NetLogons
         ......................... EGDC1 passed test NetLogons
      Starting test: ObjectsReplicated
         ......................... EGDC1 passed test ObjectsReplicated
      Starting test: Replications
         [Replications Check,EGDC1] A recent replication attempt failed:
            From DC1 to EGDC1
            Naming Context: DC=ForestDnsZones,DC=eg,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.

            The failure occurred at 2010-11-29 08:56:33.
            The last success occurred at 2010-10-05 01:10:06.
            1330 failures have occurred since the last success.
         [Replications Check,EGDC1] A recent replication attempt failed:
            From DC1 to EGDC1
            Naming Context: DC=DomainDnsZones,DC=eg,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.

            The failure occurred at 2010-11-29 08:56:33.
            The last success occurred at 2010-10-05 01:10:03.
            1330 failures have occurred since the last success.
         [Replications Check,EGDC1] A recent replication attempt failed:
            From DC1 to EGDC1
            Naming Context: CN=Schema,CN=Configuration,DC=eg,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2010-11-29 08:57:15.
            The last success occurred at 2010-10-05 00:48:18.
            1330 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,EGDC1] A recent replication attempt failed:
            From DC1 to EGDC1
            Naming Context: CN=Configuration,DC=eg,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2010-11-29 08:56:54.
            The last success occurred at 2010-10-05 00:48:18.
            1330 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,EGDC1] A recent replication attempt failed:
            From DC1 to EGDC1
            Naming Context: DC=eg,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2010-11-29 08:56:33.
            The last success occurred at 2010-10-05 01:09:58.
            1331 failures have occurred since the last success.
            The source remains down. Please check the machine.
         ......................... EGDC1 failed test Replications
      Starting test: RidManager
         ......................... EGDC1 failed test RidManager
      Starting test: Services
         ......................... EGDC1 passed test Services
      Starting test: SystemLog
         ......................... EGDC1 passed test SystemLog
      Starting test: VerifyReferences
         ......................... EGDC1 passed test VerifyReferences


   Running partition tests on : ForestDnsZones
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test
         CrossRefValidation

   Running partition tests on : DomainDnsZones
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test
         CrossRefValidation

   Running partition tests on : Schema
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation

   Running partition tests on : Configuration
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation

   Running partition tests on : eg
      Starting test: CheckSDRefDom
         ......................... eg passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... eg passed test CrossRefValidation

   Running enterprise tests on : eg.local
      Starting test: LocatorCheck
         ......................... eg.local passed test LocatorCheck
      Starting test: Intersite
         ......................... eg.local passed test Intersite
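Every partition in the Replications section above has been failing from DC1 since early October. A minimal sketch for pulling those records out of saved dcdiag output and computing how long each partition has been down follows. The regular expression is written against the output format shown here (an assumption; other dcdiag versions may word or wrap these lines differently), and `replication_gaps` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches one per-partition failure record from dcdiag's Replications test,
# based on the format in the output above. DOTALL lets the error text
# (which may be wrapped across lines) sit between the code and the dates.
FAILURE_RE = re.compile(
    r"From (?P<src>\S+) to (?P<dst>\S+)\s+"
    r"Naming Context: (?P<nc>\S+)\s+"
    r"The replication generated an error \((?P<code>\d+)\):.*?"
    r"The failure occurred at (?P<failed>[\d-]+ [\d:]+)\.\s+"
    r"The last success occurred at (?P<last_ok>[\d-]+ [\d:]+)\.",
    re.DOTALL,
)
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def replication_gaps(dcdiag_text: str):
    """Yield (source_dc, naming_context, error_code, outage) per failed partition."""
    for m in FAILURE_RE.finditer(dcdiag_text):
        outage = (datetime.strptime(m["failed"], TIME_FMT)
                  - datetime.strptime(m["last_ok"], TIME_FMT))
        yield m["src"], m["nc"], int(m["code"]), outage
```

Fed the output above, this reports an outage of roughly 55 days for every partition, which matches the last successful replication on 2010-10-05 noted in the answer.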

Best Answer

It looks like it last replicated successfully on 10-05; what changed then? My guess is that the SRV records in DNS for the two DCs don't match. AD replication needs more than the A record that ping uses, so a successful ping can make DNS look healthy when it isn't. Try pointing both servers at the same DNS server and restarting the Netlogon service on both (Netlogon re-registers the DC's SRV records when it starts), then try the replication again.
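To compare what each DNS server actually holds, it helps to know which names to look up. The sketch below lists a subset of the DC locator records that Netlogon registers (the full set is in `%SystemRoot%\System32\Config\Netlogon.dns` on each DC). The function name and the `is_pdc`/`is_gc` flags are hypothetical; the site name `INF` and domain `eg.local` come from the dcdiag output above, and the DSA GUID (the objectGUID of the DC's NTDS Settings object) must be supplied by you.

```python
from typing import List, Optional, Tuple


def dc_locator_records(domain: str, forest: str, site: str,
                       dsa_guid: Optional[str] = None,
                       is_pdc: bool = False,
                       is_gc: bool = False) -> List[Tuple[str, str]]:
    """Return (record_type, dns_name) pairs a DC is expected to register.

    This is a subset of the records Netlogon writes, not an exhaustive list.
    """
    records = [
        ("SRV", f"_ldap._tcp.{domain}"),
        ("SRV", f"_ldap._tcp.{site}._sites.{domain}"),
        ("SRV", f"_ldap._tcp.dc._msdcs.{domain}"),
        ("SRV", f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"),
        ("SRV", f"_kerberos._tcp.{domain}"),
        ("SRV", f"_kerberos._tcp.dc._msdcs.{domain}"),
    ]
    if is_pdc:
        # Registered only by the PDC emulator (DC1 in this question).
        records.append(("SRV", f"_ldap._tcp.pdc._msdcs.{domain}"))
    if is_gc:
        # Registered only by global catalog servers.
        records.append(("SRV", f"_gc._tcp.{forest}"))
        records.append(("SRV", f"_ldap._tcp.gc._msdcs.{forest}"))
    if dsa_guid:
        # CNAME that replication partners resolve to locate this DC.
        records.append(("CNAME", f"{dsa_guid}._msdcs.{forest}"))
    return records
```

For example, `dc_locator_records("eg.local", "eg.local", "INF", is_pdc=True)` yields names you could query with `nslookup -type=SRV <name> <dns-server>` against each DC's DNS server; a record present on one server but missing or stale on the other would support the mismatch theory.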