Active Directory member servers cannot locate a domain controller

active-directory domain-controller

Several hours ago, a handful of our member servers became unable to authenticate against the two domain controllers they should be using. The member servers and DCs are located in the same datacenter, and are in their own "site" in AD. Running DCDiag shows no problems, and we've confirmed that the servers and DCs have network connectivity with each other. Running nslookup on the member servers shows the proper DC listed as the name server in each case.

LDAP authentication seems to be working, but Kerberos authentication has stopped working. Basically, all of the key internal services have stopped.

Here are specifics on some of the problems we are having with member servers:

Exchange – Topology Service cannot find any domain controllers.
Therefore, the Exchange Information Store cannot start.

SharePoint – Authentication is failing at the IIS level and between
IIS and SQL (this farm has been up for multiple years).

Additional troubleshooting:

NLTEST /DCLIST:domainname – No DC can be found to get a DC List

NLTEST /Server:Servername – Completes successfully against both DCs.

NLTEST /DSGetDC:Domain – Commands complete successfully.

NLTEST /dsgetsite – Completes successfully.

GPUpdate – User cannot be found. No domain exists

Output of nslookup -type=SRV _kerberos._tcp.dc._msdcs.subdomain.mydomain.com on the Exchange server:

Server:  colo-dc-001.subdomain.mydomain.com
Address:  10.11.2.20

_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = branchf-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = colo-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = hq-dc-003.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = colo-dc-002.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = hq-dc-004.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = branchc-dc-002.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = branchm-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = branchs-dc-001.subdomain.mydomain.com
branchf-dc-001.subdomain.mydomain.com   internet address = 10.10.2.22
colo-dc-001.subdomain.mydomain.com  internet address = 10.11.2.20
hq-dc-003.subdomain.mydomain.com    internet address = 10.1.2.20
colo-dc-002.subdomain.mydomain.com  internet address = 10.11.2.21
hq-dc-004.subdomain.mydomain.com    internet address = 10.1.2.21
branchc-dc-002.subdomain.mydomain.com   internet address = 10.5.2.21
branchm-dc-001.subdomain.mydomain.com   internet address = 10.6.2.21
branchs-dc-001.subdomain.mydomain.com   internet address = 10.7.2.22

We can RDP to any of the servers that are hosting the above services, but the services will not work.

System logs on the member servers include some error messages about not being able to find a DC.

So basically, the network seems to be up, and the DCs seem to be up, but member servers right there on the same network segment can't find them. Where should we look for the problem?

Best Answer

I'd start looking at DNS. This really smells like DNS to me.

Does it look like things are missing from the _msdcs.domain.com forward-lookup zone?

If you run nslookup -type=SRV _kerberos._tcp.dc._msdcs.domain.com, what output do you get back?
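If the output is long, a quick script can confirm that every SRV target also came back with an address. This is only a rough sketch in Python: it assumes the text layout that Windows nslookup prints ("svr hostname = ..." and "... internet address = ..."), and the function names and the shortened sample are mine, not part of any tool.

```python
import re

def parse_srv_output(text):
    """Split nslookup -type=SRV output into SRV targets and A records."""
    # Hostnames the SRV records point at, in the order they were returned.
    targets = re.findall(r"svr hostname\s*=\s*(\S+)", text)
    # "name  internet address = x.x.x.x" lines from the additional section.
    addresses = dict(
        re.findall(r"^(\S+)\s+internet address\s*=\s*(\S+)", text, re.MULTILINE)
    )
    return targets, addresses

def missing_addresses(text):
    """Return SRV targets that have no matching address line."""
    targets, addresses = parse_srv_output(text)
    return [t for t in targets if t not in addresses]

# Shortened sample in the same layout as real nslookup output (illustrative).
SAMPLE = """\
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = colo-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
      priority       = 0
      weight         = 100
      port           = 88
      svr hostname   = colo-dc-002.subdomain.mydomain.com
colo-dc-001.subdomain.mydomain.com  internet address = 10.11.2.20
colo-dc-002.subdomain.mydomain.com  internet address = 10.11.2.21
"""

if __name__ == "__main__":
    print(missing_addresses(SAMPLE))
```

An empty list only means the DNS server returned an address for each target. It says nothing about whether those addresses are correct or reachable on port 88.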

Sniff the traffic on either the DCs or the member servers while you're running your failing diagnostic commands, and post the output here if the problem isn't glaringly obvious. The NLTEST /DCLIST:domain.com command, for example, should cause the client to emit some DNS queries looking for an LDAP server in its site, followed by a couple of RPC binds.
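To know what to look for in the capture, it helps to write out the SRV names a client would normally ask for. The sketch below (Python, purely illustrative) builds the site-specific and domain-wide LDAP locator names. The function name is made up, and "Colo" is a placeholder site name; substitute whatever NLTEST /dsgetsite reports on the affected server.

```python
def locator_srv_names(domain, site=None):
    """Build the LDAP SRV names a DC lookup is expected to query for."""
    names = []
    if site:
        # Site-specific record: DCs registered for the client's own AD site.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    # Domain-wide record: any DC in the domain, regardless of site.
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return names

if __name__ == "__main__":
    # "Colo" is a hypothetical site name used only for this example.
    for name in locator_srv_names("subdomain.mydomain.com", site="Colo"):
        print(f"nslookup -type=SRV {name}")
```

If the site-specific name returns nothing, or never shows up in the capture at all, that narrows things down to the site's SRV registrations or to the client's idea of which site it is in.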
