Advice on Active Directory design for multihomed servers

active-directory domain-controller domain-name-system networking

I've been tasked by a customer to come up with a working Active Directory design for a scenario with the following requirements (simplified, they are actually a lot worse):

  • There is a subnet for client systems.
  • There is a subnet for server systems.
  • The two networks are not connected.
  • Each server should have two network cards, one on the servers' network, the other one on the clients' network.
  • Traffic between clients and servers should only flow on the clients' network.
  • Traffic between servers should only flow on the servers' network.
  • This should apply to domain controllers, too.

Needless to say, this doesn't go very well with how Active Directory uses DNS to locate domain controllers; any possible approach would lead to one of the following scenarios:

  • DCs register their "client-side" IP address in the domain DNS; clients will talk with them using that address, but so will servers and AD replication traffic.
  • DCs register their "server-side" IP address in the domain DNS; servers will talk with them using that address and replication traffic will flow on that network, but clients will be unable to reach them.
  • DCs will register both IP addresses in the domain DNS; it's anyone's guess what any system will do to reach them.

Of course, these requirements are completely crazy and they can't all be satisfied at the same time, short of crazy solutions like splitting the DNS service across the two networks and populating its SRV records by hand (argh) or having the servers locate DCs using DNS and the clients locate DCs using WINS (double-argh).

The solution I came up with is having two DCs on the "servers" network and two DCs on the "clients" one, defining two AD sites and crossing the boundary between the two networks only with DC replication traffic. This will still require some DNS mangling, because each server will still have two network cards (apart from the two server-side DCs and purely back-end servers), but it has at least some chance of working.

Any advice, other than fleeing as fast as possible?

Best Answer

Let me begin by saying that I concur with many of the others -- either convince the client otherwise or run.

However, given your listed requirements (there are many unlisted), I can think of (and partially tested) at least the groundwork for making this happen.

There are several specific aspects that need to be considered.

  1. Active Directory Domain Services Replication
  2. DC Locator Process of Clients/Member Servers
  3. Name resolution and traffic for non-AD DS services

One and two have a lot in common -- in general we are at the whim of Microsoft on this one and have to work within the bounds of Microsoft's AD DS processes.

Number three we have a little bit of room to work with. We can choose the labels used for accessing services (files, database instances, etc.).

Here is what I propose:

Build your Domain Controllers (DC)

  • Likely at least two.
  • Each DC will have two NIC's, one in each IP network/AD DS site -- calling them clt and srv for now.
  • Only configure one NIC in each DC right now in the srv network.

Configure AD Sites and Services properly

  • srv site and subnet
  • clt site and subnet
  • uncheck "Bridge all site links" from Sites -> Inter-site Transports -> Right-click "IP"
  • delete the DEFAULTIPSITELINK if it exists (or if you renamed it) so there are no site links configured. Note that this is the unknown for me -- KCC will likely dump errors into the Directory Service event log saying the two sites (srv and clt) are not connected at varying intervals. However, replication will still continue between the two DC's as they can contact each other using the IP's in the same site.
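To make the site/subnet idea concrete, here is a minimal Python sketch of how an address maps to a site through the subnets defined above. The address ranges are assumptions made up for illustration (the scenario does not give any), and the real lookup is performed by AD DS itself, not by code like this.

```python
import ipaddress
from typing import Optional

# Hypothetical subnets: the real ranges are not given in the scenario.
SITE_SUBNETS = {
    "srv": ipaddress.ip_network("10.0.1.0/24"),
    "clt": ipaddress.ip_network("10.0.2.0/24"),
}

def site_for_ip(ip: str) -> Optional[str]:
    """Return the site whose subnet contains ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for site, subnet in SITE_SUBNETS.items():
        if addr in subnet:
            return site
    return None

print(site_for_ip("10.0.1.20"))  # an address on the (assumed) srv subnet
print(site_for_ip("10.0.2.50"))  # an address on the (assumed) clt subnet
```

A dual-homed DC has one address in each subnet, which is why the rest of this design has to control which of the two addresses ends up in DNS.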

Configure additional zone in AD DS Integrated DNS

  • If your AD DS domain is acme.local, create a second Primary AD Integrated Zone with dynamic updates enabled called clt.acme.local.

Configure the second NIC's on your DC's

  • These NIC's will be the NIC's in the clt network/site.
  • Set their IP's
  • Here is the magic part -- Adapter Properties -> IPv4 Properties -> Advanced -> DNS Tab -> Set the DNS suffix for this connection to clt.acme.local -> check Register this connection... -> Check Use this connection's DNS suffix... -> OK all the way through.
  • ipconfig /registerdns
  • This will register the clt NIC IP in the clt.acme.local zone -- providing a method for us to control which IP/network is used later.

Configure member server NIC's

  • Member server NIC's in clt site must have their DNS suffix and checkboxes set accordingly as well like above.
  • These settings work with both static and DHCP addressing.

Configure DNS [stub] resolver behavior in the sites

  • DC's -> Configure DC srv NIC to use itself and other DC srv NIC IP. Leave DC clt NIC DNS empty (static IP is needed though). (DC DNS server will still listen on all IP's by default).
  • Member servers -> Configure member server srv NIC to use the DC srv site IP's. Leave member server clt NIC DNS empty (static IP can be used).
  • Clients/Workstations -> Configure DNS (either through DHCP or static) to use the DC's clt NIC IP's.

Configure mappings/resources appropriately

  • When servers talk to each other be sure to use .acme.local -> will resolve to srv network IP.
  • When clients talk to servers be sure to use .clt.acme.local -> will resolve to clt network IP.
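The naming rule in the two bullets above can be sketched in a few lines of Python. The host name `files01` and the `caller` labels are hypothetical, chosen only for illustration; the zone names come from the acme.local example.

```python
AD_DOMAIN = "acme.local"        # AD DS domain from the example
CLT_ZONE = "clt." + AD_DOMAIN   # extra zone holding clt-side addresses

def fqdn_for(host: str, caller: str) -> str:
    """Return the name a caller should use to reach host.

    caller is "server" (resolves to the srv network IP)
    or "client" (resolves to the clt network IP).
    """
    if caller == "server":
        return f"{host}.{AD_DOMAIN}"
    if caller == "client":
        return f"{host}.{CLT_ZONE}"
    raise ValueError(f"unknown caller type: {caller!r}")

print(fqdn_for("files01", "server"))
print(fqdn_for("files01", "client"))
```

In practice this means drive mappings, connection strings and similar settings have to be maintained separately for servers and for clients.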

What am I talking about?

  • AD DS replication will still occur as DC's can resolve each other, and connect to each other. The acme.local and _msdcs.acme.local zones will only contain the DC srv NIC IP's, so AD DS replication will only happen on the srv network.
  • DC locator process for member servers and workstations will function, although delays are possible at various points in AD DS processes when the site is unknown: if multiple DC IP's are returned, each will be tried in turn until one responds. The effects on DFS-N have not been completely evaluated either, but it will still function.
  • Non AD DS services will function fine if you use the aforementioned .acme.local and .clt.acme.local labels as described.

I have not completely tested this as it is rather ludicrous. However, the point of this (wow, lengthy) answer is to begin evaluating whether or not it is possible -- not whether it should be done.

@Comments

@Massimo 1/2 Do not confuse multiple AD DS sites in the acme.local zone, and thus SRV records populated by DC's in those sites in the acme.local zone with needing SRV records in the clt.acme.local zone. The client's primary DNS suffix (and Windows domain to which they are joined) will still be acme.local. The client/workstations only have a single NIC, with primary DNS suffix likely derived from DHCP, set to acme.local.

The clt.acme.local zone does not need SRV records as it will not be used in the DC locator process. It is only used by clients/workstations to connect to member servers' non-AD DS services using the member server IP's in the clt network. AD DS related processes (DC locator) will not use the clt.acme.local zone, but rather the AD DS sites (and subnets) in the acme.local zone.

@Massimo 3

There will be SRV records for both clt and srv AD DS sites -- just that they will exist in the acme.local zone -- see note above. The clt.acme.local zone does not need DC related SRV records.

Clients will be able to locate a DC fine. Client DNS servers point to the clt IP's of the DC's.

When DC locator process on the client kicks off

  • If the client knows its site the DNS question will be _ldap._tcp.[site]._sites.dc._msdcs.acme.local SRV. This will return the site-specific DC's that have SRV records registered.
  • If the client does not know its site the DNS question will be _ldap._tcp.dc._msdcs.acme.local SRV. This will return all DC's. The client will attempt to bind to each DC's LDAP service until it finds one that responds. When the client finds one, it performs a site lookup to determine the client's site, and caches the site in the registry so future DC locator instances happen quicker.
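As a small illustration of the two cases above, this Python sketch builds the SRV query name with and without a known site. The function name is hypothetical; the name patterns are the ones quoted in the bullets.

```python
from typing import Optional

def dc_locator_query(domain: str, site: Optional[str] = None) -> str:
    """Build the SRV name queried by the DC locator.

    With a known site the query is site-specific; without one it
    asks for every DC in the domain.
    """
    if site:
        return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"
    return f"_ldap._tcp.dc._msdcs.{domain}"

print(dc_locator_query("acme.local", "clt"))
print(dc_locator_query("acme.local"))
```

Either name can be checked by hand from a workstation with `nslookup -type=SRV <name>` to see which DC's and addresses come back.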

@Massimo 4

Ugh, nice catch. The way I see it there are two ways around this problem.

  1. The lesser impact (compared to 2 below) is to create an entry in the hosts file on the clients/workstations for dc1.acme.local and dc2.acme.local pointing to the clt NIC IP's of the DC's.

or

  2. Manually create the necessary SRV records in netlogon.dns file on each of the DC's. This likely will have some consequences on the server network. Member servers may at times communicate with the DC's on the clt network if this is configured.
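For the hosts file option, the entries could be generated rather than typed by hand. This is only a sketch: the IP addresses are made-up placeholders for the DC's clt NIC IP's, and the function name is hypothetical.

```python
from typing import Dict, List

def hosts_lines(dc_clt_ips: Dict[str, str]) -> List[str]:
    """Return hosts-file lines (IP, then name) for each DC."""
    return [f"{ip}\t{name}" for name, ip in dc_clt_ips.items()]

# Hypothetical clt-side addresses; substitute the real clt NIC IP's.
entries = hosts_lines({
    "dc1.acme.local": "10.0.2.11",
    "dc2.acme.local": "10.0.2.12",
})
print("\n".join(entries))
```

The output would be appended to %SystemRoot%\System32\drivers\etc\hosts on clients/workstations only; member servers must keep resolving the DC names to the srv addresses through DNS.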

All in all none of it is pretty, but that isn't necessarily the end goal. Maybe the client is just testing your tech chops. Plop it on their conference table and tell them "Here, this will work, but I am charging you 4x my normal rate to configure and support it. You can reduce it to 1.5x my normal rate -- .5x PITA charge, by doing [correct solution]."

As noted earlier, my recommendation is to convince otherwise or run. But it sure is a fun little exercise in ridiculous. :)