Of ~140 PCs, a few (with no discernible pattern) are consistently unable to resolve the AD DS domain name during boot and intermittently unable to resolve AD DS DNS names post-boot. This can be temporarily resolved by restarting the Windows DNS Client service (dnscache) and/or rebooting the PC until it works.
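Because the post-boot failures are intermittent, it can help to log resolution attempts over time rather than testing by hand. The following is a minimal Python sketch, not part of the original troubleshooting: it resolves a name repeatedly through the OS resolver with socket.getaddrinfo and records which attempts fail. The domain name corp.example.com and the names DOMAIN, default_resolver and probe are hypothetical placeholders.

```python
import socket
import time
from typing import Callable, List, Tuple

# Hypothetical AD DS domain name; substitute the real one.
DOMAIN = "corp.example.com"


def default_resolver(name: str) -> List[str]:
    """Resolve `name` to IPv4 addresses via the OS resolver
    (on Windows this normally goes through the DNS Client service)."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe(
    name: str,
    attempts: int,
    interval: float,
    resolver: Callable[[str], List[str]] = default_resolver,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int, bool, str]]:
    """Resolve `name` `attempts` times, `interval` seconds apart.

    Returns (timestamp, attempt, ok, detail) tuples, where detail is the
    resolved addresses on success or the resolver error on failure."""
    results = []
    for attempt in range(attempts):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            results.append((stamp, attempt, True, ",".join(resolver(name))))
        except socket.gaierror as exc:
            results.append((stamp, attempt, False, str(exc)))
        if attempt < attempts - 1:
            sleep(interval)
    return results
```

For example, probe(DOMAIN, attempts=60, interval=5.0) polls for about five minutes. The resolver and sleep parameters are injectable so the logic can be exercised without a network.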
My diagnostic progress:
- Once name resolution is working, both domain controllers are contactable (verified in multiple ways) and Group Policy does apply, but some policies require a reboot to take effect, hence this problem.
- The NIC's DNS configuration (servers, etc.) is correct.
- Command nltest /DSQUERYDNS outputs "I_NetLogonControl failed: Status = 50 0x32 ERROR_NOT_SUPPORTED".
- Command Test-ComputerSecureChannel outputs True.
- Updating the Realtek PCIe GBE Family Controller network device's driver from version 7.86.508.2014 (2014/05/08) to version 7.107.323.2017 (2017/03/23) didn't make a difference.
- \\<%logonServer%>\NETLOGON\ is accessible.
- Enabling the local policy Computer Configuration\Policies\Administrative Templates\System\Logon\Always wait for the network at computer startup and logon didn't make a difference.
- No traffic is blocked by the firewall during reboot.
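One way to confirm from an affected PC that the domain controllers are contactable is to open TCP connections to the ports a domain member typically uses. This is a minimal Python sketch under stated assumptions: the DC addresses are hypothetical, only TCP is checked (DNS and Kerberos also use UDP), and the port list is a common subset rather than a complete one. The names DC_PORTS and check_ports are illustrative.

```python
import socket
from typing import Dict, Iterable

# TCP ports a domain member commonly uses to reach a DC (not exhaustive;
# DNS and Kerberos also use UDP, which this check does not cover).
DC_PORTS = {53: "DNS", 88: "Kerberos", 135: "RPC endpoint mapper", 389: "LDAP", 445: "SMB"}


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection raises OSError (refused, timed out, unreachable,
            # or name resolution failure) if the connection cannot be made.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Passing a DC's IP address rather than its hostname, e.g. check_ports("10.0.1.10", DC_PORTS) with a hypothetical address, separates network reachability from name resolution.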
As far as I'm aware, this only started happening after the site migrated onto a new, restricted, VLAN-ed network, so I can't help but suspect the Sophos XG 210 UTM. That doesn't quite make sense, though: if it were firewall- or routing-related, I'd expect the problem to be much more consistent and widespread.
Update 2017/07/07 16:26
My diagnostic progress:
- Updating the Sophos XG firmware from version 16.05.3 MR-3 to version 16.05.5 MR-5 didn't resolve the problem.
- Creating a network firewall rule that allows LAN-to-any and PC's-IP-address-to-any traffic on any port / service didn't resolve the problem.
- Disabling IPv6 on the NIC and rebooting didn't resolve the problem.
- Executing the elevated command netsh int ip reset reset.log and rebooting didn't resolve the problem.
- Logging on with a freshly generated local user profile didn't resolve the problem.
Update 2017/07/12 11:23
The problem has changed on the test PC that I'm using. Post-boot, pinging the AD domain name and any server hostname successfully resolves and transmits, but RSoP still reports that the computer-side (not user-side) Group Policy Infrastructure failed to apply because "The specified domain either does not exist or could not be contacted".
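That RSoP message is the text of Windows error ERROR_NO_SUCH_DOMAIN, which typically means the client could not locate a domain controller, and DC location relies on DNS SRV records registered by Netlogon. As a hedged sketch, the Python helper below builds the SRV record names worth querying with nslookup -type=SRV from an affected PC. The domain and site names are hypothetical, the function names are illustrative, and the list is a common subset rather than every record Netlogon registers.

```python
from typing import List, Optional


def dc_locator_srv_names(domain: str, site: Optional[str] = None) -> List[str]:
    """Return a common subset of the DC SRV record names for `domain`.

    If `site` is given, the site-specific LDAP record is listed first,
    since a client normally prefers DCs in its own AD site."""
    domain = domain.rstrip(".")  # accept names written with a trailing dot
    names = [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{domain}",
    ]
    if site:
        names.insert(0, f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    return names


def nslookup_commands(domain: str, site: Optional[str] = None) -> List[str]:
    """Turn the SRV names into nslookup commands to paste into a console."""
    return [f"nslookup -type=SRV {name}" for name in dc_locator_srv_names(domain, site)]
```

Running the generated commands both while the problem is present and after restarting dnscache would show whether the SRV lookups, and not just the A record for the domain name, are what fail.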
My diagnostic progress:
- Reconfiguring the NIC statically rather than dynamically, with almost exactly the same settings (IP address, subnet mask, default gateway, and DNS servers), consistently resolved the boot-time, computer-side Group Policy problem on two of the affected PCs. I'm going to leave this static configuration in place for a few days to see whether it resolves the intermittent DNS resolution problem too.
Update 2017/07/13 13:28
The intermittent DNS resolution problem recurred.
My diagnostic progress:
- Pinging FQDNs with trailing dots (.) didn't make a difference.
- Reconfiguring the NIC statically with exactly the same settings (IP address, subnet mask, default gateway, DNS servers, and connection-specific DNS suffix) resolved the problem, albeit probably only temporarily.
- I've posted a V7 USB3-to-Ethernet adapter to test whether the problem is an incompatibility between the onboard NIC and the switches, or something similar. Results tomorrow.
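Going by the settings listed, the second static configuration differed from the first by also including the connection-specific DNS suffix. Capturing ipconfig /all output under DHCP and under the static configuration and diffing the two makes such differences explicit. Below is a minimal Python sketch that assumes English-language ipconfig /all output for a single adapter; parse_ipconfig, diff_configs and EXPECTED_DIFFS are hypothetical names, and the set of keys ignored as expected DHCP-vs-static differences is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Keys assumed to differ naturally between a DHCP and a static configuration.
EXPECTED_DIFFS = {"DHCP Enabled", "DHCP Server", "Lease Obtained", "Lease Expires"}


def parse_ipconfig(text: str) -> Dict[str, List[str]]:
    """Parse one adapter's block of `ipconfig /all` output into key -> values."""
    parsed: Dict[str, List[str]] = {}
    key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if " :" in line:
            # "   DNS Servers . . . . . . : 10.0.1.10" -> key "DNS Servers"
            raw_key, value = line.split(" :", 1)
            key = raw_key.strip().rstrip(" .")
            parsed[key] = [value.strip()]
        elif line[0].isspace() and key is not None:
            # Indented continuation line, e.g. a second DNS server.
            parsed[key].append(line.strip())
        else:
            key = None  # adapter header or other non-indented line
    return parsed


def diff_configs(a: str, b: str) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {key: (values_in_a, values_in_b)} for keys whose values differ."""
    pa, pb = parse_ipconfig(a), parse_ipconfig(b)
    diffs = {}
    for k in sorted(set(pa) | set(pb)):
        if k in EXPECTED_DIFFS:
            continue
        va, vb = pa.get(k, []), pb.get(k, [])
        if va != vb:
            diffs[k] = (va, vb)
    return diffs
```

Saving ipconfig /all output to a text file in each state and passing the relevant adapter's section of both files to diff_configs lists only the settings that actually changed.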
Update 2017/08/03 14:51
After 10+ hours of diagnostics, the root cause seems to be MAX RemoteManagement's / MSP Remote Management's RMM agent (likely its Advanced Monitoring Agent Network Management sub-component): we uninstalled it on a few affected PCs on 2017/07/25 and the problems haven't recurred since.
Best Answer
I'm fairly confident in saying that this was caused by MAX RemoteManagement's / MSP Remote Management's RMM agent as uninstalling it has resolved a wide variety of DNS- / network-related problems on different PCs in different locations.