Windows – Long pause when accessing DFS namespace

dfsnamespacesnetwork-shareserver-message-blockwindows

We've recently migrated our Windows network to use DFS for shared files. DFS is working well, except for one annoying problem: users experience a significant delay when they try to access a DFS namespace that they have not accessed for some time. I have tried to troubleshoot the issue but have not had any success so far, and I was hoping someone here may have some pointers to help resolve the problem.

Firstly, some background on our network:

The network uses a Windows 2008 functional level Active Directory domain with two Windows 2008 DCs and two DNS servers (one on each of the DCs). The network is DNS only – no WINS. All computers are located at the same site and connected by Gigabit Ethernet. We have approximately 20 Domain-based DFS namespaces in Windows 2008 mode, and each DFS namespace has two Windows 2008 DFS namespace servers (the same two servers for all namespaces). All namespace servers are in FQDN mode and all folder targets are specified using their FQDN. All computers are up-to-date with Service Packs and patches.

The actual folder targets (i.e. the SMB shares our DFS folders point to) are scattered across several file and application servers, all running Windows 2008 bar two application servers which run Windows 2003 R2, with no replication setup at all (e.g. all DFS folders currently only have one folder target).

Some more detail on the problem:

The namespace access delay is generally 1 – 10 seconds long and seems to occur when a particular computer has not accessed the requested namespace for approximately five minutes or more.

For example, if the user has not accessed \\domain.name\namespace1\ for more than five minutes and attempts to access \\domain.name\namespace1\ via Windows Explorer, the Explorer window will freeze for 1 – 10 seconds before finally resuming and displaying the folders that exist in \\domain.name\namespace1. If they then close the Explorer window and attempt to access \\domain.name\namespace1\ again within five minutes the contents will be displayed almost instantly – if they wait longer than five minutes it will go through the 1 – 10 second pause again.

Once "inside" the namespace everything is nice and snappy, it's just the initial connection to the namespace that is slow.

The browsing delays seem to affect all variants of Windows that we use (Windows 2008 x64 SP2, Windows 2003 R2 x86 SP2, Windows XP Pro x86 SP3) – it is possibly a bit worse in Windows XP / 2003 than in Windows 2008, but I'm not sure if the difference isn't just psychological.

Accessing the underlying folder targets directly exhibits no delay at all – i.e. if the SMB shares pointed to by DFS are accessed directly (bypassing DFS) then there is no pause.

During trouble-shooting I noticed that the "Cache duration" for all of our DFS roots is set to 300 seconds – 5 minutes. Given that this is the same amount of time required to trigger the pause I assume that this caching is somehow related, although I am unsure exactly what is cached on the client and hence what needs to be looked up again after 5 minutes have elapsed.

In trying to resolve the problem I have already tried / checked the following (without success):

  • Run dcdiag on both Domain Controllers – no problems found
  • Done some basic DNS server checks without finding any problems – I don't know how to check the DNS servers in detail, but I would add that the network is not exhibiting any other strange behavior that may point to a DNS problem
  • Disabled Anti-virus on clients and servers
  • Removing one of the namespace servers from a couple of namespaces – no difference

So that's where I'm up to – and I'm out of ideas. Can anyone suggest what may be causing the delays and/or what I should be trying next?

Best Answer

Well, we finally appear to have resolved this issue in our environment. For the benefits of others, here's what we discovered and how we fixed the problem:

To try and gain further insight into what was occurring before/during/after the delays we used Wireshark on a client machine to capture/analyse network traffic whilst that client attempted to access a DFS share.

These captures showed something strange: whenever the delay occurred, in between the DFS request being sent from the client to a DC, and the referral to a DFS root server coming back from the DC to the client, the DC was sending out several broadcast name lookups to the network.

Firstly, the DC would broadcast a NetBIOS lookup for DOMAIN (where DOMAIN is our pre-Windows 2000 Active Directory domain name). A few seconds later, it would broadcast a LLMNR lookup for DOMAIN. This would be followed by yet another broadcast NetBios lookup for DOMAIN. After these three lookups had been broadcast (and I assume timed out) the DC would finally respond to the client with a (correct) referral to a DFS root server.

These broadcast name lookups for DOMAIN were only being sent when the long delay opening a DFS share occurred, and we could clearly see from the Wireshark capture that the DC wasn't returning a referral to a DFS root server until all three lookups been sent (and ~7 seconds passed). So, these broadcast name lookups were pretty obviously the cause of our delays.

Now that we knew what the problem was, we started trying to figure out why these broadcast name lookups were occurring. After a bit more Googling and some trial-and-error, we found our answer: we hadn't set the DfsDnsConfig registry key on our domain controllers to 1, as is required when using DFS in a DNS-only environment.

When we originally setup DFS in our enviroment we did read the various articles about how to configure DFS for a DNS-only environment (e.g. Microsoft KB244380 and others) and were aware of this registry key, but had misintepreted the instructions on when/how to use it.

KB244380 says:

The DFSDnsConfig registry key must be added to each server that will participate in the DFS namespace for all computers to understand fully qualified names.

We thought this meant that the registry key has to be set on the DFS namespace servers only, not realising that it was also required on the domain controllers. After we set DfsDnsConfig to 1 on our domain controllers (and restarted the "DFS Namespace" service), the problem vanished.

Obviously we're happy with this outcome, but I would add that I'm still not 100% convinced that this is our only problem - I wonder if adding DfsDnsConfig=1 to our DCs has only worked around the problem, rather than solving it. I can't figure out why the DCs would be trying to lookup DOMAIN (the domain name itself, rather than a server in the domain) during the DFS referral process, even in a non-DNS-only environment, and I also know I haven't set DfsDnsConfig=1 on domain controllers in other (admittedly much smaller / simpler) DNS-only environments and haven't had the same issue. Still, we've solved our problem so we are happy.

I hope this is helpful to the others who are experiencing a similar issue - and thanks again to those that offered suggestions along the way.