Multihomed Clustered Windows Server 2008 Network Connectivity Problems After Failover

cluster domain-name-system gateway multi-homed windows-server-2008

We have a shiny new multihomed Windows Server 2008 (64-bit) cluster exhibiting some strange behavior.

The problem:

  • Everything works perfectly until we fail over one of the cluster groups

  • Prior to a failover, both internal and external clients can connect, and all domain authentication works properly

  • Once we fail over a cluster group, internal clients in different subnets lose connectivity (as if the static routes had disappeared), and you can no longer log into the server using a domain account (the domain controller is in a different subnet)

  • All DNS lookups occur via the Public/Internet interface. It is as if the server(s) can no longer find/resolve the Internal/Domain DNS servers.

  • Rebooting fixes the problem until the next group failover

  • Setting the default gateway to the Internal network also works, but at the cost of having to create static routes for the entire Internet (I don't have the time)

The network adapters are as follows:

  • Heartbeat Network (crossover cable between two servers)

  • Internal Network (Active Directory-based network w/ DNS, no WINS)

  • Public Network (Internet Connection – Default Gateway – w/ DNS)

  • Microsoft Cluster Failover Virtual Adapter (this is hidden in most cases but you can see it when you do an "ipconfig /all")

Other information:

  • This system must provide services to both the Internal and Public networks

  • The Public/Internet connection is the default gateway

  • We have entered persistent static routes to several subnets off the Internal network

  • Each cluster group has a network name and associated IP address

  • The binding order of the network interfaces is:

    1 Internal

    2 Public

    3 Heartbeat
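One way to test the "static routes disappeared" theory is to compare the routing table and adapter settings before and after a failover. The commands below are only a sketch: the subnet 10.20.0.0/16 and gateway 192.168.10.1 are made-up placeholders, not the addresses used on this cluster.

```bat
rem Show the IPv4 routing table; the persistent routes are listed at the end.
route print -4

rem Example of adding a persistent route to an internal subnet
rem (placeholder subnet and gateway, substitute your own values).
route -p add 10.20.0.0 mask 255.255.0.0 192.168.10.1

rem Show every adapter, including the cluster virtual adapter,
rem with the DNS servers and gateway assigned to each.
ipconfig /all
```

Saving the output of `route print` and `ipconfig /all` on the node before and after moving a group makes it easy to see whether a route, gateway, or DNS server entry actually changed.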

We're stumped. We have used this configuration on older Windows 2K clusters. We have also used this configuration on standalone Windows 2K3 servers. Any suggestions would be greatly appreciated.

Todd

Best Answer

I think I have this exact same problem on a new 2008 R2 cluster with an EqualLogic. What is the solution? I have a Microsoft case open and they're pointing me to the weak/strong host model, but it is not helping.

Here is the solution for anything with Broadcom NICs (and maybe others):

http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037

You must disable RSS, TCP Chimney Offload, and NetDMA. This resolved my problems immediately, after calls to Dell and Microsoft support!
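
For anyone who wants the commands, this is roughly what disabling those features looks like from an elevated command prompt. Treat it as a sketch and check the KB article for your exact OS version and NIC driver; in particular, I am assuming the `netdma` option is accepted by `netsh` on your build. If it is rejected, the KB article describes a registry setting to use instead.

```bat
rem Show the current global TCP settings (RSS, Chimney Offload, NetDMA).
netsh int tcp show global

rem Disable Receive-Side Scaling.
netsh int tcp set global rss=disabled

rem Disable TCP Chimney Offload.
netsh int tcp set global chimney=disabled

rem Disable NetDMA (assumption: supported by netsh on this OS version).
netsh int tcp set global netdma=disabled
```

Run these on every node in the cluster, then run `netsh int tcp show global` again to confirm the new state. The NIC driver may have its own offload settings in the adapter's advanced properties as well.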