Win2K8 Server MPIO iSCSI failover not working

iscsi · mpio · windows-server-2008

I want to pass iSCSI traffic between my Windows 2K8 Server lab system and my NetApp filer across two separate network stacks*.

My configuration is as follows:

  • one Win2K8 server with the iSCSI software initiator installed, the MPIO component installed, and two network interfaces: 192.168.201.85/24 and 192.168.202.85/24
  • one NetApp filer with a LUN published to the Windows server's IQN, and two interfaces: 192.168.201.200/24 and 192.168.202.200/24
  • two separate switches, one for 192.168.201.0/24 and one for 192.168.202.0/24. Both are flat (un-VLAN'd) and are not connected to any other network equipment — including each other.

I have configured the MPIO component to register the iSCSI software initiator "adapter" class.
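
(For anyone who prefers to script that step: I believe the mpclaim equivalent of the MPIO control panel checkbox is the following, run from an elevated prompt. "MSFT2005iSCSIBusType_0x9" is the standard bus-type ID for the Microsoft iSCSI software initiator.)

    :: Claim every device exposed by the Microsoft iSCSI software initiator
    :: for MPIO; -r lets mpclaim reboot, which it needs to finish the claim.
    mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"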

Then I went into the iSCSI initiator control panel, added both filer addresses as "targets", and ran discovery against them. This shows the single LUN available.
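
For reference, the same discovery can be done with iscsicli; the two addresses are my filer's interfaces from the list above:

    :: Add each filer interface as a SendTargets discovery portal,
    :: then list the targets the portals report back.
    iscsicli QAddTargetPortal 192.168.201.200
    iscsicli QAddTargetPortal 192.168.202.200
    iscsicli ListTargets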

I have then "logged on" to the LUN twice, selecting a different "source" IP address for each connection. Both connections have "re-connect at boot" checked and "MPIO" checked.

When I examine the target, I see two connections to the target, one for each IP address that the NetApp is using.

When I examine my persistent connections, I see two connections, one for each IP address that the NetApp is using.
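
The same thing can be cross-checked from the command line: iscsicli's SessionList prints each session with its connections, including the source (initiator) and target portal addresses, so you can count one session per path.

    :: Dump all active iSCSI sessions; each entry lists its connections
    :: with initiator and target portal addresses.
    iscsicli SessionList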

(I should mention at this point that I have tested both filer IPs by demonstrating a single connection to each IP, mounting and then using a drive across that IP.)

I then go into my Disk Mangler and set up the partition on the LUN, and mark it Online. The disk works as expected.
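
(If you'd rather script that than click through Disk Management, here's a diskpart sketch; the disk number 1 is an assumption, so check "list disk" first, and note that 2008 R2's diskpart uses "online disk" where earlier builds use plain "online".)

    :: Bring the new iSCSI LUN online and give it a partition.
    diskpart
    DISKPART> list disk
    DISKPART> select disk 1
    DISKPART> attributes disk clear readonly
    DISKPART> online disk
    DISKPART> create partition primary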

Now I go into the new disk's properties and click the MPIO tab. I can see two connections in use for this disk. However, I don't know how to associate the connections I see in this tab with the connections I see in the iSCSI initiator screens; while I presume there is one for each connection in the iSCSI initiator screen, I can't prove it.
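
The closest thing I know of to a command-line view of that tab is mpclaim, which at least enumerates each disk's paths with their IDs and states (the disk number below is whatever the first command reports for this LUN; 0 is just an example):

    :: List all MPIO-claimed disks and their load-balance policies.
    mpclaim -s -d
    :: Show the individual paths (path IDs, states, weights) for one disk.
    mpclaim -s -d 0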

In the MPIO tab, I have several options.

I have reduced the timers all down to 1 second each and enabled Path Verification (the matching registry values are sketched after this list). My understanding of these settings is:

  • each second the Windows server will verify that the path is valid, i.e. that the remote target IP is answering properly;
  • the server will retry only once, one second after a failure is detected;
  • the server will mark the path as invalid and remove it one second after a failure.
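
As far as I can tell, those GUI knobs map to DWORD values under the mpio service's Parameters key, so the same tuning can be inspected or applied with reg.exe. The value names below are the documented MPIO timers; the data values mirror my 1-second settings:

    :: Inspect the current MPIO timer settings.
    reg query HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters

    :: Mirror the GUI: verify paths every second, retry once, one second
    :: after a failure, and remove a dead path one second after that.
    reg add HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters /v PathVerifyEnabled /t REG_DWORD /d 1 /f
    reg add HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters /v PathVerificationPeriod /t REG_DWORD /d 1 /f
    reg add HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters /v RetryCount /t REG_DWORD /d 1 /f
    reg add HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters /v RetryInterval /t REG_DWORD /d 1 /f
    reg add HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters /v PDORemovePeriod /t REG_DWORD /d 1 /f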

Regarding redundancy, there are a couple of things I have tried (mpclaim equivalents for the policy settings are sketched after this list):

  • If I set up both connections as Active/Active and select Round Robin use, the disk works. If I set up a copy operation on the disk and simulate a network failure by pulling one of the network cables out, the connection stops for ~30 seconds and then keeps going.
  • If I set up the connection as Failover-only by marking one connection as Standby/Passive and selecting Failover Only, again the connection works. (Interestingly, disk-to-disk copies appear to consistently flow at about twice the speed of Round Robin, but anyway.) If I simulate a failure by pulling the standby cable out, the connection stalls for about 1 second and then keeps going. If I simulate a failure by pulling the Active cable out, the connection stops, and I can't ping the filer across either wire. Eventually the OS tells me the disk has failed. The server stayed in this state for several hours, after which I got tired of waiting and rebooted it.
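
For the record, those policy changes can also be made with mpclaim rather than the MPIO tab; 1 selects Failover Only and 2 selects Round Robin, and disk 0 is again just an example number:

    :: Set the load-balance policy for MPIO disk 0.
    :: 1 = Failover Only, 2 = Round Robin; pick one.
    mpclaim -l -d 0 2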

I did some research and found Microsoft KB 968287, which describes failover failing to complete because of a counter error in the mpio.sys driver on Win2K8 and Vista, but installing that hotfix has not changed anything that I can see.
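
One way to confirm the hotfix actually took is to compare the installed mpio.sys file version against the version listed in the KB article:

    :: Report the on-disk driver version; compare it to the KB's file table.
    wmic datafile where "Name='C:\\Windows\\System32\\drivers\\mpio.sys'" get Version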

All this makes me suspect I'm missing something fundamental. Am I doing this wrong?

The real goal here is to provide a more reliable iSCSI transport over which to run VMs and mount Exchange stores on my Hyper-V cluster. We know that Exchange in particular will dismount information stores very quickly if it detects a disk hiccup, so we were hoping MPIO would keep data flowing even if one path failed.


*= We currently have a single iSCSI switch, but when that started to misbehave we had to take down our entire world in order to flash the firmware on the one switch. Therefore we want two totally isolated network paths — NICs, switches, and interfaces on the other end — so that we can take half of them out of service at any given time for maintenance without killing the world.

Best Answer

My understanding is that on a NetApp running 7-Mode, each LUN has a preferred path even when you're sending I/O over two paths. What you're effectively doing is sending every other I/O through an additional hop: the non-owning controller redirects it to the LUN's primary controller over the cluster interconnect. The 30-second delay you're observing is likely the time it takes to complete a hard cluster node takeover.

Cluster-Mode (Data ONTAP 8) is barely more than a toy right now (and unless you feel like alpha-testing for NetApp, 7-Mode is the only real option), but it will fix this problem by virtualizing several layers of the filer, including the Ethernet interfaces.

If you want a truly active/active box for iSCSI or any other block protocol, you don't want a NetApp. There's no guaranteed takeover time, and I've seen it take a lot longer than 30 seconds in the past.