Standard vSwitch static LAG with iSCSI hardware initiators – impossible configuration

iscsi, lag, mpio, vmware-esxi, vmware-vsphere

At one site, we have vSphere hosts with two 10 GbE NICs.

  • Both NICs are in one vSwitch
  • Teaming/failover policy is Route Based on IP Hash for a static LAG (this is to balance and fail over VM traffic; see the sketch after this list)
  • Both NICs (HPE 534FLR-SFP+/QLogic 57810) have hardware iSCSI initiators enabled
  • Each initiator is bound to a vmkernel port one-to-one.
  • One iSCSI subnet
  • Both HBAs can access all SAN targets across the switch stack
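
For reference, a minimal Python sketch of how the IP hash policy is commonly described as picking an uplink: an XOR of the source and destination addresses, modulo the number of active uplinks. This is a simplified model, not ESXi's actual implementation, and the addresses and uplink names are made up:

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: list[str]) -> str:
    """Pick an uplink the way 'Route Based on IP Hash' is usually described:
    XOR the source and destination IPv4 addresses, modulo the number of
    active uplinks. Simplified model, not ESXi's actual implementation."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]

uplinks = ["vmnic0", "vmnic1"]
# Hypothetical vmkernel and target addresses on the single iSCSI subnet.
for target in ("10.0.10.21", "10.0.10.22", "10.0.10.23"):
    print(f"10.0.10.11 -> {target}: {ip_hash_uplink('10.0.10.11', target, uplinks)}")
```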

The switches are Cisco Catalyst 3850 series, and MPIO works like a charm – tested, including failover at the path level.

We deployed a similar configuration at another site a few days ago: same vSphere configuration, same NICs. This time, however, the configuration does not work properly.

Each initiator can only access targets on ports of the switch it is connected to, not targets on ports of the other switch.

With tcpdump I can see that the vmkernel ports perform discovery against all targets (done by vSphere according to the documentation), and the static discovery targets do appear (the SAN sees that it has been poked). However, paths are never created, and esxcli shows a 0x0004 error (transport error?) for targets on the other switch. This is also hard to investigate because we can't see the HBA traffic directly. Software iSCSI, bound to the same vmkernel ports, works like a charm.
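
For what it's worth, a quick way to map which initiator/target pairs can at least complete a TCP handshake is to probe each target portal on port 3260 from each bound vmkernel IP. This goes through the software stack, so it only reproduces what the working software iSCSI already shows, but it helps rule out basic L3 problems before blaming the HBA path. The addresses below are hypothetical:

```python
import socket

# Hypothetical bound vmkernel IPs and SAN target portals on the shared iSCSI subnet.
vmk_ips = ["10.0.10.11", "10.0.10.12"]
targets = ["10.0.10.21", "10.0.10.22", "10.0.10.23", "10.0.10.24"]

for src in vmk_ips:
    for dst in targets:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(2)
        try:
            s.bind((src, 0))         # force the probe's source address
            s.connect((dst, 3260))   # default iSCSI target portal port
            print(f"{src} -> {dst}:3260 reachable")
        except OSError as exc:
            print(f"{src} -> {dst}:3260 FAILED ({exc})")
        finally:
            s.close()
```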

This time the switches are Cisco Nexus (I'll update the model when I know it), and the multi-chassis aggregation is vPC instead of the Catalyst 3850's native StackWise. Otherwise the sites are mostly the same, and IMHO the differences are too minor to matter. Just to point out some:

  • HPE Proliant Gen9 vs Gen10
  • vSphere 6.5 vs 6.7 – as our backup now supports 6.7, we'll update the old site shortly.

I've searched the VMware documentation and found nothing saying that this converged networking setup shouldn't work. We consulted our networking partner, but they didn't understand how the existing site works at all and thought the configuration shouldn't work in the first place.

Is this configuration normal, or have we been depending on some implementation quirk of the C3850 that doesn't work on other switches? Or is there something obviously wrong with the switches?

Best Answer

To answer my own question… LAG is not recommended/supported with iSCSI port binding (the KBs use weak wording: "should not", "contraindicated"):

  1. https://kb.vmware.com/s/article/2051307

LACP support is not compatible with software iSCSI multipathing. iSCSI multipathing should not be used with normal static etherchannels or with LACP bonds.

  2. https://kb.vmware.com/s/article/2038869

Software iSCSI Port Binding is also contraindicated when LACP or other link aggregation is used on the ESXi host uplinks to the pSwitch

Apparently Catalyst 3850 StackWise behaves very differently (it works…) from Nexus Virtual Port Channel. Under normal conditions traffic never crosses the inter-switch link, so the hardware iSCSI initiators never get their packets back from the other switch's SAN ports. The solution is to disable the LAG and switch back to port ID based balancing in the vSwitch. Traffic balancing isn't as important with 10G links (IP hash mainly helped with easily overloaded 1G links).
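
For comparison, a rough sketch of the port ID policy we switched to: each virtual port (VM vNIC or vmkernel port) is pinned to a single uplink, so each bound vmkernel port/HBA keeps a deterministic uplink and MPIO handles path failover. This is a simplified model of the policy, not ESXi code, and the names are illustrative:

```python
def port_id_uplink(virtual_port_id: int, uplinks: list[str]) -> str:
    """'Route Based on Originating Virtual Port', simplified: each virtual port
    is pinned to one uplink, modelled here as a modulo over the active uplinks."""
    return uplinks[virtual_port_id % len(uplinks)]

uplinks = ["vmnic0", "vmnic1"]
# Two hypothetical vmkernel ports bound 1:1 to the two hardware initiators:
# each sticks to a single uplink, and MPIO takes care of failover.
for port_id in (8, 9):
    print(f"virtual port {port_id} -> {port_id_uplink(port_id, uplinks)}")
```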

There are some unsourced, weak claims on Google that vPC requires LACP to work properly. We also had some packet loss in the switch → vSphere direction that disappeared when we disabled the LAG and switched back to port ID balancing. I was aware that a vSwitch drops traffic arriving on the wrong uplink (if a VM is hashed to vmnic0 but its traffic comes back on vmnic1, it's dropped), but I'm not sure whether that applies to IP hash balancing. On the other hand, the documentation states that only IP-SRC-DST hashing is supported on the switch side. If the vPC pair delivered traffic to vSphere over the switch-local interface instead of the interface the IP hash would pick, it could be treated as coming from the "incorrect" uplink and dropped. In any case, disabling the LAG fixed it.
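
To make that hypothesis concrete: if the vSwitch only accepts a flow's inbound frames on the uplink its IP hash maps the flow to, then a reply that the vPC pair delivers over the switch-local member can land on the "wrong" uplink and be dropped. A toy model of that check (my guess, not verified ESXi behaviour; addresses made up):

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: list[str]) -> str:
    # Same simplified XOR-and-modulo model of the IP hash as in the question.
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]

def vswitch_accepts(src_ip: str, dst_ip: str, arrived_on: str, uplinks: list[str]) -> bool:
    """Hypothesis only: an inbound frame is accepted only if it arrives on the
    uplink the IP hash maps that flow to; otherwise it looks mis-delivered."""
    return arrived_on == ip_hash_uplink(src_ip, dst_ip, uplinks)

uplinks = ["vmnic0", "vmnic1"]
# Hypothetical reply from a SAN port behind the other switch: the vPC pair
# forwards it over its switch-local member regardless of the hash result.
expected = ip_hash_uplink("10.0.10.21", "10.0.10.11", uplinks)
print("hash expects:", expected)
print("accepted if delivered on vmnic1:",
      vswitch_accepts("10.0.10.21", "10.0.10.11", "vmnic1", uplinks))
```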
