Hyper-V – Guest and Host Dropping Network

hyper-v networking

This has been driving me absolutely crazy these past few weeks.

I've migrated a customer from VMware to Hyper-V (don't judge... reasons). The setup is exactly the same as one I've used elsewhere with no issues.

Host:

  • Dell R730
  • Hyper-V Server (Standalone, not full OS, based on Server 2019)
  • 8 Broadcom NICs, VMQ disabled at host level on all.
  • Latest available Broadcom NIC drivers from Dell in use
  • 4 of the NICs in a team connected to vSwitch-01. No Host OS management.
  • 2 of the NICs used for iSCSI
  • 1 NIC used for Host OS management, on VLAN 90
  • 1 Unused NIC
  • 96 GB RAM
  • 16 Cores
  • 3 VMs using vSwitch-01. VMQ Disabled.

Guest 1

  • Connected to vSwitch-01. VMQ Disabled.
  • 8GB RAM
  • 4 Virtual Processors
  • Win10 Education X64, all updates
  • VLAN 90

Physical Switch – HP 5406Rzl2 (J9850A)

  • Proper Enterprise Core Switch!
  • HP's "Trunk" (ie LAG) set up for the 4 ports the host NIC team is connected to.
  • Trunk is sending all required VLANs as "tagged"
  • Port connected to the management OS is "untagged" VLAN 90

Problem: The guests lose their network connection for around 5-10 minutes at random intervals. Nothing in any of the logs indicates the cause, only the symptoms.

In the past, this has been due to Broadcom NICs and VMQ. But I've disabled VMQ on the host, for each NIC, and for each guest, in the network settings within Hyper-V.

I've also tried:

  • Using a single NIC, with no teaming involved
  • Checking DNS/DHCP, etc
  • Deleting/recreating vSwitch
  • Removing/re-adding guest NIC

I can often force the issue by running Advanced IP Scanner on the guest. Once it reaches the upper limits of the /20 network (toward VLAN 90, potentially related), the guest drops connection for 5-10 mins.
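To give a sense of how much address space such a scan covers, here is a minimal sketch of the /20 arithmetic. The 10.0.80.0/20 range is purely an assumption for illustration; the real addressing isn't stated here.

```python
import ipaddress

# Hypothetical /20 network, used only for illustration.
# The real address range is not given in the post.
network = ipaddress.ip_network("10.0.80.0/20")

# A /20 leaves 12 host bits: 2**12 = 4096 addresses,
# minus the network and broadcast addresses.
hosts = network.num_addresses - 2
first_host = network.network_address + 1
last_host = network.broadcast_address - 1

print(f"{network}: {hosts} usable hosts, {first_host} to {last_host}")
```

So a full sweep probes roughly four thousand addresses, and the "upper limits" are the last few /24-sized blocks of that range.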

Same problem occurs on all guests on the host, including Server 2022 and Server 2019.

Any pointers as to where else to look would be much appreciated!

Best Answer

I think I've fixed it. Seems like there was a spanning-tree mismatch on some of my switches. Once I manually set the root bridge priority, and made sure all switches were using the same protocol, I don't seem to be having any problems any more.
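For anyone wondering why setting the priority manually matters: spanning tree elects as root the bridge with the lowest bridge ID, which is the priority followed by the MAC address as a tie-breaker. The sketch below only models that comparison. The switch names, MAC addresses and priority values are made up for illustration and are not taken from my network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str       # hypothetical label, not a real device name
    priority: int   # STP bridge priority (default 32768)
    mac: str        # hypothetical MAC address

    def bridge_id(self):
        # Lower priority wins; the MAC address breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the lowest bridge ID."""
    return min(bridges, key=Bridge.bridge_id)


# Everything left at the default priority: the lowest MAC wins,
# which may well be an edge switch rather than the core.
defaults = [
    Bridge("core", 32768, "00:1b:3f:aa:00:10"),
    Bridge("edge-a", 32768, "00:0a:57:00:00:01"),
    Bridge("edge-b", 32768, "00:1c:2e:00:00:02"),
]
print(elect_root(defaults).name)

# Core switch given a lower priority: it now wins regardless of MAC.
tuned = [Bridge("core", 4096, "00:1b:3f:aa:00:10")] + defaults[1:]
print(elect_root(tuned).name)
```

This only covers the election rule. It says nothing about what happens when switches run different spanning-tree protocols, which was the other half of the fix.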
