Hyper-V Hosts Inconsistent Network Connectivity – Troubleshooting

hyper-v-server-2012 networking ping

I am having trouble with two of my servers that stopped being able to communicate (in a weird way).

Both servers run Microsoft Hyper-V Server 2012 (the edition without a GUI).

Name: HVS1
IP Address: 10.0.0.11
Hosts a VM called servidor

Name: HVS2
IP Address: 10.0.0.12
Hosts a VM called WMS-1

Each host was replicating VMs from the other. This worked fine until about a month ago.

My tests for this question here ALL have these characteristics:

  1. both Firewalls are disabled (with netsh advfirewall set allprofiles state off) so I know that these are not firewall issues.

  2. I'm always pinging by IP address (although I have hosts entries for their names in each server, so it's not a DNS issue)

  3. I'm always pinging in both directions, so either both work or neither works. I don't have any cases of pings working only one way.

  4. All hosts are configured to respond to Ping.

  5. Everything is IPv4

Things I've tried:

  1. I can't ping between 10.0.0.11 and 10.0.0.12. This is the basic thing I'm trying to solve, as I expect if I can get this connectivity working, the rest of my problems will go away.

  2. I can ping from each host's VMs to that host and back. For example, servidor can ping HVS1.

  3. I tried a different hardware switch and it doesn't make any difference.

  4. The higher level services also don't work: Hyper-V manager can't connect between the two hosts, gives an RPC error (RPC Service is running).

  5. RDP into HVS1 works, as long as it's not coming from HVS2, but it is very slow, with very frequent 10 second lags. I don't notice anything else slow in the server.

  6. Ping from my laptop to HVS2 works fine.

  7. Ping from my laptop to HVS1 gives 77% loss. Lots of packets time out. This explains the RDP lags. Faulty NIC or cable on HVS1, I hear you think? But…

  8. Ping from my laptop to servidor works perfectly. Note that this is a VM on the HVS1 host, so it's going through the same NIC and cable as above… So???

  9. Ping from HVS2 to HVS1 is 100% loss. The same in the opposite direction.

  10. Ping from servidor to wms-1 works fine. So VMs on one host can ping VMs on the other, but the hosts themselves can't ping each other.

So, can someone please explain to me how connectivity across the same physical connection can work perfectly in some cases, imperfectly in others, and not at all in others?

And any suggestions for what I can try next? Thanks!

UPDATE – Some extra details requested in comments:

C:\>netsh int tcp show global  
Querying active state...

TCP Global Parameters  
----------------------------------------------  
Receive-Side Scaling State          : enabled  
Chimney Offload State               : disabled  
NetDMA State                        : disabled  
Direct Cache Access (DCA)           : disabled  
Receive Window Auto-Tuning Level    : normal  
Add-On Congestion Control Provider  : none  
ECN Capability                      : enabled  
RFC 1323 Timestamps                 : disabled  
Initial RTO                         : 3000  
Receive Segment Coalescing State    : enabled

Looking at my adapters I find something I wasn't expecting – for some reason there seems to be a new name for the adapter there, Ethernet 4. I don't remember this numbering, it sounds like something got re-done by Windows itself and a new number was given.

PS C:\> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       
----                      --------------------                    ------- ------         
Ethernet 4                Realtek PCI GBE Family Controller            21 Up            
vEthernet (External)      Hyper-V Virtual Ethernet Adapter #2          23 Up           

It's likely that the change to this "new" adapter caused the different behaviour in terms of LSO:

PS C:\> Get-NetAdapterLso

Name                           Version         V1IPv4Enabled  IPv4Enabled  IPv6Enabled  
----                           -------         -------------  -----------  -----------  
Ethernet 4                     LSO Version 1   True           False        False  
vEthernet (External)           LSO Version 2   False          True         True  

Driver information:

PS C:\> Get-NetAdapter -Physical | fl

Name                       : Ethernet 4  
InterfaceDescription       : Realtek PCI GBE Family Controller  
InterfaceIndex             : 21  
MacAddress                 : 00-14-D1-1D-57-11    
MediaType                  : 802.3  
PhysicalMediaType          : 802.3  
InterfaceOperationalStatus : Up 
AdminStatus                : Up  
LinkSpeed(Gbps)            : 1  
MediaConnectionState       : Connected  
ConnectorPresent           : True    
DriverInformation          : Driver Date 2011-10-20 Version 8.1.1020.2011 NDIS 6.30  

I tried disabling LSO completely on both adapters, but the problem persists 🙁

UPDATE 2: I noticed I had a spare NIC, exactly the same as the one already there, and tried swapping it. Problem persists. I am suspecting the Hyper-V network stack is somehow corrupted…

Best Answer

Answering my own question...

After some further diagnostics based on the helpful comments I received, and an attempt with a new NIC, I ruled out hardware causes.

A bit of studying of Hyper-V networking brought to my attention the fact that Hyper-V doesn't connect the host to the network directly. Instead, host traffic is diverted through the virtualization networking stack, via a virtual adapter that belongs to the management OS. So the mysterious behaviors described above aren't that mysterious: they are consistent with a problem in my host's management OS virtual adapter.

This can be seen with the Adapter list on HVS1:

PS C:\Users\Administrator> Get-VMNetworkAdapter -all

Name                  IsManagementOs VMName    SwitchName MacAddress   Status IPAddresses
----                  -------------- ------    ---------- ----------   ------ -----------
External_InternalPort True                     External   00155DC08706 {Ok}
Network Adapter       False          servidor  External   00155DC08705 {Ok}   {10.0.0.10, fe80::a40d:a9b3:6a6c,...
Network Adapter       False          vm-linux2 External   00155DC08708        {}
Network Adapter       False          Win7Eval  External   00155DC08709        {}
Network Adapter       False          wms-1     External   00155DC08707        {}

The problem is with the one called External_InternalPort, which Hyper-V created automatically with IsManagementOs set to True when I ticked the checkbox allowing the host operating system to share this adapter.

Compare this with the list from HVS2:

PS C:\Windows\system32> Get-VMNetworkAdapter -all

Name                  IsManagementOs VMName         SwitchName MacAddress   Status IPAddresses
----                  -------------- ------         ---------- ----------   ------ -----------
External_InternalPort True                          External   50465DB2CA1C {Ok}
Network Adapter       False          servidor       External   00155DC08705        {}
Network Adapter       False          SuiteCRM       External   00155DC08705        {}
Network Adapter       False          Ubuntu Desktop External   00155DC08706 {Ok}   {}
Network Adapter       False          vm-linux2      External   00155DC08708        {}
Network Adapter       False          wms-1          External   00155DC08707 {Ok}   {10.0.0.21, fe80::d920:9f00:59de:...

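So my problem turned out to be the duplicate MAC address 00155DC08706: it is assigned both to the External_InternalPort management adapter on HVS1 and to the Ubuntu Desktop VM on HVS2!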

Note that some of the other duplicates aren't problematic, since several of these are VMs replicating between the two hosts. But a duplicate with the ManagementOS adapter is problematic (by the way, I have no idea how it came to be...). I recognize now that the Ubuntu Desktop machine was created around the time when my problems began, I just never associated the events.
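For anyone who wants to check this without eyeballing the tables, here is a minimal sketch in Python of the comparison I did by hand. The adapter rows are transcribed from the two Get-VMNetworkAdapter listings above; the function name and tuple layout are made up for illustration. It only flags MACs that collide with a management OS adapter and makes no judgement about the other duplicates.

```python
from collections import defaultdict

# (host, adapter name, is management OS adapter, VM name, MAC address),
# transcribed by hand from the two listings above.
ADAPTERS = [
    ("HVS1", "External_InternalPort", True, "", "00155DC08706"),
    ("HVS1", "Network Adapter", False, "servidor", "00155DC08705"),
    ("HVS1", "Network Adapter", False, "vm-linux2", "00155DC08708"),
    ("HVS1", "Network Adapter", False, "Win7Eval", "00155DC08709"),
    ("HVS1", "Network Adapter", False, "wms-1", "00155DC08707"),
    ("HVS2", "External_InternalPort", True, "", "50465DB2CA1C"),
    ("HVS2", "Network Adapter", False, "servidor", "00155DC08705"),
    ("HVS2", "Network Adapter", False, "SuiteCRM", "00155DC08705"),
    ("HVS2", "Network Adapter", False, "Ubuntu Desktop", "00155DC08706"),
    ("HVS2", "Network Adapter", False, "vm-linux2", "00155DC08708"),
    ("HVS2", "Network Adapter", False, "wms-1", "00155DC08707"),
]


def management_mac_conflicts(adapters):
    """Return {mac: owners} for MACs shared with a management OS adapter."""
    owners_by_mac = defaultdict(list)
    for host, name, is_mgmt, vm_name, mac in adapters:
        owners_by_mac[mac.upper()].append((host, name, is_mgmt, vm_name))

    conflicts = {}
    for mac, owners in owners_by_mac.items():
        used_by_host = any(is_mgmt for _, _, is_mgmt, _ in owners)
        if len(owners) > 1 and used_by_host:
            conflicts[mac] = owners
    return conflicts


conflicts = management_mac_conflicts(ADAPTERS)
for mac, owners in conflicts.items():
    print(mac)
    for host, name, _, vm_name in owners:
        print("   ", host, name, vm_name or "(management OS)")
```

With the data above, the only address reported is 00155DC08706, shared by the HVS1 management adapter and the Ubuntu Desktop VM on HVS2.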

Turning off this machine automagically got my servers' connectivity to behave normally again.
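I never confirmed exactly why the loss was partial from the laptop and not total, but one plausible reading (an assumption on my part, not something the listings prove) is ordinary MAC learning: a switch remembers only the last port on which it saw a given source MAC, so two machines sending with the same MAC keep overwriting each other's entry. The toy model below sketches that behavior in Python; the port names and the laptop's MAC are placeholders, and a real switch also floods unknown destinations and ages entries out, which this ignores.

```python
class LearningSwitch:
    """Toy switch: keeps only the most recent port seen for each source MAC."""

    def __init__(self):
        self.mac_table = {}

    def forward(self, in_port, src_mac, dst_mac):
        # Learn (or overwrite) where the sender lives.
        self.mac_table[src_mac] = in_port
        # Return the port the frame goes to; None means "unknown destination".
        return self.mac_table.get(dst_mac)


DUPLICATE_MAC = "00155DC08706"   # HVS1 management adapter AND Ubuntu Desktop on HVS2
LAPTOP_MAC = "LAPTOP"            # placeholder, the real address is not in the post
BROADCAST = "FFFFFFFFFFFF"

switch = LearningSwitch()
ping_destinations = []

# HVS1's management adapter sends something, so the switch learns its port.
switch.forward("port-to-HVS1", DUPLICATE_MAC, BROADCAST)
ping_destinations.append(switch.forward("port-to-laptop", LAPTOP_MAC, DUPLICATE_MAC))

# Ubuntu Desktop on HVS2 sends with the same MAC and steals the table entry.
switch.forward("port-to-HVS2", DUPLICATE_MAC, BROADCAST)
ping_destinations.append(switch.forward("port-to-laptop", LAPTOP_MAC, DUPLICATE_MAC))

print(ping_destinations)
```

In this model the first ping reaches HVS1 and the second is sent toward HVS2 instead, so whether a packet gets through depends on which machine spoke last. On the same assumption, HVS2's own virtual switch would treat that MAC as a local VM port, which would fit the 100% loss between the two hosts.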

Further work I need to do now:

  • fix duplicate MAC Address
  • differentiate MAC address pools' configuration on both servers to avoid future accidents
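For the second item, a quick sanity check is to confirm that the dynamic MAC ranges of the two hosts do not intersect. The sketch below is Python and only does the range arithmetic; the pool values are hypothetical examples, not my real settings (on each host the actual range should be readable from the MacAddressMinimum and MacAddressMaximum properties of Get-VMHost).

```python
def mac_to_int(mac):
    """Convert a MAC like 00155DC08700 or 00-15-5D-C0-87-00 to an integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)


def pools_overlap(pool_a, pool_b):
    """Each pool is a (minimum, maximum) pair of MAC strings, inclusive."""
    a_min, a_max = (mac_to_int(m) for m in pool_a)
    b_min, b_max = (mac_to_int(m) for m in pool_b)
    return a_min <= b_max and b_min <= a_max


# Hypothetical ranges, for illustration only.
SAME_POOL_ON_BOTH = ("00155DC08700", "00155DC087FF")
HVS1_POOL = ("00-15-5D-C0-87-00", "00-15-5D-C0-87-FF")
HVS2_POOL = ("00-15-5D-C0-88-00", "00-15-5D-C0-88-FF")

before = pools_overlap(SAME_POOL_ON_BOTH, SAME_POOL_ON_BOTH)
after = pools_overlap(HVS1_POOL, HVS2_POOL)
print("identical pools overlap:", before)
print("separated pools overlap:", after)
```

If the check reports an overlap, either host can eventually hand out an address the other already uses, which is the kind of accident I want to rule out.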

Thanks for help received.
