I am having trouble with two of my servers that stopped being able to communicate (in a weird way).
Both servers run Microsoft Hyper-V Server 2012 (the edition without a GUI).
Name: HVS1
IP address: 10.0.0.11
Hosts a VM called servidor

Name: HVS2
IP address: 10.0.0.12
Hosts a VM called WMS-1
Each was replicating VMs from the other, and this was working fine until about a month ago.
All of my tests for this question share these characteristics:
- Both firewalls are disabled (with netsh advfirewall set allprofiles state off), so I know these are not firewall issues.
- I'm always pinging by IP address (although I have hosts entries for their names on each server), so it's not a DNS issue.
- I'm always pinging in both directions, and either both directions work or neither does. I don't have any cases of pings working only one way.
- All hosts are configured to respond to ping.
- Everything is IPv4.
Things I've tried:
- I can't ping between 10.0.0.11 and 10.0.0.12. This is the basic thing I'm trying to solve, as I expect that if I can get this connectivity working, the rest of my problems will go away.
- I can ping from each VM to its host and back. So, servidor can ping HVS1.
- I tried a different hardware switch and it doesn't make any difference.
- The higher-level services also don't work: Hyper-V Manager can't connect between the two hosts and gives an RPC error (the RPC service is running).
- RDP into HVS1 works, as long as it's not coming from HVS2, but it is very slow, with very frequent 10-second lags. I don't notice anything else slow on the server.
- Ping from my laptop to HVS2 works fine.
- Ping from my laptop to HVS1 gives 77% loss. Lots of packets time out. This explains the RDP lags. Faulty NIC or cable on HVS1, I hear you think? But…
- Ping from my laptop to servidor works perfectly. Note that this is a VM on the HVS1 host, so it's going through the same NIC and cable as above… So???
- Ping from HVS2 to HVS1 is 100% loss. The same in the opposite direction.
- Ping from servidor to WMS-1 works fine. So VMs on one host can ping VMs on the other, but the hosts themselves can't.
So, can someone please explain to me how connectivity can work across the same physical connection perfectly in some cases, imperfectly in others, and not at all in others?
And any suggestions for what I can try next? Thanks!
UPDATE – Some extra details requested in comments:
C:\>netsh int tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : enabled
Chimney Offload State : disabled
NetDMA State : disabled
Direct Cache Access (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : none
ECN Capability : enabled
RFC 1323 Timestamps : disabled
Initial RTO : 3000
Receive Segment Coalescing State : enabled
Looking at my adapters, I find something I wasn't expecting: for some reason there seems to be a new name for the adapter there, Ethernet 4. I don't remember this numbering; it sounds like something got redone by Windows itself and a new number was given.
PS C:\> Get-NetAdapter

Name                  InterfaceDescription                  ifIndex  Status
----                  --------------------                  -------  ------
Ethernet 4            Realtek PCI GBE Family Controller     21       Up
vEthernet (External)  Hyper-V Virtual Ethernet Adapter #2   23       Up

It's likely that the change to this "new" adapter caused the different behaviour in terms of LSO:
PS C:\> Get-NetAdapterLso

Name                  Version        V1IPv4Enabled  IPv4Enabled  IPv6Enabled
----                  -------        -------------  -----------  -----------
Ethernet 4            LSO Version 1  True           False        False
vEthernet (External)  LSO Version 2  False          True         True

Driver information:
PS C:\> Get-NetAdapter -Physical | fl
Name : Ethernet 4
InterfaceDescription : Realtek PCI GBE Family Controller
InterfaceIndex : 21
MacAddress : 00-14-D1-1D-57-11
MediaType : 802.3
PhysicalMediaType : 802.3
InterfaceOperationalStatus : Up
AdminStatus : Up
LinkSpeed(Gbps) : 1
MediaConnectionState : Connected
ConnectorPresent : True
DriverInformation : Driver Date 2011-10-20 Version 8.1.1020.2011 NDIS 6.30
I tried disabling LSO completely for both adapters, but the problem seems to persist 🙁
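For reference, one way to turn LSO off on both adapters is sketched below. The adapter names are taken from the Get-NetAdapterLso output above; whether the setting takes effect on the virtual adapter may depend on the driver, so treat this as an illustration rather than a guaranteed recipe.

```powershell
# Adapter names copied from the Get-NetAdapterLso listing in this question.
$adapterNames = 'Ethernet 4', 'vEthernet (External)'

# With neither -IPv4 nor -IPv6 given, LSO is disabled for both protocols.
Disable-NetAdapterLso -Name $adapterNames

# Re-check the state afterwards.
Get-NetAdapterLso -Name $adapterNames
```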
UPDATE 2: I noticed I had a spare NIC, exactly the same as the one already there, and tried swapping it in. The problem persists. I suspect the Hyper-V network stack is somehow corrupted…
Best Answer
Answering my own question...
After some further diagnostics based on the helpful comments I received, and an attempt to use a new NIC, I ruled out hardware causes.
A bit of studying of Hyper-V networking brought to my attention the fact that Hyper-V doesn't connect the host to the network directly; instead, it diverts the host's traffic through the virtualization networking stack. So the mysterious behaviors described above aren't that mysterious: they are consistent with a problem in my management host virtual adapter.
This can be seen with the Adapter list on HVS1:
The problem is with the one called External_InternalPort, which was created automatically by Hyper-V with IsManagementOS set to true when I ticked the checkbox saying that this adapter could be shared by the host operating system.
Compare this with the list from HVS2:
So my problem turned out to be that duplicate MAC address, 00155DC08706!
Note that some of the other duplicates aren't problematic, since several of these are VMs being replicated between the two hosts. But a duplicate with the ManagementOS adapter is problematic (by the way, I have no idea how it came to be...). I recognize now that the Ubuntu Desktop machine was created around the time when my problems began; I just never associated the events.
Turning off this machine automagically got my servers' connectivity to behave normally again.
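A check for this kind of collision can be sketched roughly as follows. It assumes it is run from a machine that has the Hyper-V PowerShell module and can reach both hosts (the $servers list is my assumption). If remote access is flaky, the same Get-VMNetworkAdapter -All call can be run locally on each host and the results compared by hand.

```powershell
# Assumption: both hosts are reachable for remote Hyper-V management.
$servers = 'HVS1', 'HVS2'

# -All returns VM adapters and management OS (host) adapters alike.
$adapters = Get-VMNetworkAdapter -All -ComputerName $servers

# Group by MAC address and keep only groups that are duplicated AND
# include a management OS adapter. Duplicates between a VM and its
# replica on the other host are expected and are filtered out here.
$adapters |
    Group-Object -Property MacAddress |
    Where-Object {
        $_.Count -gt 1 -and
        ($_.Group | Where-Object { $_.IsManagementOs })
    } |
    ForEach-Object { $_.Group } |
    Format-Table ComputerName, VMName, Name, IsManagementOs, MacAddress
```

Any rows this prints are adapters sharing a MAC address with a host adapter, which is the situation I had with External_InternalPort and the Ubuntu Desktop VM.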
Further work I need to do now:
Thanks for the help received.