Linux – VMWare vSphere packet loss

linux networking packetloss vmware-esxi

I have a Dell blade enclosure with 14 blades, each running ESXi 5.5. On blades 7 and 14 I see roughly 65% packet loss when I ping the ESXi management interface. On all other blades there is no packet loss.
The strange thing is that I get this packet loss when I ping from:

blade 1 to blade 7 = 65% packet loss

But when I ping from blade 7 to blade 1 at the same time as I ping from blade 1 to blade 7, I have no packet loss at all, neither from 1 to 7 nor from 7 to 1:

blade 1 to blade 7 and blade 7 to blade 1 = 0% packet loss

I have increased the Rx buffer in the ESX CLI, but it doesn't help.
When I run esxtop and switch to the network view, I don't see any dropped packets:


PORT-ID   USED-BY           TEAM-PNIC  DNAME     PKTTX/s  MbTX/s  PKTRX/s  MbRX/s  %DRPTX  %DRPRX
33554433  Management        n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
33554434  vmnic0            -          vSwitch0    22.73    0.04    44.50    0.10    0.00    0.00
33554435  Shadow of vmnic0  n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
33554436  vmnic1            -          vSwitch0    43.39    0.08     1.91    0.00    0.00    0.00
33554437  Shadow of vmnic1  n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
33554438  vmk0              all(2)     vSwitch0     3.66    0.01     1.91    0.00    0.00    0.00

The only thing I notice is a very high number of interrupts for vector 0xef:


VECTOR  COUNT/s  TIME/int  COUNT_0  COUNT_1  COUNT_2  COUNT_3  COUNT_4  COUNT...
0xef     4435.5       1.0    309.2    275.6    402.9     30.5    339.0  15.3 ...

The switch log shows that the blade's network card flaps occasionally. By occasionally I mean once or twice a week, for about 1 or 2 minutes.

I don't think that is the cause, but I have run out of ideas about what the problem could be. The ping and counter-ping behaviour in particular makes no sense to me.
Maybe you can help me?

Best Answer

I found the problem and a solution.

The packet loss was caused by identical MAC addresses on different switch ports.

The virtual interface vmk0 on blade 7 had the same MAC address as the hardware interface of blade 11.
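
A conflict like this is easy to miss by eye. As a minimal sketch, assuming you have already collected (interface, MAC) pairs from every blade by hand or by script, a few lines of Python can flag any MAC address that appears more than once. The function names, inventory entries, and addresses below are made up for illustration and are not taken from my setup.

```python
from collections import defaultdict


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits so that
    different notations (colons, dashes, upper case) compare equal."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def find_duplicate_macs(inventory):
    """Return {normalized_mac: [owner, ...]} for every MAC address
    that is claimed by more than one interface."""
    owners = defaultdict(list)
    for owner, mac in inventory:
        owners[normalize_mac(mac)].append(owner)
    return {mac: names for mac, names in owners.items() if len(names) > 1}


# Hypothetical inventory; these addresses are invented for the example.
inventory = [
    ("blade1/vmk0", "00:50:56:AA:00:01"),
    ("blade7/vmk0", "B8:2A:72:00:00:0B"),
    ("blade11/vmnic0", "b8-2a-72-00-00-0b"),
]

duplicates = find_duplicate_macs(inventory)
for mac, names in duplicates.items():
    print(mac, "is used by", ", ".join(names))
```

Normalizing before comparing matters because tools print MAC addresses in different notations, and a plain string comparison would not catch the clash.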

So this is what I did: I gave the vmk0 interface a new MAC address. To do that, I went through the iDRAC interface of the blade enclosure and logged in to the ESXi "GUI". I removed both of my network cards from the Administration interface, restarted the Management Network, added them again, and restarted the Management Network once more. This caused some downtime, but afterwards the MAC address had changed and the packet loss was gone.