I've got an interesting problem where I am getting packet loss between multiple servers on the same network. This is happening to about 15 hosts, but I'll condense it to just three below.
Firstly some topology. Identical on all machines.
hosta - 10.20.30.1; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2
hostb - 10.20.30.2; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2
hostc - 10.20.30.3; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2
lspci gives me…
Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
All of the servers plug into a Cisco 2900XL. I've since changed that to a TeloSystems switch we use in the field to make sure it wasn't the Cisco.
The servers are all IBM x3550s and x3560s (pre-M1/M2).
Now for some testing… I'll only paste one side of the tests to save space, but the results are the same whichever pair of hosts I use.
root@hosta:~# ping -i 0.5 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 49542ms
rtt min/avg/max/mdev = 0.097/0.157/5.533/0.540 ms
root@hosta:~# ping -i 0.1 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 9941ms
rtt min/avg/max/mdev = 0.089/0.105/0.170/0.017 ms
root@hosta:~# ping -i 0.05 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 5167ms
rtt min/avg/max/mdev = 0.088/0.096/0.170/0.016 ms
root@hosta:~# ping -i 0.01 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 79 received, 21% packet loss, time 960ms
rtt min/avg/max/mdev = 0.088/0.095/0.126/0.009 ms
root@hosta:~# ping -i 0.025 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 2800ms
rtt min/avg/max/mdev = 0.087/0.097/0.120/0.006 ms
root@hosta:~# ping -i 0.02 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.085/0.096/0.164/0.017 ms
root@hosta:~# ping -i 0.019 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 1995ms
rtt min/avg/max/mdev = 0.085/0.092/0.112/0.014 ms
root@hosta:~# ping -i 0.015 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 92 received, 8% packet loss, time 1614ms
rtt min/avg/max/mdev = 0.086/0.099/0.161/0.016 ms
root@hosta:~# ping -i 0.0125 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 84 received, 16% packet loss, time 1198ms
rtt min/avg/max/mdev = 0.083/0.093/0.136/0.012 ms
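To repeat the sweep above without retyping each command, something along these lines should work. It is only a sketch: `loss_pct` and `sweep` are helper names made up for illustration, it assumes iputils-style summary output like the lines pasted above, and it needs to run as root, since older iputils ping only lets root use intervals below 0.2 s.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Read ping output on stdin and print just the loss percentage,
# e.g. "21" for a summary line ending in "21% packet loss, time 960ms".
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# sweep TARGET INTERVAL...
# Send 100 quiet pings at each interval and print "interval<TAB>loss%".
sweep() {
    target=$1
    shift
    for interval in "$@"; do
        loss=$(ping -q -i "$interval" -c 100 "$target" | loss_pct)
        printf '%s\t%s%%\n' "$interval" "$loss"
    done
}
```

For the runs shown here that would be `sweep 10.20.30.2 0.5 0.1 0.05 0.025 0.02 0.019 0.015 0.0125 0.01`, which prints one line per interval so the point where loss begins (just under 0.02 s in the output above) is easy to spot.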
If I connect my MBP to either switch, I get no packet loss when running the above tests against it.
This only appears to have started happening since we upgraded from Etch to Lenny about 9 months ago.
My next step is to burn an Ubuntu Live CD to do some testing from a different newer kernel.
Any help/ideas/pointers would be appreciated.
Best Answer
Here's Server Fault's official answer on the matter: http://blog.serverfault.com/post/broadcom-die-mutha/