Cisco – Basic multicast network performance problems

10gbethernet cisco multicast networking performance

I've been using mpong from 29west's mtools package to get some basic idea of multicast latency across various Cisco switches: 1Gb 2960G, 10Gb 4900M and 10Gb Nexus N5548P. The 1Gb is just for comparison.

I have the following results for ~400 runs of mpong on each switch (sending 65536 "ping"-like messages to a receiver which then sends back — all over multicast). Numbers are latencies measured in microseconds.

Switch            Average (us)   StdDev      Min         Max
2960G (1Gb)       109.68463      0.092816    109.4328    109.9464
4900M (10Gb)      705.52359      1.607976    703.7693    722.1514
N5548P (10Gb)     58.563774      0.328242    57.77603    59.32207
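For anyone who wants to reproduce this kind of measurement without mtools, here is a minimal Python sketch of a multicast ping-pong. It is not mpong's implementation, only an illustration of the idea. The group addresses (239.1.1.1 and 239.1.1.2), the port (5000) and all function names are assumptions made up for the example. Interpreter overhead is added to every sample, so the absolute numbers should not be expected to match the table above.

```python
import socket
import statistics
import struct
import time

# All three values are hypothetical, chosen only for this sketch.
GROUP_PING = "239.1.1.1"   # sender -> reflector
GROUP_PONG = "239.1.1.2"   # reflector -> sender
PORT = 5000


def open_multicast_receiver(group, port, iface_ip="0.0.0.0"):
    """Bind a UDP socket and join `group` (the join makes the host send an IGMP report)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def open_multicast_sender(ttl=1):
    """UDP socket for sending; TTL 1 keeps the traffic on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def reflect(count):
    """Run on the receiving host: echo each ping back on the pong group."""
    rx = open_multicast_receiver(GROUP_PING, PORT)
    tx = open_multicast_sender()
    for _ in range(count):
        data, _addr = rx.recvfrom(2048)
        tx.sendto(data, (GROUP_PONG, PORT))


def ping(count, timeout=1.0):
    """Run on the sending host: return round-trip times in microseconds."""
    rx = open_multicast_receiver(GROUP_PONG, PORT)
    rx.settimeout(timeout)
    tx = open_multicast_sender()
    rtts = []
    for seq in range(count):
        start = time.perf_counter()
        tx.sendto(struct.pack("!I", seq), (GROUP_PING, PORT))
        try:
            data, _addr = rx.recvfrom(2048)
        except socket.timeout:
            continue  # treat as lost; no sample recorded
        elapsed_us = (time.perf_counter() - start) * 1e6
        # Only keep the sample if the echo carries the sequence number just sent.
        if len(data) >= 4 and struct.unpack("!I", data[:4])[0] == seq:
            rtts.append(elapsed_us)
    return rtts


def summarize(rtts_us):
    """Average, sample standard deviation, min and max of a list of RTTs."""
    return {
        "avg": statistics.mean(rtts_us),
        "stdev": statistics.stdev(rtts_us) if len(rtts_us) > 1 else 0.0,
        "min": min(rtts_us),
        "max": max(rtts_us),
    }
```

To use it, call reflect(65536) on one host and then summarize(ping(65536)) on the other. Because each receiver joins its group through IP_ADD_MEMBERSHIP, the hosts send IGMP reports, which is what IGMP snooping on the switch acts on.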

The result for 4900M is very surprising. I've tried unicast ping and I see the 4900 has ~10us higher latency than the N5548P (average 73us vs 64us). Iperf (with no attempt to tune it) shows both 10Gb switches give me 9.4Gbps line speed.

The two machines are connected to the same switch and we're not doing any multicast routing. OS is RHEL 6. 10Gb NICs are HP 10GbE PCI-E G2 Dual-port NICs (I believe they are rebranded Mellanox cards).

The 4900 switch is used in a project with tight access control so I'm waiting for approval before I can access it and check the config. The other two I have full access to configure.

I've looked at the Cisco document[1] detailing the differences between NX-OS and IOS w.r.t. multicast, so I've got some ideas to try out, but this isn't an area where I have much expertise.

Does anyone have any idea what I should be looking at once I get access to the switch?

[1] http://docwiki.cisco.com/wiki/Cisco_NX-OS/IOS_Multicast_Comparison

Edit (12 Jan 0945 GMT):

The 4900M has IGMP snooping enabled. I see no packet loss or errors on the counters on switch or servers.

I've had a look at CPU usage and it seems to sit at 94% when sending the ping messages. 75% is "Cat4k Mgmt LoPri", 6% is "IP Input", 3% is "Cat4k Mgmt HiPri".

Edit2 (12 Jan 1000 GMT):

CPU usage drops to 8% once I stop the messages.

Edit3 (13 Jan 0945 GMT):

The problem is layer 3 related. If I disable the VLAN interface, latency drops to 72usec.

The config for the VLAN is:

vlan 110
 name 192.168.110/24-10Ge
end

...snip...

interface Vlan110
 description 10G Test Vlan
 ip address 192.168.110.4 255.255.255.0
 ip pim sparse-mode
end

Best Answer

Dave, your layer 3 diagnosis could be right.

Try these two documents. The first is: http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a00804cef15.shtml

Do you have IP multicast routing enabled? If so, disable it.

Then enable IGMP snooping and enable mrouter on the switch, as in solution 3 of the second document (http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml#solu1).

Syson (Toronto)
