Routing – ASICs vs x86 General-Purpose Routing and Switching

cisco, cisco-catalyst, linux, routing, switching

SysAdmins often try to convince me that general-purpose x86 OSes can forward traffic just as well as routers that pair low-clock-rate CPUs with dedicated silicon (i.e., ASICs) at 1Gbps line rates. This thinking is carrying over into the SDN realm as well, for example with virtual switches in VMware.

I think I intuitively understand the benefits of ASICs over x86 in handling traffic, particularly with respect to microbursts. Am I correct in assuming that ASICs on router or switch interfaces will outperform an x86 CPU doing all packet processing, which suffers greatly from CPU interrupts? I know the OS (Windows, Linux, or something specialized) also contributes greatly to the hardware's ability to route or switch. And I know x86 bus speeds impose theoretical maximums on switching bandwidth, especially once rates exceed 1Gbps.

  1. How does the Catalyst 6500 Sup2T ASIC switching speed, for example, compare to the realistic x86 switching speeds found on general OSes or SDNs?

  2. How does the Cisco 7200VXR-NPE-G2 switching speed, for example, compare to same…

  3. How do typical router or switch latencies compare to general OSes performing the same function?

NOTE: I don't want to hear the merits of virtual switch placement or their role within a virtual and physical network. I also do not want to debate the merits of SDN for application time-to-deployment.

Best Answer

Am I correct in assuming that ASICs on router or switch interfaces will outperform an x86 CPU doing all packet processing, which suffers greatly from CPU interrupts?

It's hard to say specifically whether interrupts are a limitation, since we aren't naming a specific CPU, operating system, or router model in this part of your question. Overall, it's a safe generalization that general-purpose CPUs cannot touch the packet-switching performance of a well-designed ASIC. When I say performance, I'm talking about RFC 2544 metrics such as the no-drop forwarding rate (NDR) in packets per second, throughput, and latency.

That's not to say that there isn't a place for a CPU-based router; just that experience tells us a CPU can't switch packets as quickly as an ASIC or FPGA. My conclusion that ASICs / FPGAs are faster than a multi-core CPU seems to be reinforced by this Q&A on Electronics.SE.
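
For readers less familiar with RFC 2544, the throughput test behind an NDR number is essentially a binary search for the highest offered rate at which no frames are dropped. Here's a minimal sketch in Python; the send_at_rate() hook into a traffic generator is a hypothetical placeholder, not a real API:

    def find_ndr(send_at_rate, max_rate_pps, tolerance_pps=1000):
        """Approximate an RFC 2544-style no-drop rate (NDR) by binary search.

        send_at_rate(rate_pps) is assumed to run one fixed-duration trial
        at the offered rate and return the number of frames dropped
        (a hypothetical hook into whatever traffic generator you use).
        """
        lo, hi = 0.0, float(max_rate_pps)
        ndr = 0.0
        while hi - lo > tolerance_pps:
            mid = (lo + hi) / 2
            if send_at_rate(mid) == 0:   # zero loss: NDR is at least `mid`
                ndr = lo = mid
            else:                        # drops observed: back the rate off
                hi = mid
        return ndr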

PCI-bus performance

I know the x86 bus-speeds impose theoretical maximums to switching bandwidth especially once rates exceed 1Gbps.

I'm not sure which bus restrictions you're referring to here, but the information you have could be somewhat outdated. The PCI Express bus used in most systems scales well above 10Gbps these days.

PCIe 2.0 uses an 8b/10b encoding scheme, which costs roughly 20% of the raw signaling rate in lane-encoding overhead; each PCIe 2.0 lane signals at 5GT/s, leaving about 4Gbps of usable bandwidth per lane. Even with that 8b/10b penalty, PCIe 2.0 x8 (8 PCIe lanes) delivers roughly 32Gbps per direction; thus you can easily run a single 10GE adapter at bidirectional line rate on a PCIe 2.0 x8 card.

PCIe 3.0 (used with Intel Ivy Bridge chipsets) moves to 128b/130b encoding and raises the signaling rate to 8GT/s, which roughly doubles the usable per-lane bandwidth. Thus a PCIe 3.0 x8 card can deliver about 63Gbps per direction (8.0 * 8 * 128/130). That's nothing to sneeze at; you can safely pack two line-rate 10GEs on a single riser at those rates.
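
The arithmetic behind those figures is just transfer rate × lanes × encoding efficiency; here is a quick Python sketch using the numbers from the two paragraphs above:

    def pcie_usable_gbps(gt_per_s, lanes, payload_bits, total_bits):
        """Usable PCIe bandwidth per direction after encoding overhead."""
        return gt_per_s * lanes * payload_bits / total_bits

    pcie2_x8 = pcie_usable_gbps(5.0, 8, 8, 10)       # 8b/10b    -> ~32 Gbps
    pcie3_x8 = pcie_usable_gbps(8.0, 8, 128, 130)    # 128b/130b -> ~63 Gbps
    print(f"PCIe 2.0 x8: {pcie2_x8:.0f} Gbps, PCIe 3.0 x8: {pcie3_x8:.0f} Gbps")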

Cisco vs Vyatta performance

Caveat: I'm using vendor-supplied marketing material for all comparisons...

  1. How does the Catalyst 6500 Sup2T ASIC switching speed, for example, compare to the realistic x86 switching speeds found on general OSes or SDNs?

This is a bit challenging because we're going to compare a fully-distributed switching system (Sup2T) to a centralized-switching system (Vyatta), so be careful interpreting the results.

  • Sup2T can forward at up to 60Mpps non-drop rate with features enabled. Reference: Catalyst 6500 Sup2T Architecture White Paper. Note that this is just a bare Sup2T system with no Distributed Forwarding Cards (DFCs); see Note 1.
  • I have found RFC 2544 test results for the Vyatta 5600 forwarding at up to 20.58Mpps non-drop rate, and 70Mpps if you can accept some drops; the NDR throughput was 72Gbps (see Note 2). Reference: Vyatta 5600 vRouter Performance Test (SDN Central); registration is required to see the full report. A quick sketch converting these Mpps figures to wire-rate Gbps follows this list.
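
To put those Mpps figures in wire-rate terms, here's a quick Python sketch converting a forwarding rate to the Ethernet line rate it consumes at a given frame size (counting the 20 bytes of preamble, SFD, and inter-frame gap that each frame occupies on the wire):

    def wire_rate_gbps(mpps, frame_bytes=64):
        """Ethernet line rate consumed by `mpps` million frames per second.

        Each frame occupies frame_bytes + 20 bytes on the wire
        (7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap).
        """
        return mpps * 1e6 * (frame_bytes + 20) * 8 / 1e9

    print(wire_rate_gbps(60))      # Sup2T at 60 Mpps of 64B frames: ~40.3 Gbps
    print(wire_rate_gbps(20.58))   # Vyatta at 20.58 Mpps of 64B frames: ~13.8 Gbps

At 64-byte frames, even 70Mpps is only about 47Gbps on the wire, so the 72Gbps NDR throughput figure above necessarily comes from tests with larger frame sizes.
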
  2. How does the Cisco 7200VXR-NPE-G2 switching speed, for example, compare to same...

The Vyatta blows an NPE-G2 out of the water, performance-wise; the NPE-G2 can do up to 2Mpps, based on the Cisco NPE-G2 Datasheet. That's not really a fair comparison, though, given the age of the NPE-G2 versus a brand-new Intel 10-core system packed with 10GE cards.

  3. How do typical router or switch latencies compare to general OSes performing the same function?

That is a fantastic question. This paper indicates that Vyatta has higher latencies, but I would like to see this kind of testing done against the Intel E5 series CPUs.

Summary

Recap of a side-by-side comparison of Sup2T vs the Brocade Vyatta 5600:

  • Sup2T: 60Mpps NDR IPv4 with features (such as ACLs)
  • Vyatta on an Intel E5: up to 20Mpps IPv4 NDR without features, or 70Mpps if you can accept a small number of drops.

The Sup2T still wins in my opinion, particularly when you look at what you get with the Sup2T (distributed scale to 720Mpps, MPLS, countless MIBs, Layer2 and Layer3 switching, etc...).

If all you care about is raw switching performance, you can get respectable performance numbers from an x86 CPU. However, in real networks, it's not often just about who has the best drag-race numbers; most people need to worry about features (see: When should I focus on each value for switch assessment?). A big factor to consider is the number of features available, and how they integrate with the rest of your network.

It's also worth looking at the operational feasibility of using x86-based systems in your company. I haven't used Brocade's Vyatta myself, but hopefully they have done a decent job building good show commands and support hooks into the box. If they do support enough features, and their system scales well in real networks, then go for it if that's what you like.

However, if someone goes cheap and just builds a Linux box + bird / quagga + ACLs + QoS, I would not want to be the guy supporting that solution. I have always maintained that the open-source community does a great job innovating, but the supportability of their systems pales when compared with mainstream network vendors (Arista / Cisco / Force10 / Juniper). One need only look at iptables and tc to see just how convoluted a CLI can get. I occasionally field questions from people who look at the output of ip link show or ifconfig and get weirded out because the packet counters aren't right; typically the major network vendors do a much better job testing their counters than what I see in the Linux NIC drivers.


End Notes:

Note 1 Nobody who cares about performance would ever buy a Sup2T and fail to populate the chassis with DFCs. The Sup2T can switch at 60Mpps, but a loaded chassis with DFCs scales to 720Mpps.

Note 2 The Vyatta test ran on a dual-processor system with 10-core Intel E5-2670v2 CPUs at 2.5GHz per core; if we count each physical core as two virtual cores (i.e., hyper-threading), that's a total of 40 virtual cores for packet switching. The Vyatta was configured with Intel x520-DA2 NICs, and used Brocade Vyatta version 3.2.
