How Processors Handle Data Rates of 10 Gigabits per Second or More


I do not know if this is the right place to ask, and it may be a very silly question. I assume that some processor has to process data frames for switching/routing. Modern processors run at a few GHz. How do they handle data that arrives at a rate faster than they operate?

Best Answer

You are totally correct: if we had to use one instruction cycle per bit, then 10 Gbps would be unachievable. So the first thing to note is that we handle a word per CPU instruction -- 64 bits.
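To make the word-at-a-time idea concrete, here is a minimal sketch (not from the answer above; the function names are made up for illustration) of the same simple XOR checksum computed byte-by-byte and eight bytes at a time. The word version touches memory an eighth as often:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Reference: fold every byte into an 8-bit XOR checksum, one at a time. */
uint8_t xor_checksum_bytewise(const uint8_t *buf, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= buf[i];
    return acc;
}

/* Word-at-a-time: XOR whole 64-bit words, then fold the accumulator
   down to 8 bits and mop up the unaligned tail byte-by-byte. */
uint8_t xor_checksum_wordwise(const uint8_t *buf, size_t len) {
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);   /* memcpy avoids unaligned-access traps */
        acc ^= w;
    }
    uint8_t out = 0;
    for (int s = 0; s < 64; s += 8)
        out ^= (uint8_t)(acc >> s);  /* fold 64-bit accumulator to 8 bits */
    for (; i < len; i++)
        out ^= buf[i];               /* tail bytes */
    return out;
}
```

Real NIC/kernel checksums (e.g. the Internet checksum) use the same trick with wider accumulators, just with addition-with-carry instead of XOR.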

Even then, the worst thing we can do for performance is to have the CPU touch every word of a packet. Hence the focus on "zero-copy" handling of packets. Some of that trickery is in the interfaces themselves: they have DMA ("direct memory access"), so the ethernet controller chip copies the data into RAM itself; they calculate the checksums, so the CPU doesn't have to access every word of the packet to do so. Some of it is in the data-structure design: we are careful to align packet buffers so we can move them by changing the ownership of a page table entry. Some of it is just careful programming to ensure that packet data is accessed as few times as possible, and preferably not at all until the receiving application program.
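The buffer-alignment point can be sketched like this (the function name is mine, for illustration): if every packet buffer starts on a page boundary, the kernel could in principle hand a received page to another owner by remapping it rather than copying the payload.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

/* Allocate a packet buffer aligned to a page boundary so that ownership
   could be transferred by flipping a page table entry instead of copying.
   Returns NULL on failure; caller frees with free(). */
void *alloc_packet_buffer(size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    return buf;
}
```

This is only the userspace half of the trick, of course; the actual remapping is the kernel's business.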

Once we've done all of this, the next limitation is the overhead of handling packets one at a time. Thus there are a heap of "segmentation offload" features, both in the ethernet controller and in the kernel, so that we handle groups of packets. We even delay retrieving data from the ethernet controller so that these groups are larger.
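The "handle groups of packets" idea is what Linux's NAPI polling does. A toy sketch (the ring layout and function are my own simplification, not real kernel code): instead of one interrupt per packet, a poll pass drains up to a budget of descriptors from the receive ring in one go.

```c
#include <stddef.h>

/* Toy descriptor ring: ring[i] != 0 means a packet is ready there.
   Drain up to 'budget' packets per poll pass, NAPI-style, instead of
   taking one interrupt per packet.  Returns the number handled; a full
   budget hints the caller to keep polling rather than re-enable IRQs. */
size_t poll_ring(int *ring, size_t ring_len, size_t *head, size_t budget,
                 void (*handle)(int pkt)) {
    size_t done = 0;
    while (done < budget && ring[*head] != 0) {
        if (handle)
            handle(ring[*head]);         /* deliver packet up the stack */
        ring[*head] = 0;                 /* return descriptor to the NIC */
        *head = (*head + 1) % ring_len;
        done++;
    }
    return done;
}
```

The per-packet interrupt cost is amortized over the whole batch, which is exactly why delaying retrieval (so batches are bigger) pays off.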

Finally, we have special-case shortcuts, such as the kernel's sendfile() call, which is an express path from disk to network using the minimal amount of work.
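A minimal, self-checking use of sendfile(2) on Linux (the helper function and temp-file names are mine): the kernel moves the bytes from one file descriptor to the other internally, so they never pass through a user-space buffer.

```c
#define _GNU_SOURCE
#include <sys/sendfile.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

/* Copy a small payload between two fds with sendfile(2) and verify the
   round trip.  Returns 1 on success, 0 on any failure.  Since Linux
   2.6.33 out_fd may be a regular file; classically it was a socket. */
int sendfile_roundtrip(void) {
    const char msg[] = "hello over sendfile";
    char src[] = "/tmp/sf_src_XXXXXX", dst[] = "/tmp/sf_dst_XXXXXX";
    int in = mkstemp(src), out = mkstemp(dst);
    if (in < 0 || out < 0) return 0;
    int ok = 0;
    if (write(in, msg, sizeof msg) == (ssize_t)sizeof msg &&
        lseek(in, 0, SEEK_SET) == 0 &&
        sendfile(out, in, NULL, sizeof msg) == (ssize_t)sizeof msg &&
        lseek(out, 0, SEEK_SET) == 0) {
        char back[sizeof msg];
        if (read(out, back, sizeof msg) == (ssize_t)sizeof msg &&
            memcmp(msg, back, sizeof msg) == 0)
            ok = 1;
    }
    close(in); close(out);
    unlink(src); unlink(dst);
    return ok;
}
```

In the web-server case the out_fd is a TCP socket, so a static file goes from the page cache straight to the NIC.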

We can even special-case routing (the forwarding of packets from one interface to another) by using the hardware features of the network interface cards and treating the PCI bus as a bus between the cards, rather than getting the CPU involved. That can't be done in general-purpose operating systems, but vendors such as Intel provide software libraries to implement such features on their ethernet controllers.

Moving away from CPUs altogether, we can even build special-purpose routers where all forwarding tasks happen in hardware. Since the PCI bus would then be a limitation, they run multiple parallel buses, or even multiple parallel buses to multiple parallel crossbar switch assemblies. At one end of the market, a small TCAM-based ethernet switch is one example; at the other end, the Juniper M40 is a canonical design.

A typical switch will start to receive a packet, look up the destination address in the TCAM, attach a tag with the egress port to the packet, and then DMA the still-incoming packet to the egress port's controller. Note that if the output port is congested, then all that can be done on this simple switch is to throw away the ingress packet. Thus simple switches are not a good choice when links change speed and some queueing is desirable. Of course more sophisticated switches exist, for which you pay more.
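A TCAM lookup behaves roughly like this software model (a toy of my own, not how the silicon is built): each entry matches under a mask, the lowest-index match wins, and the result is the egress port to tag the packet with. Real TCAMs do all rows in parallel in one clock; software can only emulate the semantics.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy TCAM entry: a key matches when (key & mask) == value.  Lower index
   has priority, as in a real TCAM.  A mask of 0 is a wildcard row. */
struct tcam_entry {
    uint64_t value, mask;   /* 48-bit MAC held in the low bits */
    int port;               /* egress port to tag matching packets with */
};

/* Returns the egress port for dst_mac, or -1 for "no entry" (flood/drop). */
int tcam_lookup(const struct tcam_entry *t, size_t n, uint64_t dst_mac) {
    for (size_t i = 0; i < n; i++)
        if ((dst_mac & t[i].mask) == t[i].value)
            return t[i].port;
    return -1;
}
```

The attached tag is what lets the rest of the datapath DMA the still-arriving packet toward the right port without consulting the CPU again.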

A typical router will receive a packet and hold it in a short queue. The destination IP address will be looked up in static RAM, the packet will then be exploded into cells to reduce latency, and each cell sent across a crossbar switch to the egress card. That card will reassemble the cells into a packet and queue the packet out the egress interface. The queueing on the egress interface can be sophisticated.
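The "exploded into cells" step can be sketched as fixed-size segmentation and reassembly (cell size and function names are my own; fabrics differ, and real cells also carry headers for sequencing and the destination card):

```c
#include <string.h>
#include <stddef.h>

#define CELL_SIZE 64   /* fixed cell payload, typical of crossbar fabrics */

/* Chop a packet into fixed-size cells (last one zero-padded), as an
   ingress card would before spraying them across the fabric.
   Returns the number of cells, or 0 if max_cells is too small. */
size_t to_cells(const unsigned char *pkt, size_t len,
                unsigned char cells[][CELL_SIZE], size_t max_cells) {
    size_t n = (len + CELL_SIZE - 1) / CELL_SIZE;
    if (n > max_cells) return 0;
    for (size_t i = 0; i < n; i++) {
        size_t off = i * CELL_SIZE;
        size_t chunk = (len - off < CELL_SIZE) ? len - off : CELL_SIZE;
        memset(cells[i], 0, CELL_SIZE);
        memcpy(cells[i], pkt + off, chunk);
    }
    return n;
}

/* Reassemble the original packet on the egress card, discarding padding. */
void from_cells(unsigned char cells[][CELL_SIZE], size_t n_cells,
                unsigned char *pkt, size_t len) {
    for (size_t i = 0; i < n_cells; i++) {
        size_t off = i * CELL_SIZE;
        size_t chunk = (len - off < CELL_SIZE) ? len - off : CELL_SIZE;
        memcpy(pkt + off, cells[i], chunk);
    }
}
```

Fixed-size cells are what let the crossbar schedule transfers in uniform time slots, which is where the latency win comes from.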