Linux – ethernet smp_affinity vs /proc/interrupts vs /sys/class/net/ethX/device

Tags: ethernet, interrupts, irq, linux

My goal is to configure our CentOS ("free" RHEL) 5.x servers for custom low-latency network programs. I would like to experiment with binding ethernet NIC interrupt handling to the same CPU on which the program runs (to hopefully improve cache utilization). The first step in this process is to determine the NIC's IRQ.

Here are the contents of /proc/interrupts on one server (I omitted CPUs 2 through 14 for brevity):

           CPU0       CPU1       CPU15
  0:  600299726          0          0    IO-APIC-edge  timer
  1:          3          0          0    IO-APIC-edge  i8042
  8:          1          0          0    IO-APIC-edge  rtc
  9:          0          0          0   IO-APIC-level  acpi
 12:          4          0          0    IO-APIC-edge  i8042
 50:          0          0          0   IO-APIC-level  uhci_hcd:usb6, uhci_hcd:usb8
 58:       6644      25103          0   IO-APIC-level  ioc0
 66:          0          0          0   IO-APIC-level  ata_piix
 74:        221     533830          0   IO-APIC-level  ata_piix
 98:         35          0    2902361       PCI-MSI-X  eth1-0
106:         61         11       3841       PCI-MSI-X  eth1-1
114:         28          0      61452       PCI-MSI-X  eth1-2
122:         24       1586         22       PCI-MSI-X  eth1-3
130:       2912          0        337       PCI-MSI-X  eth1-4
138:         21          0         28       PCI-MSI-X  eth1-5
146:         21          0         56       PCI-MSI-X  eth1-6
154:         34          1          1       PCI-MSI-X  eth1-7
209:         23          0          0   IO-APIC-level  ehci_hcd:usb1
217:          0          0          0   IO-APIC-level  ehci_hcd:usb2, uhci_hcd:usb5, uhci_hcd:usb7
225:          0          0          0   IO-APIC-level  uhci_hcd:usb3
233:          0          0          0   IO-APIC-level  uhci_hcd:usb4
NMI:       7615       2989       2931
LOC:  600328144  600328099  600327086
ERR:          0
MIS:          0

Why are there multiple entries for "eth1" in the form of "eth1-X"?

Furthermore, "/sys/class/net/eth1/device/irq" contains "90", but there is no IRQ 90 in the interrupt list above.

So let's say I look at just "eth1-0", which is IRQ 98. The contents of /proc/irq/98/smp_affinity is:

00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000

That's a list of numbers, rather than just one number.

So how do I set eth1's smp_affinity?

None of the online examples and documentation I could find mentioned any cases like this; they always have exactly one "ethX" entry in /proc/interrupts; the indicated interrupt matches /sys/class/net/ethX/device/irq; and there is only one number in /proc/irq/N/smp_affinity.

FWIW, I'll add that this application is extremely latency sensitive, to the point where we disable C-states and processor frequency scaling (because those features induce too much latency). Microseconds make a difference here.

Edit: I stumbled across the following web page:
http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html
Although it is about cpuset, it has a section titled "Mask Format", which I assume describes the same format I am seeing in the /proc/irq/N/smp_affinity file. Quoting:

This format displays each 32-bit word in hexadecimal (using ASCII
characters "0" – "9" and "a" – "f"); words are filled with leading zeros,
if required. For masks longer than one word, a comma separator is used
between words. Words are displayed in big-endian order, which has the
most significant bit first. The hex digits within a word are also in
big-endian order.

The number of 32-bit words displayed is the minimum number needed to
display all bits of the bitmask, based on the size of the bitmask.

Examples of the Mask Format:

   00000001                        # just bit 0 set
   40000000,00000000,00000000      # just bit 94 set
   00000001,00000000,00000000      # just bit 64 set
   000000ff,00000000               # bits 32-39 set
   00000000,000E3862               # 1,5,6,11-13,17-19 set

A mask with bits 0, 1, 2, 4, 8, 16, 32, and 64 set displays as:

   00000001,00000001,00010117

The first "1" is for bit 64, the second for bit 32, the third for bit 16,
the fourth for bit 8, the fifth for bit 4, and the "7" is for bits 2, 1,
and 0.
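
To make the format concrete, here is a minimal Python sketch (not from the man page) that decodes such a mask into bit numbers and encodes a set of bit numbers back into a mask. It assumes smp_affinity really does use the Mask Format quoted above; the function names mask_to_cpus and cpus_to_mask are made up for illustration.

```python
def mask_to_cpus(mask):
    """Decode a comma-separated hex mask into a sorted list of set bit numbers."""
    # Words are listed most significant first, so dropping the commas
    # leaves one large hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def cpus_to_mask(cpus, nwords=1):
    """Encode bit numbers as zero-padded 32-bit hex words, most significant first."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    # Emit at least nwords words, and more if the highest set bit needs them.
    nwords = max(nwords, (value.bit_length() + 31) // 32, 1)
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(nwords))]
    return ",".join("%08x" % w for w in words)


# The smp_affinity value shown for IRQ 98 earlier in this question:
irq98 = "00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000"
print(mask_to_cpus(irq98))                          # [15]
print(cpus_to_mask([0, 1, 2, 4, 8, 16, 32, 64]))    # 00000001,00000001,00010117
```

Under that assumption, the value for IRQ 98 decodes to bit 15 only, which matches the CPU15 column being the one that accumulates eth1-0 interrupts in /proc/interrupts.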

Best Answer

Why are there multiple entries for "eth1" in the form of "eth1-X"?

Because there are multiple tx/rx queues, and each queue has its own interrupt. Packets are typically assigned to a queue by a hash of (local addr, port, remote addr, port) and some other stuff. Suppressing the multiple queues might make it easier to make your application more deterministic, assuming you have few traffic sources. Or you could look up the hash algorithm and avoid ephemeral ports, if that's easier.
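
Since every queue shows up as its own line in /proc/interrupts, each one has its own IRQ number and therefore its own /proc/irq/N/smp_affinity file. As a rough sketch only, the following Python collects the IRQ numbers for one interface from text in the /proc/interrupts layout shown in the question. The function name irqs_for_interface is made up for illustration, and the "ethX-N" naming is taken from this particular listing; other drivers may name their queues differently.

```python
import re


def irqs_for_interface(interrupts_text, ifname):
    """Map each interrupt name for ifname (e.g. 'eth1-0') to its IRQ number."""
    result = {}
    # Match "eth1" itself or "eth1-<something>", but not e.g. "eth10".
    pattern = re.compile(r"^%s(-.+)?$" % re.escape(ifname))
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Numbered IRQ lines start with "<number>:"; this skips the CPU
        # header row and the NMI/LOC/ERR/MIS summary rows.
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        irq = int(fields[0].rstrip(":"))
        # Shared IRQs list several comma-separated names at the end of the line.
        for name in fields[1:]:
            name = name.rstrip(",")
            if pattern.match(name):
                result[name] = irq
    return result


# A few lines copied from the listing in the question.
SAMPLE = """\
           CPU0       CPU1       CPU15
 74:        221     533830          0   IO-APIC-level  ata_piix
 98:         35          0    2902361       PCI-MSI-X  eth1-0
106:         61         11       3841       PCI-MSI-X  eth1-1
NMI:       7615       2989       2931
"""

for name, irq in sorted(irqs_for_interface(SAMPLE, "eth1").items()):
    print("%s -> /proc/irq/%d/smp_affinity" % (name, irq))
```

On a live system you would pass in the contents of /proc/interrupts instead of SAMPLE, then write the desired hex mask to each listed smp_affinity file as root.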