Linux – 100% of packets dropped on the first RX queue on 3 of 5 RAID6 iSCSI NAS devices using Intel igb (resolved)

debian iscsi linux networking software-raid

Edit: The issue is resolved. The queues in question were being used for flow control packets. Why the igb driver propagates FC packets up the stack only to have them dropped (and counted) is another question. But the bottom line is that nothing was dropped in a way that lost data.

Thank you very much, syneticon-dj, your pointer to dropwatch was gold!
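For reference: pause-frame (flow control) behaviour can be inspected per interface with ethtool and, if desired, switched off on a direct NAS link. A minimal sketch, using eth12 from the output below as an example:

    # show the negotiated pause (flow control) parameters
    ethtool -a eth12

    # optional: disable pause frames; both link partners should agree
    ethtool -A eth12 autoneg off rx off tx off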

=== original question for further reference ===

We have the following situation:

System:
The server in question is a Dell PowerEdge with 4 quad-core Xeon CPUs and 128 GB of ECC RAM, running Debian Linux. The kernel is 3.2.26.
The interfaces in question are dedicated iSCSI cards with four ports each, using the Intel 82576 Gigabit Ethernet Controller.

Background: On one of our servers a number of NAS filers (Thecus N5200 and Thecus XXX) are connected via iSCSI on dedicated 1 Gbit/s interfaces. We have 5 cards with 4 ports each. The NAS filers are connected directly, with no switch in between.

Two weeks ago we managed to clear out four NAS filers and used them to build a RAID6 with mdadm. With LVM on top, this allows us to dynamically create, grow and shrink storage for our various projects instead of hunting through all our NAS filers for free space every now and then.
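For illustration only, the rough shape of such a setup, with made-up device and volume names (the real iSCSI disks simply show up as ordinary SCSI block devices once the sessions are logged in):

    # RAID6 across the four iSCSI-backed block devices (example names)
    mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # LVM on top, so volumes can be created, grown and shrunk on demand
    pvcreate /dev/md0
    vgcreate vg_nas /dev/md0
    lvcreate -L 500G -n project_foo vg_nas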

However, we got a lot of overruns on pretty much every interface and a lot of packets were dropped. Investigation showed that the default settings of the networking stack had to be increased. I used sysctl to tweak the settings until no more overruns occurred.
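The exact values are not important here; the knobs involved were along these lines (the numbers are illustrative, not our production settings):

    # larger socket buffers and a deeper per-CPU backlog (example values only)
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.core.netdev_max_backlog=30000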

Unfortunately the interfaces used for the NAS RAID still drop a lot of packets, but only on RX.

After searching (here, Google, MetaGer, Intel, anywhere, everywhere) we found reports that the Intel igb driver has some problems and that some extra work is needed.

So I downloaded the latest version (igb-4.2.16), compiled the module with LRO and separate-queue support, and installed the new module.

All 20 (!) interfaces using this driver now have 8 RX/TX queues (unpaired) and have LRO enabled. The concrete options line is:

options igb InterruptThrottleRate=1 RSS=0 QueuePairs=0 LRO=1
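The line lives in a file under /etc/modprobe.d/ (the file name is our own choice), and the module has to be reloaded for the options to take effect, which briefly drops every igb link:

    cat /etc/modprobe.d/igb.conf
    options igb InterruptThrottleRate=1 RSS=0 QueuePairs=0 LRO=1

    rmmod igb && modprobe igb    # only during a maintenance window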

irqbalance is nicely distributing the queue interrupts of all interfaces and everything works splendidly.
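The distribution can be checked in /proc/interrupts, where every queue of an interface shows up as its own MSI-X vector with per-CPU counters, e.g.:

    grep eth12 /proc/interrupts    # one line per queue; the CPU columns show the spread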

So why am I writing? We have the following odd situation and simply cannot explain it:

Three of the five interfaces for the NAS RAID (we have added one spare NAS, and the RAID will be grown once mdadm has finished its current reshape) show a massive number (millions!) of packet drops.

Investigation with ethtool now shows, thanks to the new multi-queue-enabled driver, that each interface uses one queue heavily; we assume this is the reshape traffic.

But three of them also receive millions of incoming packets on another queue, all of which get dropped. At least, monitoring with 'watch' showed that the packet counts on those queues correlate with the dropped packets.
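The correlation was checked with something along these lines (eth12 as the example interface):

    watch -d -n1 'ethtool -S eth12 | grep rx_queue_._packets ; ifconfig eth12 | grep dropped'

If the dropped counter and a single rx_queue counter advance in lockstep, the dropped packets are the ones arriving on that queue.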

We changed the MTU on the NAS filers and interfaces from 9000 down to 1500, but the packet drop rate increased and the mdadm performance went down, so it does not look like an MTU problem. Furthermore, the network stack has insane amounts of memory at its disposal, so that should not be a problem either. The backlogs are large enough (huge, in fact) and we are completely at sea.

Here is some example output:

~ # for nr in 2 3 4 5 9 ; do eth="eth1${nr}" ; echo " ==== $eth ==== " ; ethtool -S $eth | \
> grep rx_queue_._packet | grep -v " 0" ; ifconfig $eth | grep RX | grep dropped ; \
> echo "--------------" ; done
==== eth12 ==== 
    rx_queue_0_packets: 114398096
    rx_queue_2_packets: 189529879
          RX packets:303928333 errors:0 dropped:114398375 overruns:0 frame:0
--------------
==== eth13 ==== 
    rx_queue_0_packets: 103341085
    rx_queue_1_packets: 163657597
    rx_queue_5_packets: 52
          RX packets:266998983 errors:0 dropped:103341256 overruns:0 frame:0
--------------
==== eth14 ==== 
    rx_queue_0_packets: 106369905
    rx_queue_4_packets: 164375748
          RX packets:270745915 errors:0 dropped:106369904 overruns:0 frame:0
--------------
==== eth15 ==== 
    rx_queue_0_packets: 161710572
    rx_queue_1_packets: 10
    rx_queue_2_packets: 10
    rx_queue_3_packets: 23
    rx_queue_4_packets: 10
    rx_queue_5_packets: 9
    rx_queue_6_packets: 81
    rx_queue_7_packets: 15
          RX packets:161710730 errors:0 dropped:4504 overruns:0 frame:0
--------------
==== eth19 ==== 
    rx_queue_0_packets: 1
    rx_queue_4_packets: 3687
    rx_queue_7_packets: 32
          RX packets:3720 errors:0 dropped:0 overruns:0 frame:0
--------------

The new spare NAS is attached to eth15.
As you can see, there are no overruns and no errors. And the adapters report that they did not drop a single packet. So it is the kernel throwing the data away. But why?

Edit: I forgot to mention that eth12 to eth15 are all located on the same card; eth19 is on another.

Has anybody ever witnessed such strange behaviour, and was there a solution to remedy it?

And even if not, does anybody know a method by which we could at least find out what kind of traffic ends up in the dropping queues?

Thank you very much in advance!

Best Answer

You have enough interfaces to build a workgroup switch with. As this kind of configuration is not employed often and thus not tested as thoroughly, expect oddities to come from that alone.

Also, as your setup is quite complex, you should try isolating the issue by simplifying it. This is what I would do:

  1. rule out the simple cases, e.g. by checking the link stats with /sbin/ethtool -S <interface> to see if the drops are a link-related problem
  2. as the NICs are making use of interrupt coalescing, increase the ring buffers and see if that helps matters (a sketch follows this list)
  3. use dropwatch to get a better idea of whether any other buffers could be increased (also sketched below)
  4. disable multiqueue networking again - with 20 active interfaces there will hardly be a situation where multiple queues per interface gain any performance, and from your description it might be a queuing-related problem
  5. reduce the number of interfaces and see if the problem persists
  6. if nothing else helps, post a question to the kernel netdev mailing list
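A minimal sketch of points 2 and 3, assuming eth12 is the interface under test:

    # point 2: check the current and maximum RX ring size, then raise it
    # (use whatever maximum 'ethtool -g' reports; 4096 descriptors is typical for the 82576)
    ethtool -g eth12
    ethtool -G eth12 rx 4096

    # point 3: trace where the kernel actually drops sk_buffs;
    # '-l kas' resolves drop locations to kernel symbols,
    # type 'start' at the dropwatch prompt to begin monitoring
    dropwatch -l kas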