Cisco – Output Drops on Serial interface: Better queueing or Output queue size

cisco router

On Internet edge routers speaking eBGP to multiple carriers and iBGP to one another, all interfaces on the LAN and WAN side are GE except for one serial full DS3 (~45Mbps) on each router. Although I'm sending little traffic outbound on the serial interfaces (in the 3-10Mbps range), I see constant output queue drops (OQD). Is the likely explanation that there really is bursty traffic I'm not seeing? The load-interval is at its 30-second minimum and SNMP polling averages traffic over 5 minutes, so neither would reveal the burstiness.

The platform is a Cisco 7204VXR NPE-G2. Serial queuing is fifo.

Serial1/0 is up, line protocol is up
  Hardware is M2T-T3+ pa
  Description: -removed-
  Internet address is a.b.c.d/30
  MTU 4470 bytes, BW 44210 Kbit, DLY 200 usec,
     reliability 255/255, txload 5/255, rxload 1/255
  Encapsulation HDLC, crc 16, loopback not set
  Keepalive set (10 sec)
  Restart-Delay is 0 secs
  Last input 00:00:02, output 00:00:00, output hang never
  Last clearing of "show interface" counters 00:35:19
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 36
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 260000 bits/sec, 208 packets/sec
  30 second output rate 939000 bits/sec, 288 packets/sec
     410638 packets input, 52410388 bytes, 0 no buffer
     Received 212 broadcasts, 0 runts, 0 giants, 0 throttles
              0 parity
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     515752 packets output, 139195019 bytes, 0 underruns
     0 output errors, 0 applique, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions
   rxLOS inactive, rxLOF inactive, rxAIS inactive
   txAIS inactive, rxRAI inactive, txRAI inactive

Twenty-four hours later the counters show thousands of OQD. We do push out more traffic around 3am each day, so maybe there is some bursty traffic here that I'm not giving enough weight to.

Last clearing of "show interface" counters 1d01h
Input queue: 0/75/0/158 (size/max/drops/flushes); Total output drops: 12049

I'd like to push more outbound traffic onto the DS3, but not while the OQD remain a concern. The tier 2 ISP behind the DS3 has POPs that double as peering points with 6+ tier 1s, so the idea is to get that traffic on-net with the client as soon as possible, as opposed to sending it via our primary ISP on the GE, which is a tier 1 but must work its way towards its peering exchanges. Inbound traffic is not a concern.

Is there a better queueing strategy than fifo in this situation? According to the Cisco docs on input and output queue drops, increasing the output queue size is not recommended, since the packets are already on the router and it would be better to drop at input so TCP can throttle the application back. There's plenty of bandwidth on our GE links, so there's no real need to throttle the input. There are no policy-maps on these routers. 90% of outbound traffic comes from our HTTP responses; most of the rest is FTP and SMTP. The GE links push 50-200+Mbps.

Would you recommend any adjustments to the output queue size buffer? These serial interfaces are our backup links that I'd rather utilize more for the reason given earlier (if valid), but tempered with my BGP policies that attempt not to overload that serial interface (which appears very underloaded most of the time).

Best Answer

You're right, you wouldn't really see the burstiness easily on SNMP. 1GE can send 1.48Mpps, so it takes very little time to congest the 45Mbps link, which can handle less than 75kpps.
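To illustrate why averaged counters hide this, here is a minimal sketch in Python. The 10 ms burst length and the function name are assumptions chosen for illustration, not measurements from this router.

```python
def averaged_rate_bps(burst_rate_bps, burst_seconds, window_seconds):
    """Rate that a counter averaged over window_seconds reports for one burst."""
    bits_sent = burst_rate_bps * burst_seconds
    return bits_sent / window_seconds

GE_BPS = 1_000_000_000
BURST_SECONDS = 0.010  # hypothetical burst length, for illustration only

# The same burst as seen by a 30 s load-interval and by a 5 min SNMP average.
for label, window in (("30 s load-interval", 30), ("5 min SNMP poll", 300)):
    rate = averaged_rate_bps(GE_BPS, BURST_SECONDS, window)
    print(f"{label}: {rate / 1e6:.3f} Mbps")
```

A burst arriving at full GE line rate barely moves either average, even though it can overflow a 40-packet queue many times over while it lasts.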

If your ingress is 1GE and egress is 45Mbps, then obviously the congestion point of 45Mbps will need to drop packets. This is normal and expected. If you increase buffers you'll introduce more delay.
1GE takes about 0.5ms to send 40 1500B IP packets, which is the burst your 40-packet output queue can currently absorb. However, dequeueing them on the 45Mbps link already takes over 10ms.
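The timing above can be approximated with a short calculation. This is a rough sketch in Python that assumes full-size 1500-byte packets and ignores layer 2 framing overhead; the function name is illustrative.

```python
def serialization_ms(packets, packet_bytes, link_bps):
    """Milliseconds needed to put the given packets on a link of link_bps."""
    return packets * packet_bytes * 8 / link_bps * 1000

QUEUE_PACKETS = 40       # "Output queue: 0/40" from show interface
PACKET_BYTES = 1500      # assumption: full-size packets, framing overhead ignored
GE_BPS = 1_000_000_000
DS3_BPS = 44_210_000     # "BW 44210 Kbit" from show interface

fill = serialization_ms(QUEUE_PACKETS, PACKET_BYTES, GE_BPS)
drain = serialization_ms(QUEUE_PACKETS, PACKET_BYTES, DS3_BPS)
print(f"GE delivers a queue's worth in {fill:.2f} ms")
print(f"DS3 needs {drain:.1f} ms to drain it")
```

The ratio between the two numbers is simply the ratio of the link speeds, which is why even a sub-millisecond burst from the GE side is enough to fill the queue.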

If you don't have any acute problem, I would probably not do anything about it. But if some traffic is more eligible for dropping than other traffic, then you should replace FIFO with class-based queueing. Say you want to prioritize so that more FTP is dropped and less VoIP.
Then it also makes more sense to add more buffering for the FTP traffic, as it's not really sensitive to delay.

If you want to try your luck with deeper buffers, something like this should suffice:

policy-map WAN-OUT
 class class-default
  fair-queue
  queue-limit 200 packets
!
interface Serial1/0
 service-policy output WAN-OUT

This gives you roughly 50ms of buffering on Serial1/0 and lets you absorb a burst of up to about 2.4ms from a single GigE interface.
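If you would rather start from a delay budget and derive the queue-limit, the relationship can be inverted. A minimal sketch in Python, again assuming 1500-byte packets and a nominal 45Mbps line rate; the helper name is hypothetical.

```python
def queue_limit_for_delay(target_ms, packet_bytes, link_bps):
    """Largest queue depth (in packets) whose full queue drains within target_ms."""
    bits_per_packet = packet_bytes * 8
    # Integer arithmetic throughout: bits drained in target_ms, divided by bits per packet.
    return (target_ms * link_bps) // (bits_per_packet * 1000)

DS3_BPS = 45_000_000   # nominal DS3 rate used in this answer
PACKET_BYTES = 1500    # assumption: worst case of full-size packets

for budget_ms in (20, 50, 100):
    limit = queue_limit_for_delay(budget_ms, PACKET_BYTES, DS3_BPS)
    print(f"{budget_ms} ms budget -> queue-limit {limit} packets")
```

With smaller average packet sizes the same packet count represents less delay, so treat the result as an upper bound on queueing delay and not as a precise figure.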