Cisco – Output drops on serial interface when service-policy applied

cisco · packet-loss · router

Our core router is a Cisco 7206VXR with NPE-G2 running 12.4(20)T1. It has numerous interfaces connected to WAN links (DS3, ATM DS3, 2xT1, onboard Gig0/1), and feeds into the LAN core with 2xGig EtherChannel.

The LAN side clearly has more bandwidth than any of the WAN links. We address this with a service-policy on the WAN side that allocates bandwidth to each "interesting" class of service, based on an ACL or DSCP markings. So far, so good.

Our problem is that we're seeing excessive tail drops for one particular class of traffic that leaves on a serial DS3 card:

Class-map: class-symposium-unicast-acl (match-all)
  1530 packets, 2153296 bytes
  30 second offered rate 336000 bps, drop rate 247000 bps
  Match: access-group name acl-symposium-ucast
  Queueing
  queue limit 64 packets
  (queue depth/total drops/no-buffer drops) 0/1056/0
  (pkts output/bytes output) 474/672266
  bandwidth remaining 10% (4421 kbps)
    Exp-weight-constant: 9 (1/512)
    Mean queue depth: 50 packets
    dscp     Transmitted   Random drop   Tail drop      Minimum   Maximum   Mark
              pkts/bytes   pkts/bytes    pkts/bytes     thresh    thresh    prob

    af31       474/672266    22/31547    1034/1449483     32        40      1/10

This is an abbreviated version of our configuration:

class-map match-all class-symposium-unicast-acl
 match access-group name acl-symposium-ucast
!
policy-map WAN
 class class-symposium-unicast-acl
  bandwidth remaining percent 10
   random-detect dscp-based
 class class-default
!
interface Serial2/0
 description PA-T3/E3-EC - 45 Mbps DS3
 service-policy output WAN
!
ip access-list extended acl-symposium-ucast
 permit udp host x.x.x.x any

Adjusting the hold-queue on the serial interface or the queue-limit within the policy-map has no apparent effect, and repeatedly checking show int or show policy-map does not show the output queue filling up. Bandwidth should not be an issue, because the interface is very lightly loaded (under 2 Mbps output on a 45 Mbps link), and the offered rate on the class-map is well below the bandwidth provisioned. Removing the service-policy from the serial interface causes the drops to stop, but we need to be able to reserve bandwidth for certain applications, so this is not an acceptable long-term solution.
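For reference, the adjustments we tried looked roughly like the following (the values here are illustrative, not the exact numbers we tested):

interface Serial2/0
 hold-queue 1000 out
!
policy-map WAN
 class class-symposium-unicast-acl
  queue-limit 128 packets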

I suspect the nature of the affected traffic has something to do with the drops we're seeing, because it's unlike anything else on our network. UDP packets are emitted from a server at regular (7-second) intervals to one or more remote PCs that request them. Some of the packets are in excess of 15 KB, so they have to be fragmented.

Could the excessive fragmentation be a contributing factor, and if so, would ip virtual-reassembly on the ingress interface be of any benefit? Any other thoughts or suggestions?
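If virtual reassembly is worth trying, I assume it would go on the LAN-facing ingress interface, something like the sketch below (the interface name is just a placeholder for our LAN-side EtherChannel):

interface Port-channel1
 ip virtual-reassembly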

EDIT: Output of "show int ser 2/0" as requested:

Serial2/0 is up, line protocol is up
  Hardware is PA-T3/E3-EC
  Description: xxxxx
  Internet address is x.x.x.x/30
  MTU 4470 bytes, BW 44210 Kbit/sec, DLY 200 usec,
     reliability 255/255, txload 8/255, rxload 235/255
  Encapsulation PPP, LCP Open
  Open: IPCP, CDPCP, crc 16, loopback not set
  Keepalive set (10 sec)
  Restart-Delay is 0 secs
  Last input 00:00:22, output 00:00:00, output hang never
  Last clearing of "show interface" counters 1d00h
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 14291
  Queueing strategy: Class-based queueing
  Output queue: 0/1000/0 (size/max total/drops)
  30 second input rate 40774000 bits/sec, 3435 packets/sec
  30 second output rate 1442000 bits/sec, 1847 packets/sec
     274986144 packets input, 4019985342 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
              0 parity
     6 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 6 abort
     149374995 packets output, 1855187875 bytes, 0 underruns
     0 output errors, 0 applique, 3 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions no alarm present
  DSU mode 0, bandwidth 44210 Kbit, scramble 0, VC 0

I realize the input rate is close to maximum in this copy/paste, but the problem occurs even when input is next to zero.

Best Answer

In your show policy-map output, notice that you have Mean queue depth: 50 packets while the Maximum thresh for af31 is 40. Because your average queue depth is above the maximum threshold, you will begin tail dropping (reference).

Since this UDP traffic is the only thing in the class, I don't think you gain anything by implementing WRED on it. I would recommend removing it (and maybe consider using it under class-default instead).
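A rough sketch of that change, based on the policy-map you posted (the class-default part is optional):

policy-map WAN
 class class-symposium-unicast-acl
  no random-detect dscp-based
 class class-default
  random-detect dscp-based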

If you really want to use WRED on the class, then consider tuning the random-detect exponential weighting constant to a higher value. This will keep WRED from reacting too quickly and starting to drop packets. Given your description of the traffic pattern, this could potentially solve the issue as well. (Reference)
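For example, something along these lines; the value 13 is only an illustration (your output shows the default exponent of 9, i.e. 1/512, and a larger exponent makes the average queue depth track bursts more slowly):

policy-map WAN
 class class-symposium-unicast-acl
  random-detect dscp-based
  random-detect exponential-weighting-constant 13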