Cisco – Unexplained 0.1% of packets time out when pinging from router

cisco, packet-loss, ping, router, timeout

I'm troubleshooting a customer who requires the ability to send 5000 pings from the router to their remote site over a satellite link with zero timeouts, yet they keep experiencing one to five packets lost per test.
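To put the numbers in perspective, here is a minimal sketch of the arithmetic behind the loss rate in the title. The function name `loss_rate_percent` is only illustrative; the inputs are the 5000-ping test size and the one-to-five losses described above.

```python
def loss_rate_percent(lost: int, sent: int) -> float:
    """Return packet loss as a percentage of the packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


SENT = 5000  # pings per test, as required by the customer

# Best and worst cases observed: one and five timeouts per test.
for lost in (1, 5):
    print(f"{lost} of {SENT} lost = {loss_rate_percent(lost, SENT):.2f}%")
```

So the observed loss is between 0.02% and 0.10% per test run.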

Under ordinary circumstances, I'd be willing to chalk up such a low loss rate as the cost of a satellite link, but the drops only show up when pinging from the router to the remote site. To clarify, here are the network devices involved:

Outbound Traffic

  1. 192.1.1.51 Router (Hub)
  2. 192.1.1.52 TX Switch (Hub)
  3. 192.1.1.50 Encapsulator (Hub)
  4. 172.1.1.1 Remote Site (Remote)

Return Traffic

  1. 172.1.1.1 Remote Site (Remote)
  2. 192.1.1.28 Channel Unit (Hub)
  3. 192.1.1.53 RX Switch (Hub)
  4. 192.1.1.51 Router (Hub)

When pinging from the Router to the remote site, the losses show up. When pinging from a Sun server attached to the TX switch (bypassing the router), the 5000 pings complete without a single loss. This verifies the entire satellite path, and all equipment except for the router.

Then I tried sending 5000 pings from the router to all of the other devices aside from the remote site…and I got back all 5000 almost instantaneously with no drops, so the connection from the router to everything else in the path is verified good.

The router in question is a Cisco 7206VXR, and the CPU utilization doesn't appear to ever go above 50%. The highest process is only at 20%, so I'm not confident that it's simply a matter of the router dropping ICMP packets due to lower priority, particularly given that the router will send 5000 packets to local devices with no issues.

I also looked into the possibility of a null route, but the only possible culprit is an essential route for remote access, according to the customer, and I can't post their running config here to get a second opinion.

Any suggestions would be greatly appreciated. I have very little networking experience, and I'm beating my head against the wall to reconcile these seemingly contradictory symptoms.

Best Answer

Datagrams are a best-effort service. If you have a requirement that data be reliably delivered, you cannot use datagrams. It really is that simple. The entire design of the system, end to end, is not meant to meet this requirement. You can't just impose it on the system as a whole at the end, like putting a cherry on a sundae.
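
As a rough illustration of why a zero-loss requirement sits badly with a best-effort service, the sketch below computes the chance that every ping in a run gets through. It assumes each ping is lost independently with the same probability, which real links do not guarantee, and the loss probabilities are made-up examples, not measurements from this network. The name `p_all_succeed` is hypothetical.

```python
def p_all_succeed(per_packet_loss: float, count: int) -> float:
    """Probability that all `count` packets arrive, assuming each packet
    is lost independently with probability `per_packet_loss`."""
    if not 0.0 <= per_packet_loss <= 1.0:
        raise ValueError("per_packet_loss must be between 0 and 1")
    return (1.0 - per_packet_loss) ** count


PINGS = 5000  # test size from the question

# Illustrative per-packet loss probabilities (assumed, not measured).
for loss in (1e-5, 1e-4, 1e-3):
    chance = p_all_succeed(loss, PINGS)
    print(f"loss {loss:.3%} per packet -> {chance:.1%} chance of a clean run")
```

Under those assumptions, even a per-packet loss of 0.01% leaves only about a 61% chance of a clean 5000-ping run, so an occasional timeout is what a best-effort path should be expected to produce.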