Interface Internal-Data0/0 "", is up, line protocol is up
2749335943 input errors, 0 CRC, 0 frame, 2749335943 overrun, 0 ignored, 0 abort
^^^^^^^^^^^^^^^^^^
0 output errors, 0 collisions, 0 interface resets
You show overruns on the Internal-Data interfaces, so you are dropping traffic through the ASA. With that many drops, it's not hard to imagine that this is contributing to the problem. Overruns happen when the internal Rx FIFO queues overflow (normally because of some kind of load problem).
EDIT to respond to a question in the comments:
I don't understand why the firewall is overloaded; it is not close to using 10Gbps. Can you explain why we are seeing overruns even when the CPU and bandwidth are low? The CPU is about 5%, and the bandwidth in either direction never goes much higher than 1.4Gbps.
I have seen this happen over and over when a link is seeing traffic microbursts that exceed the bandwidth, connections-per-second, or packets-per-second horsepower of the device. Too many people quote 1- or 5-minute statistics as if the traffic were relatively constant across that timeframe.
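To make the microburst point concrete, here is a hedged sketch with made-up numbers (not taken from the poster's ASA): a single 10 ms burst at the full 10 Gb/s line rate barely moves a 60-second average, yet it is more than long enough to overflow an Rx FIFO.

```python
# Synthetic illustration: a 10 Gb/s microburst hiding inside a low 1-minute average.
INTERVAL_S = 60.0          # averaging window (seconds)
BURST_S = 0.010            # one 10 ms burst
BURST_RATE_BPS = 10e9      # burst at the full 10 Gb/s line rate
BASELINE_BPS = 1.0e9       # steady background traffic

bits = BASELINE_BPS * (INTERVAL_S - BURST_S) + BURST_RATE_BPS * BURST_S
avg_bps = bits / INTERVAL_S

print(f"average over {INTERVAL_S:.0f}s: {avg_bps / 1e9:.3f} Gb/s")
# The averaged counter stays near 1 Gb/s even though the wire was
# saturated at 10 Gb/s for 10 ms -- plenty of time to overflow a FIFO.
```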
I would take a look at your firewall by running these commands every two or three seconds (run term pager 0 first to avoid paging issues)...
show clock
show traffic detail | i ^[a-zA-Z]|overrun|packets dropped
show asp drop
Now graph the traffic you're seeing every few seconds against the drops; if you see massive spikes in policy drops or overruns when your traffic spikes, then you're closer to finding the culprit.
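As a hedged sketch of that correlation step, the snippet below computes per-poll rates and overrun deltas from (timestamp, rx_bytes, overruns) tuples; the sample values are hypothetical, standing in for counters you would scrape from the "show" commands above every few seconds.

```python
# Hedged sketch: correlate traffic rate with overrun growth between polls.
# The samples are hypothetical (timestamp_s, rx_bytes, overruns) tuples you
# would collect by scripting the "show" commands every few seconds.
samples = [
    (0.0,             0,   0),
    (3.0,   525_000_000,   0),   # ~1.4 Gb/s, no new drops
    (6.0, 4_275_000_000, 812),   # ~10 Gb/s spike; overruns appear with it
]

rates = []
for (t0, b0, o0), (t1, b1, o1) in zip(samples, samples[1:]):
    gbps = (b1 - b0) * 8 / (t1 - t0) / 1e9       # average rate over this poll
    rates.append((gbps, o1 - o0))                # pair it with new overruns
    print(f"{t0:.0f}-{t1:.0f}s: {gbps:.1f} Gb/s, +{o1 - o0} overruns")
```

If the overrun column only grows in the intervals where the rate spikes, you have your microburst.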
Don't forget that you can sniff directly on the ASA with the command below if you need help identifying what's killing it... you sometimes have to be quick to catch this.
capture FOO circular-buffer buffer <buffer-size> interface <intf-name>
Netflow on your upstream switches could help as well.
One way to do this is ICMP Timestamp, whose timestamps are expressed in milliseconds since midnight UTC. It has the added benefit that you don't necessarily need to control both ends; as long as the far end is not firewalled, there is a good chance it will work.
However, to get reliable one-way measurements, you need both ends to keep reliably the same time. Since ICMP timestamps only have a precision of 1 ms (not nearly enough for many applications, but sufficient for this), it's reasonably easy to find even non-cooperating hosts whose ICMP timestamps provide useful data.
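For reference, the value carried in an ICMP Timestamp message (RFC 792) is milliseconds since midnight UT; a minimal sketch of that computation:

```python
from datetime import datetime, timezone

def ms_since_midnight_utc(now=None):
    """Milliseconds since midnight UT, the quantity carried in the
    originate/receive/transmit fields of ICMP Timestamp (RFC 792)."""
    now = now or datetime.now(timezone.utc)
    return ((now.hour * 3600 + now.minute * 60 + now.second) * 1000
            + now.microsecond // 1000)

print(ms_since_midnight_utc())   # always in [0, 86_400_000)
```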
If you control both ends, make sure both are synchronizing NTP against a single, shared server. The absolute clock is not very important; what matters is that both ends experience as nearly the same time as possible.
If ICMP timestamp is not sufficient, it's very easy to write ten lines of Ruby/Perl/Python, or even C, to do the measurements when you control both ends.
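A hedged sketch of those "ten lines" for the controlled-both-ends case: the sender stamps each UDP packet with its wall-clock time and the receiver subtracts that stamp from its own clock. Both ends run on localhost here purely for illustration; in the real world the accuracy is bounded by how well the two clocks agree (e.g. NTP against the same server).

```python
# Sketch of a one-way delay measurement between two hosts you control.
import socket, struct, threading, time

def sender(dest, count):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(count):
        s.sendto(struct.pack("!d", time.time()), dest)  # 8-byte timestamp
        time.sleep(0.01)
    s.close()

def receiver(sock, count):
    delays = []
    for _ in range(count):
        data, _addr = sock.recvfrom(64)
        (sent,) = struct.unpack("!d", data)
        delays.append(time.time() - sent)               # one-way delay estimate
    return delays

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                               # ephemeral demo port
rx.settimeout(2.0)
t = threading.Thread(target=sender, args=(rx.getsockname(), 5))
t.start()
delays = receiver(rx, 5)
t.join()
rx.close()
print("one-way delays (ms):", [round(d * 1e3, 3) for d in delays])
```

In practice you would run sender and receiver on the two separate hosts instead of in two threads.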
I can't really suggest software for doing ICMP timestamp measurements unidirectionally; hping2 supports sending ICMP timestamps but for some reason does not output one-way values. I wrote a patch for hping2 to display one-way latencies.
Best Answer
It depends on what protocols your applications use. Any protocol can suffer packet loss, and some loss is deliberate: RED, for example, randomly drops packets from queues to prevent TCP global synchronization (a bad thing). Likewise, when using QoS you may police certain protocols to a certain bandwidth, which drops traffic in excess of the configured rate. Much loss occurs due to congestion, usually from bandwidth over-subscription, and it can also occur due to network attacks.
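To illustrate the RED behavior mentioned above, here is a hedged sketch of the classic drop-probability curve (threshold values are arbitrary examples, not defaults of any particular platform): nothing is dropped below the minimum queue threshold, the probability rises linearly up to max_p between the thresholds, and everything is dropped above the maximum.

```python
# Sketch of the classic RED drop-probability curve.
def red_drop_probability(avg_q, min_th=5, max_th=15, max_p=0.1):
    """Drop probability for a given average queue depth (packets)."""
    if avg_q < min_th:
        return 0.0                                   # queue healthy: no drops
    if avg_q >= max_th:
        return 1.0                                   # queue full: drop everything
    return max_p * (avg_q - min_th) / (max_th - min_th)  # linear ramp

for q in (2, 10, 20):
    print(q, red_drop_probability(q))
```

Dropping a few random flows early, instead of tail-dropping everyone at once, is what prevents all TCP senders from backing off and ramping up in lockstep.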
Jitter is different from packet loss. Jitter is variation in the delay of packet delivery, and minimizing it is important for real-time protocols. For instance, VoIP can withstand a fair amount of delay, but even when the delay is low, variation in that delay can cause big problems.
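One standard way to quantify this is the RTP interarrival-jitter estimator from RFC 3550: a running average of |D|, where D is the difference between the transit times of consecutive packets. A hedged Python sketch:

```python
# Sketch of the RFC 3550 interarrival-jitter estimator.
def rfc3550_jitter(send_times, recv_times):
    """Running jitter estimate from per-packet send/receive timestamps."""
    jitter = 0.0
    prev_transit = None
    for s, r in zip(send_times, recv_times):
        transit = r - s                      # one-way transit time
        if prev_transit is not None:
            d = abs(transit - prev_transit)  # change in transit time
            jitter += (d - jitter) / 16.0    # RFC 3550 smoothing gain 1/16
        prev_transit = transit
    return jitter

# Constant 30 ms delay -> zero jitter; one delayed packet -> nonzero jitter.
print(rfc3550_jitter([0, 20, 40], [30, 50, 70]))   # 0.0
print(rfc3550_jitter([0, 20, 40], [30, 55, 70]))   # > 0
```

Note how the second call has the same average delay but nonzero jitter, which is exactly the distinction the paragraph above draws for VoIP.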