Proving packets per second bottleneck

networking packets

I have a '100Mb' network connection that is consistently transmitting at about 20K packets/sec, irrespective of packet size in the 300-600 byte range. This yields an observed bandwidth of 25-98Mb. I'm constantly being told that because we haven't hit the bandwidth limit, we don't have a line problem. I don't agree.

This connection is, on average, running at 60% of the theoretical maximum PPS rate for a 100Mb (copper Ethernet) line, once packet size is accounted for. (The 100Mb bottleneck is actually fibre of unknown type, so the impact may differ, but I don't think any fibre protocol does better than copper with its interpacket gap.)
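As a rough sketch of that calculation, the theoretical frame rate of a 100Mb Ethernet line can be estimated from the frame size plus the fixed per-frame overhead on the wire (8 bytes of preamble and start delimiter, 12 bytes of interpacket gap). The function name and the assumption that the quoted 300-600 byte sizes are whole Ethernet frames (header and FCS included) are illustrative, not taken from the measurements above.

```python
# Per-frame overhead on the wire that never shows up in a capture:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes interpacket gap.
WIRE_OVERHEAD_BYTES = 20


def max_pps(frame_bytes, line_rate_bps=100_000_000):
    """Theoretical maximum frames per second for a given Ethernet frame size.

    frame_bytes is assumed to include the Ethernet header and FCS.
    """
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


for size in (64, 300, 600):
    print(f"{size:4d}-byte frames: {max_pps(size):10.1f} pps max")
```

Dividing the observed packet rate by `max_pps(size)` for the typical frame size gives the share of the theoretical packet rate actually in use.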

My problem is – without access to the routers or fibre hardware (3rd party provided, can't be helped) how can I prove that we are packet limited? Ideally without causing a massive outage in the process 🙂

Best Answer

Collect the traffic with tcpdump or a similar tool and make a graph of the packet count per time unit. If your assumption is correct, you should see a clear ceiling for the packet count.
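A minimal sketch of the counting step, assuming the capture was printed with `tcpdump -tt -n -q` so that every line starts with an epoch timestamp. The function names, the sample lines, and the 5% tolerance are made up for illustration.

```python
from collections import Counter


def packets_per_second(lines):
    """Count packets in each one-second bucket of tcpdump -tt output."""
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue
        try:
            second = int(float(fields[0]))
        except (ValueError, OverflowError):
            continue  # skip lines that do not start with a timestamp
        counts[second] += 1
    return counts


def near_ceiling_share(counts, tolerance=0.05):
    """Fraction of busy seconds whose count is within `tolerance` of the peak."""
    if not counts:
        return 0.0
    peak = max(counts.values())
    near = sum(1 for c in counts.values() if c >= peak * (1 - tolerance))
    return near / len(counts)


# Hypothetical sample lines standing in for a real capture.
sample = [
    "1700000000.100000 IP 10.0.0.1.5000 > 10.0.0.2.80: tcp 0",
    "1700000000.900000 IP 10.0.0.1.5000 > 10.0.0.2.80: tcp 0",
    "1700000001.200000 IP 10.0.0.2.80 > 10.0.0.1.5000: tcp 0",
    "not a packet line",
]
counts = packets_per_second(sample)
for second in sorted(counts):
    print(second, counts[second])
print("share of seconds near the peak:", near_ceiling_share(counts))
```

For a real capture, pass `sys.stdin` or an open file instead of `sample`. If a large share of the busy seconds sits right at the peak, that is the flat ceiling described above; seconds with no packets at all are not counted by this sketch.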

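You can simulate a counterexample by generating many large packets with something like ping -s 1472 -f. If the line is packet-limited rather than bandwidth-limited, the large packets should push the observed bandwidth well above what your normal traffic reaches. It may cause a small outage, so avoid doing it during traffic peaks, but 30 seconds may be acceptable for solving a larger problem; you decide.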

A switch can easily be the bottleneck as well, especially a cheaper one or a black-box router. This was the most common case on a WAN network I worked on. The minimum standard for this kind of traffic was something from the HP ProCurve line; even an old Cisco was fine. But you have to test it.

It is also worth mentioning that among ISPs we generally used a rule of thumb that a line at 60% utilization is a fully saturated line. The reason is that utilization is an average over some period of time. On a shorter time frame you may get overloads from attempting to send too many packets at the same moment, which leads to higher latency. Measure the latency as well. Wireshark is a good tool for a quick analysis like this.
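To illustrate why an average hides short overloads, here is a small sketch with synthetic timestamps (not real measurements): the same traffic is bucketed into windows of different widths and the peak rate is compared. The function name and the burst pattern are assumptions made for the example.

```python
from collections import Counter


def peak_rate(timestamps_us, window_us):
    """Highest packets/sec rate seen in any fixed window of window_us microseconds."""
    buckets = Counter(t // window_us for t in timestamps_us)
    return max(buckets.values()) * 1_000_000 / window_us


# Synthetic traffic: 10 bursts of 100 packets, one burst every 100 ms,
# packets inside a burst spaced 10 microseconds apart.
timestamps_us = [burst * 100_000 + i * 10 for burst in range(10) for i in range(100)]

print("peak over 1 s windows:  ", peak_rate(timestamps_us, 1_000_000), "pps")
print("peak over 10 ms windows:", peak_rate(timestamps_us, 10_000), "pps")
```

Here the one-second view shows a modest average, while the 10 ms view shows a rate ten times higher during each burst, which is where queueing and added latency would come from.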

Last but not least, I have not seen any kind of traffic that can fully saturate a line other than ping -s 1472 -f on an otherwise empty line. Once you have multiple connections, you have inefficiencies that lead to lower utilization. Basically, 100Mbit is a theoretical limit under ideal conditions. So the line provider may be right as well, and upgrading the line may be the proper solution.
