I've experienced significant data loss with iPerf in UDP mode as a result of the CPU not being able to keep up. For some reason, iPerf in UDP mode seems to be much more CPU-intensive than in TCP mode. Do you experience the same loss percentages when you set iPerf to half the rate?
To answer your second question about how much packet loss is acceptable: it really depends on what application you are running and how much traffic you've got. Really, there shouldn't be any loss at all if you are under your bandwidth limit. For most things I probably wouldn't complain too much about 0.25%, but that is still a lot of loss if you are running at really high rates.
[EDIT 1] Some other thoughts that I've had on the topic:
- Try incrementally increasing the iPerf rate. If there is a systemic problem somewhere, you'll likely see the same percentage of loss no matter what the rate. If you are at the limits of your hardware, or your provider does some sort of RED, then there will likely be no loss up to a certain rate, and incrementally worse loss the higher above it you go.
- Take a tcpdump capture of the iPerf session, just to verify that your tests are accurate.
- Try iPerf with TCP. This won't report loss, but if you are getting loss then the connection won't be able to scale up very high. Since latency will also affect this, make sure to test to an endpoint with as little latency as possible.
- Depending on what gear you have on the inside of your connection, make sure you are as close to it as possible. E.g. if you have multiple switches between your test system and the edge router, move to a directly connected switch.
- If you have a managed switch, check the stats on it to make sure the loss isn't occurring there. I've encountered some cheaper switches that start dropping when you get close to 100Mbps of UDP traffic on them (mostly old and cheap unmanaged switches though).
- Try simultaneous iPerfs from two different clients to two different hosts, so that you can be sure the limit isn't a result of the CPU or a cheap NIC on a single machine.
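The rate-ramping test from the first bullet can be scripted. A rough sketch using iperf3 syntax (the server hostname is a placeholder; adjust rates and duration to your link):

```shell
#!/bin/sh
# Sweep the UDP send rate upward in steps and watch where loss starts.
# SERVER is a placeholder -- point it at your own iperf3 server.
SERVER=iperf.example.net

for RATE in 10M 25M 50M 100M 200M; do
    echo "=== UDP at ${RATE} ==="
    # -u = UDP, -b = target bandwidth, -t = test length in seconds.
    # '|| true' so one failed run does not abort the whole sweep.
    iperf3 -u -c "$SERVER" -b "$RATE" -t 10 2>&1 | grep -E 'receiver|error' || true
done
```

If loss stays at roughly the same percentage across all rates, suspect a systemic problem; if it only appears above a threshold, you've found a capacity limit.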
We found the root cause of this: we had an acceptCount of 25 in our Tomcat server.xml.
acceptCount is documented like this:

    acceptCount: The maximum queue length for incoming connection requests
    when all possible request processing threads are in use. Any requests
    received when the queue is full will be refused. The default value is 100.
But this is not the whole story about acceptCount. In short: acceptCount is the backlog parameter passed when opening the listening socket, so it affects the listen backlog even when not all threads are busy. It matters whenever requests come in faster than Tomcat can accept them and hand them off to waiting threads. The default acceptCount of 100 is still a small value for absorbing a sudden peak in requests.
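For illustration, this is roughly what raising acceptCount on the Tomcat HTTP connector looks like in server.xml (port, protocol and timeout here are just typical values, not ours):

```xml
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           acceptCount="1000"
           redirectPort="8443" />
```

Remember that this value alone is not enough; the kernel cap described below applies on top of it.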
We checked the same thing with Apache and nginx and saw the same strange packet loss, though at higher concurrency values. The corresponding setting in Apache is ListenBacklog, which defaults to 511.
But on Debian (and other Linux-based OSes) the default maximum value for the backlog parameter is 128:
$ sysctl -a | grep somaxc
net.core.somaxconn = 128
So whatever you set acceptCount or ListenBacklog to, the effective backlog will be capped at 128 until you raise net.core.somaxconn.
For a very busy webserver, 128 is not enough. You should raise it to something like 500, 1000 or 3000, depending on your needs.
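For example (the value 1000 is illustrative; pick what matches your load):

```shell
# Raise the kernel-wide listen backlog cap at runtime (needs root):
sysctl -w net.core.somaxconn=1000

# Make it persistent across reboots:
echo 'net.core.somaxconn = 1000' >> /etc/sysctl.conf
```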
After setting acceptCount to 1000 and net.core.somaxconn to 1000 we no longer had those dropped packets. (Now we have a bottleneck somewhere else, but this is another story..)
Short answer: Yes.
Long answer: It Depends.
Zero is the only acceptable amount of packet loss.
Packet loss > 0 indicates a problem somewhere that needs to be investigated.
A little packet loss (<5%, occasionally) may make websites slow (from retransmission delays or lost DNS queries), but your average user probably won't notice.
Moderate packet loss (up to 10%, happening semi-regularly) will often be noticeable. The website will be "slow".
High packet loss (>10%, semi-regularly / constantly) will infuriate your users. The website will take a long time to load, or may not load at all. It will probably be so painfully slow that people stop visiting.
You are not experiencing "high" packet loss -- you're experiencing EXTREME packet loss (70+% of what you're sending never gets where it's going -- if UPS worked that way you'd never ship anything with them again).
I would expect NOTHING to work with packet loss as extreme as what you're claiming -- you are effectively not connected to the internet.
My advice to you is to remedy the packet loss situation (i.e. "Find a new provider").
What you're describing is totally unacceptable.