We sometimes see 90% or more packet loss on our server, but it does not happen all the time. Right now it works perfectly, but just half an hour ago it had exactly that problem.
Our service provider is telling us to boot into a recovery system to test whether this is really a hardware problem and not a software problem on our side. However, I don't see anything on our side that could cause packet loss, especially since it is not consistent.
Is there anything we could check before doing another test on the recovery system?
We have a dedicated server at Hetzner.de. It is connected to 100 Mbit Ethernet. We have not tried to change anything on the hardware side, because our provider wants us to check our software before they continue checking the hardware.
Here are the mtr reports I made. During the reports, we had 3 bursts of packet loss; the rest of the time the server was reachable:
Client to server
HOST: mbp Loss% Snt Last Avg Best Wrst StDev
1.|-- 10.0.1.1 0.0% 1000 0.4 0.2 0.2 3.4 0.2
2.|-- 10.0.1.1 0.3% 1000 27.5 29.7 5.9 237.3 34.6
3.|-- 10.170.172.121 0.4% 1000 17.2 41.9 7.2 334.1 44.2
4.|-- 216.113.123.158 1.4% 1000 44.4 58.6 10.6 299.6 49.2
5.|-- 216.113.123.194 1.1% 1000 36.6 72.9 19.4 330.7 48.1
6.|-- paix-nyc.init7.net 0.7% 1000 57.1 75.8 18.4 313.8 49.1
7.|-- r1lon1.core.init7.net 1.4% 1000 199.8 150.9 87.1 373.7 56.4
8.|-- r1fra1.core.init7.net 0.6% 1000 244.2 150.1 98.6 438.6 53.6
9.|-- gw-hetzner.init7.net 1.4% 1000 175.3 140.6 100.5 397.2 49.7
10.|-- hos-bb2.juniper2.rz16.het 39.0% 1000 120.0 136.7 103.5 362.6 44.3
11.|-- hos-tr4.ex3k13.rz16.hetzn 0.8% 1000 145.4 132.2 106.8 393.3 36.9
12.|-- static.98.43.9.5.clients. 39.8% 1000 116.0 131.5 106.1 371.8 34.4
Server to client
HOST: thetransitapp Loss% Snt Last Avg Best Wrst StDev
1. static.97.43.9.5.clients.you 29.0% 1000 7.2 7.4 0.9 24.9 1.9
2. hos-tr1.juniper1.rz16.hetzne 38.7% 1000 6.1 9.6 0.2 78.8 7.6
3. hos-bb2.juniper4.ffm.hetzner 36.2% 1000 11.8 11.4 5.8 29.0 1.5
4. r1fra1.core.init7.net 38.1% 1000 12.4 13.9 5.5 22.9 3.9
5. r1lon1.core.init7.net 36.3% 1000 23.5 26.5 17.6 37.6 4.4
6. r1nyc1.core.init7.net 35.5% 1000 92.3 93.8 86.1 103.0 3.7
7. paix-ny.ia-unyc-bb05.vtl.net 35.5% 1000 95.5 96.4 87.6 134.7 5.3
8. 216.113.123.169 36.3% 1000 101.5 102.0 94.4 124.9 3.6
9. 216.113.124.42 34.7% 1000 113.1 107.7 96.7 117.6 3.6
10. 216.113.123.157 37.5% 999 106.5 107.4 101.5 115.0 1.5
11. ??? 100.0 999 0.0 0.0 0.0 0.0 0.0
12. modemcable004.103-176-173.mc 36.7% 999 111.2 147.9 107.2 342.0 48.3
Here is the Ethernet configuration:
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000033 (51)
Link detected: yes
ifconfig of eth0:
eth0 Link encap:Ethernet HWaddr c8:60:00:bd:2f:9d
inet addr:5.9.43.98 Bcast:5.9.43.127 Mask:255.255.255.224
inet6 addr: fe80::ca60:ff:febd:2f9d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3521 errors:0 dropped:0 overruns:0 frame:0
TX packets:2117 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2882770 (2.7 MiB) TX bytes:910907 (889.5 KiB)
Interrupt:30 Base address:0x8000
Best Answer
In my opinion it's Hetzner's fault. I've been arguing with them for a very long time about a similar case.
We had the same problems and kept reporting them to the hosting company. The answer was always the same: "Please attach mtr in both directions". They would answer like that even during the fault. So we wrote a daemon that launches mtr each time we see any packet loss between servers:
Then with this information they answered :
What exactly is happening? I don't know, but it looks almost the same:
In my opinion it's a problem with their infrastructure. Notice that the loss is occurring on nodes such as hos-tr1.ex3k3.rz1.hetzner.de, hos-tr4.juniper2.rz13.hetzner.de and so on.
If they don't fix that, I'll probably migrate to Linode or Amazon.
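The daemon mentioned above is not reproduced in this answer. As a rough idea of how such a watcher could look, here is a minimal sketch, assuming a Linux host with `ping` and `mtr` installed. The peer address, the check interval, the report file naming and all function names are hypothetical choices for illustration, not what we actually ran.

```python
import re
import subprocess
import time
from datetime import datetime

PEER = "192.0.2.10"      # hypothetical peer address (documentation range)
CHECK_INTERVAL = 60      # seconds between checks (assumed value)

# Matches the summary line of Linux ping, e.g. "40% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the packet loss percentage from ping's summary, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def measure_loss(host, count=10):
    """Ping host `count` times and return the measured loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    )
    return parse_loss(result.stdout)


def run_mtr(host, cycles=100):
    """Run mtr in report mode and return its text output."""
    result = subprocess.run(
        ["mtr", "--report", "--report-cycles", str(cycles), host],
        capture_output=True, text=True,
    )
    return result.stdout


def watch(host=PEER, interval=CHECK_INTERVAL):
    """Loop forever; save an mtr report whenever ping shows any loss."""
    while True:
        loss = measure_loss(host)
        # An unparseable ping result is also treated as worth recording.
        if loss is None or loss > 0:
            stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            with open(f"mtr-{stamp}.txt", "w") as report:
                report.write(f"ping loss: {loss}\n")
                report.write(run_mtr(host))
        time.sleep(interval)
```

Calling `watch()` on each server, pointed at the other one, gives timestamped reports in both directions that can be attached to a support ticket. Since the loss comes in short bursts, a smaller interval or fewer mtr cycles may be needed to catch it while it is happening.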