I have a TCP server built with sockets in Python. The application I'm building is time-sensitive, and the integrity of the data is important, so we need TCP. The bandwidth is very low.
A client requests data from the server every 50 ms. In response it gets either the requested data or, if the server doesn't have the data, an OK message.
Whenever the client makes a request to the server, it sends a frame of 5 bytes (not including the 40 extra bytes that come from IP and TCP).
On the other side, the server responds either with a frame of 5 bytes (in most cases) or with a frame of more than 70 bytes (roughly once per second).
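For scale, the figures above work out to a little under 1 kB/s in each direction. A back-of-the-envelope check, assuming exactly one request every 50 ms and one large reply of exactly 70 bytes per second (70 is only the lower bound given above), and ignoring pure ACK segments and link-layer framing:

```python
# Rough per-direction load, using the figures stated above.
HEADER_BYTES = 40                    # IP + TCP headers without options
REQUEST_PAYLOAD = 5                  # bytes per client request
SMALL_REPLY = 5                      # bytes, the usual "OK" reply
LARGE_REPLY = 70                     # bytes, assumed size of the once-per-second data reply
REQUESTS_PER_SECOND = 1000 // 50     # one request every 50 ms -> 20 per second


def client_bytes_per_second():
    """Bytes per second sent by the client, headers included."""
    return REQUESTS_PER_SECOND * (REQUEST_PAYLOAD + HEADER_BYTES)


def server_bytes_per_second():
    """Bytes per second sent by the server: 19 small replies plus 1 large one."""
    small = (REQUESTS_PER_SECOND - 1) * (SMALL_REPLY + HEADER_BYTES)
    large = LARGE_REPLY + HEADER_BYTES
    return small + large


print(client_bytes_per_second(), server_bytes_per_second())
```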
On both sides the sockets are set like this:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # omitted on the client
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.settimeout(0.5)
Everything runs fine on the local network (no lag at all), but whenever I connect to the server through the public IP (I'm port-forwarding) it lags a lot. The lag can reach 15 seconds, at which point the connection times out. Most of the time the RTT stays at 200-210 ms. In Wireshark I can see lots of (spurious) retransmissions and duplicate ACKs.
What can I do? I've already disabled Nagle's algorithm, but without success.
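To put numbers on the lag from inside the application rather than from a capture, each request can be timed in the client. A minimal sketch, assuming a connected socket configured as above; the helper name `timed_request` and the buffer size are hypothetical, and since the framing of the replies is not described here, it only waits for the first chunk of the reply:

```python
import socket
import time


def timed_request(sock, payload, bufsize=4096):
    """Send one request and block until the first chunk of the reply arrives.

    Returns (reply_bytes, elapsed_seconds). Raises socket.timeout if the
    socket's own timeout expires before any reply data is received.
    """
    start = time.perf_counter()
    sock.sendall(payload)
    # TCP is a byte stream, so this may return only part of a larger frame.
    reply = sock.recv(bufsize)
    return reply, time.perf_counter() - start
```

Logging the elapsed value for every request would show whether the delay builds up gradually or appears in sudden stalls.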
Best Answer
I've had a good look over the capture files provided and here is my analysis. In summary, I believe this is an issue with your router, which appears to be a Technicolor device of some sort.
Client Side Capture
Looking specifically at the traffic from the client to the server, the vast majority of the sessions start reasonably OK but then encounter packet loss, resulting in retransmissions from the client and timeouts. For example, examine packets 161 to 207. At packet 161 the client sends a data packet to the server but gets no response, so the client retransmits for around 15 seconds before the connection is torn down.
The majority of the TCP streams demonstrate this behaviour, so we can conclude that either the data packets from the client are not reaching the server OR the response from the server is not reaching the client.
Looking at the latency, there is a significant (and volatile) delay between the SYN and SYN/ACK response from the server, ranging from 168ms to 770ms.
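The same handshake delay can be sampled without a capture by timing how long a connect takes, since `connect` returns only once the SYN/ACK has arrived. A minimal sketch, assuming the server is reachable at a host and port you supply; the function name `handshake_time` is hypothetical, and the result also includes local overhead (plus name resolution if a hostname is passed):

```python
import socket
import time


def handshake_time(host, port, timeout=5.0):
    """Return the seconds taken to establish a TCP connection to (host, port)."""
    start = time.perf_counter()
    conn = socket.create_connection((host, port), timeout=timeout)
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed
```

Running it repeatedly against the public IP and against the LAN address gives two sets of samples to compare.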
Server Side Capture
If you apply a Wireshark display filter for
tcp.stream eq 1 || tcp.stream eq 2
you can see both sides of the communication: Client > Firewall and then Firewall > Server (and vice versa). Again, everything starts OK, and then around packet 407 things get interesting. Packet #407 marks the point when the client sends a chunk of new data to the server. The router receives this and forwards it to the server. The server sends an acknowledgement packet back (packet #410) as well as another small data packet (#411). What we don't see, however, is the router passing these packets back to the client. This is the best evidence I have found of this being a router issue.
Compare this to one of the many successful exchanges slightly further up in the trace - packet 394 to 406 for example:
When things fail, everything stops after stage 4 - the two packets sent from the server appear to be dropped at the router.
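This kind of check can be made mechanical by listing the segments the server sent as seen on each side of the router and diffing the two lists. A minimal sketch, assuming each capture has been reduced to (sequence number, payload length) pairs for the server-to-client direction and that the router does not rewrite TCP sequence numbers; the function name and the sample values are hypothetical and are not taken from the capture:

```python
def segments_not_forwarded(sent_by_server, seen_on_client_side):
    """Return the (seq, length) segments that left the server but never
    appeared on the client-facing side, in the order they were first sent."""
    seen = set(seen_on_client_side)
    missing = []
    for segment in sent_by_server:
        # Retransmissions repeat the same (seq, length), so report each once.
        if segment not in seen and segment not in missing:
            missing.append(segment)
    return missing


# Hypothetical values for illustration only.
server_side = [(1, 5), (6, 5), (11, 74), (11, 74)]  # last entry is a retransmission
client_side = [(1, 5), (6, 5)]
print(segments_not_forwarded(server_side, client_side))  # [(11, 74)]
```

Any segment this reports was sent by the server and lost somewhere between the two capture points.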
Final Thoughts