Answering my own question...
The short answer is that, with TCP alone, the client has no way of knowing whether the intended recipient has actually received the bytes sent.
i.e. it doesn't matter that the client "happily" sent the bytes: even with TCP they may never arrive, and you have no way of knowing when (or whether) they will reach the intended recipient. Not without building some acknowledgement into the application layer, anyway.
For my particular case, it turns out that the bytes the client sent DID actually arrive at the server, but took ~30s (!!!) to arrive, by which time both the client-side and server-side application protocol code had timed out.
Views of the client and server side logs (for one failed connection) are here:
Those images are Wireshark views of one particular TCP stream from the tcpdump capture files. You can see that a whole lot of re-transmissions were occurring. What was the root cause driving the need for these re-transmissions? I have absolutely no idea (but would love to know!).
The data arrived at the server in the second-to-last entry (#974), ~30s after it was sent, with a large number of re-transmission attempts in between. If you're curious about server-side #793, that is an attempt by my application-layer protocol to send a message back to the client saying "timed out waiting for more data... where is it?".
In addition to the inherent delays, one of the reasons the data was not appearing in the `tcpdump` logs at the server also seems to be my usage of `tcpdump` itself. In short: make sure to Ctrl-C out of the `tcpdump` capture before looking at the capture file (the one created with the `-w` switch), as it seems to make a big difference to what you see in the file. I expect this is a flush/sync issue, but that is a guess. Without Ctrl-C, however, I was definitely missing data.
More detail for future reference...
Although you often read/hear that TCP will:
- Guarantee that your packets will arrive (vs UDP, which doesn't)
- Guarantee that your packets will arrive in order
it is apparent that the first is not actually true at all. TCP will do its best to get your bytes to the intended recipient (including retrying for a LONG time), but this is not a guarantee, regardless of the `send` man page stating, for the `send` return value, that "On success, these calls return the number of characters sent". That statement is not true and is highly misleading (see below).
The root of this comes mostly from the way the various socket calls (`send` in particular) behave and how they interact with the TCP/IP stack of the operating system...
On the sending side of a TCP exchange, the progression is quite simple: first you `connect()` and then you `send()`.
`connect()` returning successfully definitely means that you were able to establish a connection to the server, so you at least know that at that moment the server was there and listening (i.e. the three-way TCP opening handshake was successful).
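To make that progression concrete, here is a minimal sketch of a client doing exactly that. The server address, port, and message are made-up values for illustration, and error handling is trimmed to the essentials:

```
/* Minimal sketch: connect() then send(). Address, port and message
 * are assumed values for illustration only. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5000);                      /* assumed port   */
    inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr);   /* assumed server */

    /* Success here only tells you the three-way handshake completed:
     * the server was up and listening at this moment. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    const char msg[] = "hello";
    ssize_t n = send(fd, msg, sizeof(msg) - 1, 0);

    /* A positive return only means the OS accepted n bytes into its
     * outgoing buffer -- NOT that the server has received them. */
    if (n < 0)
        perror("send");
    else
        printf("queued %zd bytes for delivery\n", n);

    close(fd);
    return 0;
}
```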
For `send`, although the documentation for the call indicates that the return value (if positive) is the "number of [bytes] sent", this is just plain wrong. All that the return value tells you is the number of bytes that the TCP stack in your underlying OS accepted into its outgoing buffer. After that point, the OS will try its best to deliver those bytes to the recipient you initially made the connection with. But this may never happen, so you cannot count on those bytes arriving! Somewhat surprisingly, there is also no real way to determine whether this did (or did not!) happen, at least at the TCP socket layer, even though TCP has built-in ACK messages. To verify full receipt of your sent bytes, you need to add some sort of acknowledgement at the application layer. nos has a great answer in another question that talks a bit about this.
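To illustrate what I mean by an application-layer acknowledgement, here is a rough sketch. The single ACK byte ('K') and the 5-second timeout are arbitrary choices for the example, not part of any real protocol:

```
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Keep calling send() until every byte has been handed to the OS.
 * Even then, this only means the bytes are queued locally. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = send(fd, buf + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Block (with a timeout) until the peer explicitly confirms receipt
 * at the application layer. */
static int wait_for_app_ack(int fd)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };   /* assumed timeout   */
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    char ack;
    ssize_t n = recv(fd, &ack, 1, 0);
    return (n == 1 && ack == 'K') ? 0 : -1;              /* 'K' = assumed ACK */
}
```

Only when `wait_for_app_ack()` returns 0 do you actually know that the server-side application saw the data.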
Addendum...
One interesting dilemma I'm left with is whether or not I need to build some retry capability into my application-layer protocol. Currently it seems that, in the event of a timeout waiting for data at the server, it would be beneficial to close the connection and open a new one with the same request (sketched below). It seems that way because the low-level TCP retries were not successful, while in the meantime other client-side threads were getting through in good time. This feels horribly wrong, though... you would think that the TCP retries should be sufficient. But they weren't. I need to look into the root cause of the TCP issues to resolve this.
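For reference, the kind of retry I'm contemplating looks roughly like this. `open_connection()` and `send_request_and_wait_ack()` are hypothetical helpers (connect to the server, and send the request plus wait for the application-layer acknowledgement above), and `MAX_ATTEMPTS` is an arbitrary choice:

```
#include <unistd.h>

#define MAX_ATTEMPTS 3   /* arbitrary */

int open_connection(void);              /* hypothetical: connect() to the server */
int send_request_and_wait_ack(int fd);  /* hypothetical: send + app-layer ACK    */

static int send_with_retry(void)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        int fd = open_connection();
        if (fd < 0)
            continue;                   /* couldn't even connect                 */

        int ok = send_request_and_wait_ack(fd);
        close(fd);                      /* abandon the (possibly stalled) stream */
        if (ok == 0)
            return 0;                   /* server confirmed receipt              */
    }
    return -1;                          /* every attempt timed out               */
}
```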
Best Answer
Try something like...
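The snippet itself is not reproduced above; the following is a sketch of the pattern being described, with `sock`, `buffer`, `messageSize` and `TCP_BlockSize` as assumed names:

```
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly messageSize bytes, asking for at most TCP_BlockSize per recv.
 * The final recv asks only for however many bytes are still missing. */
static ssize_t recv_message(int sock, char *buffer,
                            size_t messageSize, size_t TCP_BlockSize)
{
    size_t received = 0;
    while (received < messageSize) {
        size_t want = messageSize - received;
        if (want > TCP_BlockSize)
            want = TCP_BlockSize;
        ssize_t n = recv(sock, buffer + received, want, 0);
        if (n <= 0)
            return n;       /* error, or the peer closed the connection */
        received += (size_t)n;
    }
    return (ssize_t)received;
}
```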
That way the final `recv` will only ask for the amount still needed; otherwise it asks for `TCP_BlockSize`.