First of all, there is a big misconception evident from your question: that the TCP window size is what is controlled by SO_SNDBUF and SO_RCVBUF. This is not true.
What is the TCP window size?
In a nutshell, the TCP window size determines how much follow-up data (packets) your network stack is willing to put on the wire before receiving acknowledgement for the earliest packet that has not been acknowledged yet.
The TCP stack has to account for the fact that, without selective acknowledgement, once a packet has been determined to be lost or mangled during transmission, every packet sent from that one onwards has to be re-sent, since the receiver can only acknowledge packets in order. Therefore, allowing too many unacknowledged packets to exist at the same time consumes the connection's bandwidth speculatively: there is no guarantee that the bandwidth used will actually produce anything useful.
On the other hand, not allowing multiple unacknowledged packets at the same time would simply kill the bandwidth of connections that have a high bandwidth-delay product. Therefore, the TCP stack has to strike a balance between using up bandwidth for no benefit and not driving the pipe aggressively enough (and thus allowing some of its capacity to go unused).
The TCP window size determines where this balance is struck.
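To put numbers on this trade-off, here is a small sketch of the two quantities involved: the bandwidth-delay product (how many bytes must be in flight to keep the pipe full) and the throughput ceiling that a given window imposes. The function names and the example link figures are made up for illustration.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when at most window_bytes may be unacknowledged."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Hypothetical link: 100 Mbit/s with a 100 ms round trip.
    print("BDP (bytes):", bdp_bytes(100e6, 0.1))
    # One 1460-byte segment in flight at a time (no windowing at all).
    print("one segment per RTT (bit/s):", max_throughput_bits_per_s(1460, 0.1))
    # A full 64 KB window without scaling.
    print("65535-byte window (bit/s):", max_throughput_bits_per_s(65535, 0.1))
```

On that hypothetical link the pipe holds about 1.25 MB, while a single 1460-byte segment per round trip yields only about 117 kbit/s. That gap is what the window is there to close.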
What do SO_SNDBUF and SO_RCVBUF do?
They control the amount of buffer space that the network stack has reserved for servicing your socket. These buffers accumulate, respectively, outgoing data that the stack has not yet been able to put on the wire and data that has been received from the wire but not yet read by your application.
If one of these buffers is full, you won't be able to send or receive more data until some space is freed. Note that these buffers only affect how the network stack handles data on the "near" side of the network interface (before it has been sent or after it has arrived), while the TCP window affects how the stack manages data on the "far" side of the interface (i.e. on the wire).
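For reference, this is roughly how you would inspect and request these buffer sizes from Python's standard socket module. It is a minimal sketch: the helper name and the 256 KB request are arbitrary, and the OS is free to round, clamp or otherwise adjust what you ask for (Linux, for example, doubles the requested value to leave room for bookkeeping overhead).

```python
import socket


def buffer_sizes(requested=256 * 1024):
    """Return ((sndbuf, rcvbuf) before, (sndbuf, rcvbuf) after) requesting a size."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        before = (
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        )
        # These are requests; the kernel decides what it actually grants.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        after = (
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        )
    return before, after


if __name__ == "__main__":
    before, after = buffer_sizes()
    print("default (send, recv):", before)
    print("granted (send, recv):", after)
```

Always read the value back with getsockopt rather than assuming your request was honoured as-is.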
Answers to your questions
No. If that were the case then you would incur a roundtrip delay for each packet sent, which would totally destroy the bandwidth of connections with high latency.
Yes, but that has nothing to do with either the TCP window size or with the size of the buffers allocated to that socket.
According to all sources I have been able to find (example), scaling allows the window to reach a maximum size of 1 GB.
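The 1 GB figure can be checked by hand. The window field in the TCP header is 16 bits wide, and the window scale option (RFC 7323) shifts it left by at most 14 bits. A quick sketch of that arithmetic, with constant and function names chosen for illustration:

```python
MAX_WINDOW_FIELD = 0xFFFF  # the 16-bit window field in the TCP header
MAX_SHIFT = 14             # largest shift count the window scale option permits


def scaled_window(window_field, shift):
    """Effective window in bytes for a given header value and scale shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


if __name__ == "__main__":
    print(scaled_window(MAX_WINDOW_FIELD, 0))          # 65535
    print(scaled_window(MAX_WINDOW_FIELD, MAX_SHIFT))  # 1073725440
```

So the largest possible window is 65535 × 2^14 = 1,073,725,440 bytes, just under 2^30.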
Best Answer
The idea of the Nagle algorithm was to prevent more than one undersized packet from being in transit at a time. The idea of delayed ACKs (which came from Berkeley) was to avoid sending a lone ACK for each character received when typing over a Telnet connection with remote echo, by waiting a fixed period for traffic in the reverse direction upon which the ACK could be piggybacked.
The interaction of the two algorithms is awful. If you do big send, big send, big send, that works fine. If you do send, get reply, send, get reply, that works fine. If you do small send, small send, get reply, there will be a brief stall. This is because the second small send is delayed by the Nagle algorithm until an ACK comes back, and the delayed ACK algorithm adds 0.5 second or so before that happens.
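The send patterns above can be traced with a toy model of the Nagle rule. This is a deliberately simplified sketch, not how any real stack is written: every write is treated as its own segment, nothing is coalesced, and no ACK arrives during the burst. The function names and the 1460-byte MSS are assumptions for illustration.

```python
def nagle_can_send(segment_len, mss, unacked_bytes):
    """Simplified Nagle rule: full-sized segments always go out;
    an undersized one goes out only if nothing is awaiting an ACK."""
    if segment_len >= mss:
        return True
    return unacked_bytes == 0


def trace_burst(write_sizes, mss=1460):
    """Decision for each of several back-to-back writes, assuming no ACK
    arrives while the burst is being written."""
    unacked = 0
    decisions = []
    for size in write_sizes:
        if nagle_can_send(size, mss, unacked):
            unacked += size
            decisions.append("sent")
        else:
            decisions.append("held")  # waits for the peer's (possibly delayed) ACK
    return decisions


if __name__ == "__main__":
    print(trace_burst([1460, 1460, 1460]))  # ['sent', 'sent', 'sent']
    print(trace_burst([100, 100]))          # ['sent', 'held']
```

The "held" entry in the second trace is the stall: the second small write cannot leave until the first is acknowledged, and the receiver is sitting on that ACK.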
A delayed ACK is a bet. The TCP implementation is betting that data will be sent shortly and will make it unnecessary to send a lone ACK. Every time a delayed ACK is actually sent, that bet was lost. The TCP spec allows an implementation to lose that bet every time without turning off delayed ACKs. Properly, delayed ACKs should only turn on when a few unnecessary ACKs that could have been piggybacked have been sent in a row, and any time a delayed ACK is actually sent, delayed ACKs should be turned off again. There should have been a counter for this.
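One way to read the counter idea is sketched below. This is purely a hypothetical illustration of the heuristic as described in the paragraph above, not something any TCP implementation is known to do; the class name, method names and the threshold of 3 are all invented.

```python
class AdaptiveDelayedAck:
    """Hypothetical sketch: enable delayed ACKs only after several lone ACKs
    in a row turned out to be unnecessary, and disable them again the moment
    a delayed ACK actually has to be sent on its own."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # assumed value, purely illustrative
        self.missed_piggybacks = 0      # consecutive ACKs that could have been piggybacked
        self.delay_enabled = False

    def on_immediate_ack(self, reply_followed_soon):
        """Record a lone ACK sent while delaying was off."""
        if reply_followed_soon:
            # Reverse-direction data came right after: the ACK could have ridden on it.
            self.missed_piggybacks += 1
            if self.missed_piggybacks >= self.threshold:
                self.delay_enabled = True
        else:
            self.missed_piggybacks = 0

    def on_delayed_ack_timer_fired(self):
        """The bet was lost: the timer expired and a lone ACK went out anyway."""
        self.delay_enabled = False
        self.missed_piggybacks = 0


if __name__ == "__main__":
    state = AdaptiveDelayedAck()
    for _ in range(3):
        state.on_immediate_ack(reply_followed_soon=True)
    print(state.delay_enabled)  # True
    state.on_delayed_ack_timer_fired()
    print(state.delay_enabled)  # False
```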
Unfortunately, delayed ACKs went in after I got out of networking in 1986, and this was never fixed. Now it's too late.
John Nagle