The gist of this question is in the title: what could cause TCP to retransmit only the end of a (fully acknowledged) segment?
Here is a TCP conversation between two hosts: an SSH server (172.16.6.249, a physical machine) and an SSH client (192.168.0.18, a virtual machine) running the command "ssh-keyscan". The capture was taken on the network interface of 192.168.0.18.
No. Time Source Destination Protocol Length Info
1 0.000000 192.168.0.18 172.16.6.249 TCP 74 46180 > 22 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=173592928 TSecr=0 WS=128
2 0.001274 172.16.6.249 192.168.0.18 TCP 74 22 > 46180 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=3139755418 TSecr=173592928 WS=128
3 0.001309 192.168.0.18 172.16.6.249 TCP 66 46180 > 22 [ACK] Seq=1 Ack=1 Win=28288 Len=0 TSval=173592929 TSecr=3139755418
4 0.010710 172.16.6.249 192.168.0.18 TCP 109 22 > 46180 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=43 TSval=3139755421 TSecr=173592929
5 0.010741 192.168.0.18 172.16.6.249 TCP 66 46180 > 22 [ACK] Seq=1 Ack=44 Win=28288 Len=0 TSval=173592931 TSecr=3139755421
6 0.010886 192.168.0.18 172.16.6.249 TCP 91 46180 > 22 [PSH, ACK] Seq=1 Ack=44 Win=28288 Len=25 TSval=173592931 TSecr=3139755421
7 0.010965 192.168.0.18 172.16.6.249 TCP 1464 46180 > 22 [ACK] Seq=26 Ack=44 Win=28288 Len=1398 TSval=173592931 TSecr=3139755421
8 0.011950 172.16.6.249 192.168.0.18 TCP 66 22 > 46180 [ACK] Seq=44 Ack=26 Win=29056 Len=0 TSval=3139755421 TSecr=173592931
9 0.011959 192.168.0.18 172.16.6.249 TCP 284 46180 > 22 [PSH, ACK] Seq=1424 Ack=44 Win=28288 Len=218 TSval=173592931 TSecr=3139755421
10 0.012227 172.16.6.249 192.168.0.18 TCP 66 22 > 46180 [ACK] Seq=44 Ack=1424 Win=31872 Len=0 TSval=3139755421 TSecr=173592931
11 0.033124 172.16.6.249 192.168.0.18 TCP 1714 22 > 46180 [PSH, ACK] Seq=44 Ack=1424 Win=31872 Len=1648 TSval=3139755421 TSecr=173592931
12 0.033153 192.168.0.18 172.16.6.249 TCP 66 46180 > 22 [ACK] Seq=1642 Ack=1692 Win=31616 Len=0 TSval=173592937 TSecr=3139755421
13 0.033173 172.16.6.249 192.168.0.18 TCP 316 [TCP Retransmission] 22 > 46180 [PSH, ACK] Seq=1442 Ack=1642 Win=34688 Len=250 TSval=3139755424 TSecr=173592931
14 0.033184 192.168.0.18 172.16.6.249 TCP 78 [TCP Dup ACK 12#1] 46180 > 22 [ACK] Seq=1642 Ack=1692 Win=31616 Len=0 TSval=173592937 TSecr=3139755424 SLE=1442 SRE=1692
15 0.035635 192.168.0.18 172.16.6.249 TCP 114 46180 > 22 [PSH, ACK] Seq=1642 Ack=1692 Win=31616 Len=48 TSval=173592937 TSecr=3139755424
16 0.047742 172.16.6.249 192.168.0.18 TCP 690 22 > 46180 [PSH, ACK] Seq=1692 Ack=1690 Win=34688 Len=624 TSval=3139755430 TSecr=173592937
17 0.047869 192.168.0.18 172.16.6.249 TCP 66 46180 > 22 [FIN, ACK] Seq=1690 Ack=2316 Win=34304 Len=0 TSval=173592940 TSecr=3139755430
18 0.049738 172.16.6.249 192.168.0.18 TCP 66 22 > 46180 [FIN, ACK] Seq=2316 Ack=1691 Win=34688 Len=0 TSval=3139755430 TSecr=173592940
I don't understand frames 13 and 14. 172.16.6.249 had already sent a segment covering everything up to sequence number 44 + 1648 = 1692 (frame 11), which 192.168.0.18 acknowledged with a plain ACK 1692 (frame 12).
What could make 172.16.6.249 resend the end of that segment (frame 13)? I have checked the payload, and it does match the last 250 bytes of the payload in frame 11.
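To make the sequence arithmetic explicit, here is a minimal Python sketch that checks the numbers from frames 11, 12 and 13. The helper name `seq_range` is made up for illustration; the values are copied from the capture above and are Wireshark's relative sequence numbers.

```python
def seq_range(seq, length):
    """Return the half-open range [seq, seq + length) of relative sequence numbers."""
    return seq, seq + length

frame_11 = seq_range(44, 1648)    # original data segment from the server
frame_13 = seq_range(1442, 250)   # segment flagged as [TCP Retransmission]
ack_in_frame_12 = 1692            # cumulative ACK sent by the client

# Frame 13 lies entirely inside the range already carried by frame 11 ...
fully_inside = frame_11[0] <= frame_13[0] and frame_13[1] <= frame_11[1]
# ... and that whole range was acknowledged before frame 13 was captured.
already_acked = frame_13[1] <= ack_in_frame_12
# Offset of the retransmitted bytes within frame 11's payload.
offset_in_frame_11 = frame_13[0] - frame_11[0]

print(fully_inside, already_acked, offset_in_frame_11)
```

The retransmitted bytes start 1398 bytes into frame 11's payload, which is the same length as the payload of frame 7.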
I also assume the selective acknowledgement of these last 250 bytes in frame 14 is the result of the client being confused by receiving the same data twice, but maybe I am missing something here.
I unfortunately don't have the corresponding capture on 172.16.6.249, as I am investigating network issues that are hard to reproduce.
The hosts are physically very close, with only one (Linux) router between them and one physical switch, but there is some SDN going on (Linux bridges + VXLAN). This however should be transparent to the endpoints.
What should I be looking at next?
Best Answer
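I think your capture is skewed by some form of TCP segmentation offload on your host. I doubt that the packets appeared on the wire as shown. Notice that frame 11 has a reported length of 1714 octets, which is longer than a standard Ethernet frame (1514 octets with a 1500-byte MTU), and its 1648-byte payload is larger than either MSS negotiated in frames 1 and 2.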
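As a rough check of that guess, the sketch below works out the largest payload a single segment could carry on this connection and splits frame 11 accordingly. It assumes untagged Ethernet and IPv4 without IP options (14 + 20 + 20 header bytes before TCP options); the function names are made up for illustration, and the numbers come from the capture in the question.

```python
ETH_IP_TCP_HEADERS = 14 + 20 + 20  # assumed: Ethernet + IPv4 + base TCP header

def tcp_option_bytes(frame_len, payload_len):
    """Bytes of TCP options, inferred from a captured frame's total length."""
    return frame_len - payload_len - ETH_IP_TCP_HEADERS

def split_by_max_payload(seq, length, max_payload):
    """Split one oversized captured segment into (seq, len) wire-sized pieces."""
    pieces = []
    while length > 0:
        chunk = min(length, max_payload)
        pieces.append((seq, chunk))
        seq += chunk
        length -= chunk
    return pieces

# Frame 3 is a bare ACK: 66 bytes on the wire, Len=0, so the options
# (here the timestamps) take the remainder.
options = tcp_option_bytes(66, 0)

# The smaller of the two advertised MSS values (1410 and 1460), minus options.
max_payload = min(1410, 1460) - options

# Frame 11 as captured: Seq=44, Len=1648.
pieces = split_by_max_payload(44, 1648, max_payload)

print(options, max_payload, pieces)
```

If the assumptions hold, frame 11 is two segments merged by the capturing host: one of 1398 bytes starting at Seq=44 and one of 250 bytes starting at Seq=1442. The second piece has exactly the sequence number and length of the "retransmission" in frame 13.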