TCP piggybacking introduced by a man-in-the-middle network node between the client and server


When a Wireshark trace is taken on the client side, it appears as:


But when a Wireshark trace of the same traffic is taken on the server side, it appears as:


Here the HTTP GET packet appears to be the ACK for the SYN-ACK, since it has the same sequence and acknowledgment numbers as the previous ACK packet.

The same thing happens later in the opposite direction:

When a Wireshark trace is taken on the server side, it appears as:


But when a Wireshark trace of the same traffic is taken on the client side, it appears as:


Here the 200 OK appears to be the ACK for the HTTP GET, since it has the same sequence and acknowledgment numbers as the previous ACK packet.

So my question is: is there any known network element that does TCP piggybacking as a man-in-the-middle?

I believe this is causing problems on both peers, as they seem to wait forever for the "missing" ACK.
For instance:

*Duplicate ACK for the 200 OK at the server side, since the server is still waiting for the third ACK of the TCP three-way handshake (which has been piggybacked onto the subsequent HTTP GET).

*Retransmission of the HTTP GET at the client side, since the original, non-piggybacked ACK for it never arrives.

Here are sample traces:

Best Answer

After looking at lots of changes happening to the packets in transit, I finally spotted one which could plausibly explain why the client is considering the data packet from the server to be out of sequence.

Look at the timestamp option, and in particular at the timestamp value field on packets from server to client. The server sends these with values going from 7236650 in the SYN-ACK to 7246570 in the last packet.

However, on receipt at the client, the timestamp value in the SYN-ACK packet has been modified to 1068916716. The rest of the packets are sent from server to client without modifications to the timestamp value.

So from the client's point of view, the timestamp goes from 1068916716 to 7236708. In other words, it is going backwards, and that is certainly a valid reason for the client to consider the packet to be out of sequence.
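To make the "going backwards" argument concrete, here is a minimal sketch of the kind of 32-bit wraparound-aware timestamp comparison that PAWS (RFC 7323) describes. The function name is made up for illustration, and whether the client's TCP stack applies exactly this test is an assumption. Only the timestamp values come from the traces.

```python
# Sketch of a PAWS-style timestamp comparison; not taken from any real TCP stack.
MOD = 1 << 32  # TCP timestamp values are 32-bit and wrap around


def tsval_is_older(tsval: int, ts_recent: int) -> bool:
    """Return True if tsval is 'before' ts_recent in 32-bit serial arithmetic."""
    diff = (tsval - ts_recent) % MOD
    # A difference in the upper half of the 32-bit space means "behind".
    return diff >= (1 << 31)


# Values observed in the traces.
mangled_syn_ack_tsval = 1068916716   # what the client saw in the SYN-ACK
original_syn_ack_tsval = 7236650     # what the server actually sent
next_packet_tsval = 7236708          # TSval on the following server packet

# With the mangled SYN-ACK, the next packet looks older than the last one seen.
print(tsval_is_older(next_packet_tsval, mangled_syn_ack_tsval))   # True
# With the unmodified SYN-ACK, the same packet would look fine.
print(tsval_is_older(next_packet_tsval, original_syn_ack_tsval))  # False
```

A receiver applying a check like this would treat the first data packet as stale, which fits the client ACKing the SYN-ACK's sequence number again instead of the new data.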

So there you have it: my guess for the root cause of your problem is that the timestamp in the SYN-ACK is being mangled.

Before reaching that guess, I observed a lot of other interesting facts in the packet traces. Although they do not explain your problem, they may still be relevant data points for further investigation. I used Wireshark to inspect the packet captures, but any tool that can decode all the TCP headers will do.

  • There is some NAT going on. The packets from the client originate with source IP and arrive with source IP Since it would be unusual to NAT between two RFC 1918 ranges, I am guessing there are two layers of NAT happening. I guess is mapped to a public IP close to the client and then to close to the server. (With the amount of modifications happening, it is slightly surprising that the client port number remains unchanged.)
  • Paying careful attention to the IPID field on packets from the client to the server, I observe this: they start out at 0x5b1f and simply increase from that value until reaching 0x51d3 on the last packet. However, on the receiving end the first packet arrives with a different IPID, 0xd5f5. I also notice that the packet with ID 0x51c2 got delayed and arrived after the packet with ID 0x51c4.
  • The SYN packet that got its IPID field mangled also got its options reordered. This shouldn't cause any problem, but it indicates that a middlebox has pulled this SYN packet completely apart and produced a new one that resembles the original on most points. We can also see that for this particular packet, the TTL increased in transit.
  • None of the mangled IPID, the dropped ACK packet, or the delayed retransmission caused any problems. The server did reply to the request, and the delayed packet meant nothing because it was a retransmission of a packet that the server had already received.
  • The IPIDs on packets from the server start with 0x0000 on the SYN-ACK packet, and then go from 0x2766 on the second packet to 0x2770 on the last.
  • Only the packet with ID 0x2766 got lost. The rest of the IDs arrive at the client in order.
  • The window scaling option got modified in flight. The server sent 9, the client received 7.
  • The lost packet is the ACK of the request itself. This packet getting lost explains why the client retransmits the request.
  • The time for the server to process the request and produce a reply is more than 200ms. Delaying an ACK for that long is not acceptable. It may be OK to delay it briefly just in case it can be piggybacked on a payload packet. But if no payload packet shows up within a few ms, the ACK is supposed to be transmitted by itself.
  • Upon receiving the packet with the first part of the HTTP reply and the ACK of the request (the packet with ID 0x2767), the client responds with an ACK of the sequence number of the initial SYN-ACK rather than of the packet it just received. It also ignores the ACK it got and keeps retransmitting the request. In other words, the client behaves as if it had just received an out-of-sequence packet. If the client did this just because of the lost ACK from the server, then that is clearly a bug in the client's TCP stack. I suspect there is something else wrong with this packet. The advertised window size was increased on this packet while in transit from server to client. However, since the window is nowhere near full, that is unlikely to be the problem.
  • Paying close attention to the absolute sequence numbers (and not just the relative values displayed by default in Wireshark), I notice that sequence numbers are passed unmodified from client to server, but are modified while in transit from server to client. As long as ACKed sequence numbers are modified accordingly while in transit from client to server, this should not cause any problem.
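The comparisons in the list above all boil down to decoding the same packet from both captures and diffing the header fields. A rough sketch of that step follows, using plain dictionaries instead of a real capture parser. The field names `tsval` and `wscale` and the helper `diff_headers` are made up for illustration. The values are the ones quoted above for the server's SYN-ACK.

```python
def diff_headers(sent: dict, received: dict) -> dict:
    """Return {field: (sent_value, received_value)} for every field that differs."""
    changed = {}
    for field in sorted(set(sent) | set(received)):
        if sent.get(field) != received.get(field):
            changed[field] = (sent.get(field), received.get(field))
    return changed


# SYN-ACK as captured on the server vs. the same packet captured on the client.
# Only fields whose values are stated in this answer are included.
syn_ack_sent = {"tsval": 7236650, "wscale": 9}
syn_ack_received = {"tsval": 1068916716, "wscale": 7}

for field, (before, after) in diff_headers(syn_ack_sent, syn_ack_received).items():
    print(f"{field}: sent {before}, received {after}")
```

In practice the dictionaries would be filled from whatever tool decodes the captures. The point is to compare absolute values field by field rather than trusting relative numbers.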

There is no way the middleboxes involved are going to revert all of their changes in the copies of packets quoted inside ICMP error messages. Thus I am confident that selectively reducing the TTL on packets before they are sent can be used to identify which router along the way is mangling the packets. (Unless the provider has decided to drop all the relevant ICMP packets so that you can't debug problems, in which case it would be time to look for another provider.)
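The TTL idea can be sketched as follows. Assume you have already sent probes with TTL 1, 2, 3, ... and decoded the header each router quoted back in its ICMP time-exceeded message. The helper name, field names, and sample values below are hypothetical and not taken from the traces. Note that routers are only guaranteed to quote the IP header plus the first 8 bytes of the TCP header, so the IPID and sequence number are reliably available while TCP options such as the timestamp may not be.

```python
from typing import Optional


def first_mangling_hop(sent: dict, quoted_by_ttl: dict) -> Optional[int]:
    """Return the smallest TTL whose ICMP-quoted header differs from what was sent.

    Only fields that the router actually quoted are compared.
    """
    for ttl in sorted(quoted_by_ttl):
        quoted = quoted_by_ttl[ttl]
        if any(quoted[k] != v for k, v in sent.items() if k in quoted):
            return ttl
    return None


# Hypothetical probe and hypothetical ICMP quotes, for illustration only.
probe = {"ipid": 0x1234, "seq": 1000}
quotes = {
    1: {"ipid": 0x1234, "seq": 1000},
    2: {"ipid": 0x1234, "seq": 1000},
    3: {"ipid": 0xABCD, "seq": 1000},  # IPID already rewritten when hop 3 saw it
    4: {"ipid": 0xABCD, "seq": 1000},
}

print(first_mangling_hop(probe, quotes))  # 3
```

A result of 3 here would mean the packet had already been modified by the time it expired at hop 3, so the mangling device sits at or just before that hop.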