Tcp – Dropped data packet on satellite connection

packet-losstcp

We have several embedded pcs on farms in the UK and US. Among other connections these talk to our server sending a small packet of data (100 – 600 bytes) every 20 seconds.

Over DSL this is fine. Over satellite connections, we lose most of the packets.

We're using TCP, and tcpdump on the client shows the sequence:

-> syn                   (send)
<- syn ack               (receive)
-> ack
-> push ack
<- ack                   (spoofed?)
-> fin ack
<- ack                   (spoofed?)
<- fin ack
-> ack

The server, however, sees:

<- syn                   (receive)
-> syn ack               (send)
<- ack
<- fin ack
-> fin ack
<- ack

I think I'm correct in saying the extra acks the client receives are spoofed by the satellite end point in order to speed up the connection

We have ~100 DSL sites and 3 satellites. The DSL are all fine, and the satellites are all broken in the same way.

What's happening to the data? It gets through maybe one time in 20.

edit
I can ping the server from the client in question. The clients all have a reverse ssh tunnel to the server which is working fine. We can ssh in, and also download data. It's just this upload that has problems.

DSL connection – successful

root@mini2440:~ tcpdump port 1080 -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:55:20.126968 IP (tos 0x0, ttl 64, id 29228, offset 0, flags [DF], proto TCP (6), length 60)
    client > server: Flags [S], cksum 0x1877 (incorrect -> 0x5ebd), seq 21640692, win 14600, options [mss 1460,sackOK,TS val 1485260760 ecr 0,nop,wscale 1], length 0
14:55:20.194124 IP (tos 0x0, ttl 51, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    server > client: Flags [S.], cksum 0xf10a (correct), seq 4087778233, ack 21640693, win 14480, options [mss 1452,sackOK,TS val 43969567 ecr 1485260760,nop,wscale 4], length 0
14:55:20.194465 IP (tos 0x0, ttl 64, id 29229, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [.], cksum 0x186f (incorrect -> 0x3bcb), seq 1, ack 1, win 7300, options [nop,nop,TS val 1485260773 ecr 43969567], length 0
14:55:20.197225 IP (tos 0x0, ttl 64, id 29230, offset 0, flags [DF], proto TCP (6), length 403)
    client > server: Flags [P.], cksum 0x39c5 (correct), seq 1:352, ack 1, win 7300, options [nop,nop,TS val 1485260774 ecr 43969567], length 351
14:55:20.197564 IP (tos 0x0, ttl 64, id 29231, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [F.], cksum 0x186f (incorrect -> 0x3a6a), seq 352, ack 1, win 7300, options [nop,nop,TS val 1485260774 ecr 43969567], length 0
14:55:20.267543 IP (tos 0x0, ttl 52, id 26507, offset 0, flags [DF], proto TCP (6), length 52)
    server > client: Flags [.], cksum 0x530f (correct), seq 1, ack 352, win 972, options [nop,nop,TS val 43969587 ecr 1485260774], length 0
14:55:20.271456 IP (tos 0x0, ttl 52, id 26508, offset 0, flags [DF], proto TCP (6), length 52)
    server > client: Flags [F.], cksum 0x530d (correct), seq 1, ack 353, win 972, options [nop,nop,TS val 43969587 ecr 1485260774], length 0
14:55:20.271771 IP (tos 0x0, ttl 64, id 29232, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [.], cksum 0x186f (incorrect -> 0x3a46), seq 353, ack 2, win 7300, options [nop,nop,TS val 1485260789 ecr 43969587], length 0
8 packets captured

Satellite connection – unsuccessful

root@mini2440:~ tcpdump port 1080 -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:14:50.027783 IP (tos 0x0, ttl 64, id 13618, offset 0, flags [DF], proto TCP (6), length 60)
    client > server: Flags [S], cksum 0x1884 (incorrect -> 0x1b8a), seq 2040495825, win 14600, options [mss 1460,sackOK,TS val 16534499 ecr 0,nop,wscale 1], length 0
15:14:50.029731 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    server > client: Flags [S.], cksum 0x3451 (correct), seq 51102354, ack 2040495826, win 5792, options [mss 1460,sackOK,TS val 67452903 ecr 16534499,nop,wscale 7], length 0
15:14:50.034910 IP (tos 0x0, ttl 64, id 13619, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [.], cksum 0x187c (incorrect -> 0x5d38), seq 1, ack 1, win 7300, options [nop,nop,TS val 16534500 ecr 67452903], length 0
15:14:50.036082 IP (tos 0x0, ttl 64, id 13620, offset 0, flags [DF], proto TCP (6), length 173)
    client > server: Flags [P.], cksum 0x8d87 (correct), seq 1:122, ack 1, win 7300, options [nop,nop,TS val 16534500 ecr 67452903], length 121
15:14:50.036351 IP (tos 0x0, ttl 64, id 13621, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [F.], cksum 0x187c (incorrect -> 0x5cbe), seq 122, ack 1, win 7300, options [nop,nop,TS val 16534500 ecr 67452903], length 0
15:14:50.037547 IP (tos 0x0, ttl 63, id 64893, offset 0, flags [DF], proto TCP (6), length 52)
    server > client: Flags [.], cksum 0x790d (correct), seq 1, ack 122, win 46, options [nop,nop,TS val 67452911 ecr 16534500], length 0
15:14:50.076479 IP (tos 0x0, ttl 63, id 64894, offset 0, flags [DF], proto TCP (6), length 52)
    server > client: Flags [.], cksum 0x78e4 (correct), seq 1, ack 123, win 46, options [nop,nop,TS val 67452951 ecr 16534500], length 0
15:14:51.076273 IP (tos 0x0, ttl 63, id 64895, offset 0, flags [DF], proto TCP (6), length 52)
    server > client: Flags [F.], cksum 0x74e4 (correct), seq 1, ack 123, win 46, options [nop,nop,TS val 67453974 ecr 16534500], length 0
15:14:51.076482 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    client > server: Flags [.], cksum 0x57be (correct), seq 123, ack 2, win 7300, options [nop,nop,TS val 16534708 ecr 67453974], length 0
9 packets captured

There was no ICMP traffic in either case.

Best Answer

The timestamps on your satellite pcap entries do imply tcp acknowledgement spoofing. Most devices that perform acknowledgement spoofing can be configured to bypass acceleration based on some combination of source/ destination IP address, source/ destination port; standard ACL concepts. This could be a feature that is configurable in the satellite modems (or nearby device) at your hub and spoke locaitons.

Wide area optimization or acceleration solutions are also common in such network architectures. Again, these solutions should provide a method to bypass your traffic that's having problems. Devices such as Riverbed Steelhead, Cisco WAAS, Bluecoat, and Citrix Cloudbridge/ Wanscaler are examples of technologies that might impact your application. A discussion with your provider (or network guy) should reveal if such technologies are in use on your network; if so, request bypassing your affected traffic in these devices to see if behavior changes. Best of luck.