The main issue is the large WAN delay. It gets much worse if the link also has random packet loss.
1, tcp_mem also needs to be set larger so that more memory can be allocated to TCP. For example, set it to
net.ipv4.tcp_mem = 4643328 6191104 9286656
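2, you can capture the traffic with wireshark/tcpdump for several minutes, then analyze the capture to see whether there is random packet loss. You can also upload the capture file if you like.
3, you can try tuning the other TCP parameters,
e.g. set tcp_westwood=1 and tcp_bic=1 (on kernels that no longer have these switches, the algorithm is selected through net.ipv4.tcp_congestion_control instead).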
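On point 1: the three tcp_mem values are counted in memory pages, not bytes, so it helps to convert them before picking numbers. A rough sketch of that conversion, assuming 4 KiB pages (check yours with `getconf PAGESIZE`); the function name is only for illustration:

```python
# tcp_mem holds three thresholds (low, pressure, high), expressed in pages.
PAGE_SIZE = 4096  # assumption: 4 KiB pages, verify with `getconf PAGESIZE`


def tcp_mem_to_bytes(setting: str, page_size: int = PAGE_SIZE) -> dict:
    """Convert a 'low pressure high' tcp_mem string from pages to bytes."""
    low, pressure, high = (int(value) for value in setting.split())
    return {
        "low": low * page_size,
        "pressure": pressure * page_size,
        "high": high * page_size,
    }


limits = tcp_mem_to_bytes("4643328 6191104 9286656")
for name, value in limits.items():
    print(f"{name:9s}{value:>16,d} bytes  ({value / 2**30:.1f} GiB)")
```

If the totals come out larger than the RAM in the box, scale the three values down in proportion.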
Have you tried enabling Compound TCP (CTCP) on your Windows 7/8 clients?
Please read:
Increasing Sender-Side Performance for High-BDP Transmission
http://technet.microsoft.com/en-us/magazine/2007.01.cableguy.aspx
...
These algorithms work well for small BDPs and smaller receive window
sizes. However, when you have a TCP connection with a large receive
window size and a large BDP, such as replicating data between two
servers located across a high-speed WAN link with a 100ms round-trip
time, these algorithms do not increase the send window fast enough to
fully utilize the bandwidth of the connection.
To better utilize the bandwidth of TCP connections in these
situations, the Next Generation TCP/IP stack includes Compound TCP
(CTCP). CTCP more aggressively increases the send window for
connections with large receive window sizes and BDPs. CTCP attempts to
maximize throughput on these types of connections by monitoring delay
variations and losses. In addition, CTCP ensures that its behavior
does not negatively impact other TCP connections.
...
CTCP is enabled by default in computers running Windows Server 2008 and disabled by
default in computers running Windows Vista. You can enable CTCP with the
netsh interface tcp set global congestionprovider=ctcp
command. You can disable CTCP with the
netsh interface tcp set global congestionprovider=none
command.
Edit 6/30/2014
To check whether CTCP is really "on":
> netsh int tcp show global
OP said:
If I understand this correctly, this setting increases the rate at
which the congestion window is enlarged rather than the maximum size
it can reach
CTCP aggressively increases the send window
http://technet.microsoft.com/en-us/library/bb878127.aspx
Compound TCP
The existing algorithms that prevent a sending TCP peer from
overwhelming the network are known as slow start and congestion
avoidance. These algorithms increase the amount of segments that the
sender can send, known as the send window, when initially sending data
on the connection and when recovering from a lost segment. Slow start
increases the send window by one full TCP segment for either each
acknowledgement segment received (for TCP in Windows XP and Windows
Server 2003) or for each segment acknowledged (for TCP in Windows
Vista and Windows Server 2008). Congestion avoidance increases the
send window by one full TCP segment for each full window of data that
is acknowledged.
These algorithms work well for LAN media speeds and smaller TCP window
sizes. However, when you have a TCP connection with a large receive
window size and a large bandwidth-delay product (high bandwidth and
high delay), such as replicating data between two servers located
across a high-speed WAN link with a 100 ms round trip time, these
algorithms do not increase the send window fast enough to fully
utilize the bandwidth of the connection. For example, on a 1 Gigabit
per second (Gbps) WAN link with a 100 ms round trip time (RTT), it can
take up to an hour for the send window to initially increase to the
large window size being advertised by the receiver and to recover when
there are lost segments.
To better utilize the bandwidth of TCP connections in these
situations, the Next Generation TCP/IP stack includes Compound TCP
(CTCP). CTCP more aggressively increases the send window for
connections with large receive window sizes and large bandwidth-delay
products. CTCP attempts to maximize throughput on these types of
connections by monitoring delay variations and losses. CTCP also
ensures that its behavior does not negatively impact other TCP
connections.
In testing performed internally at Microsoft, large file backup times
were reduced by almost half for a 1 Gbps connection with a 50ms RTT.
Connections with a larger bandwidth delay product can have even better
performance. CTCP and Receive Window Auto-Tuning work together for
increased link utilization and can result in substantial performance
gains for large bandwidth-delay product connections.
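To put rough numbers on the quoted passage, here is a small sketch that works out the bandwidth-delay product for the 1 Gbps / 100 ms example and counts round trips under the two classic growth rules. It is an idealized model, not what any real stack does exactly: the 1460-byte segment size, the function names, and the "double per RTT / add one segment per RTT" rules are simplifying assumptions, and it ignores delayed ACKs, repeated losses, and receive-window limits, so it will not reproduce the article's "up to an hour" figure.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    return bandwidth_bps * rtt_ms // 8000


def rtts_to_reach(target: int, start: int = 1, ssthresh: int = 1) -> int:
    """Count round trips until the window (in segments) reaches target.

    Below ssthresh the window doubles each RTT (slow start);
    at or above it the window grows by one segment per RTT
    (congestion avoidance).
    """
    cwnd, rtts = start, 0
    while cwnd < target:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)
        else:
            cwnd += 1
        rtts += 1
    return rtts


MSS = 1460  # assumption: typical Ethernet segment payload in bytes
RTT_MS = 100

bdp = bdp_bytes(1_000_000_000, RTT_MS)
segments = -(-bdp // MSS)  # ceiling division

slow_start = rtts_to_reach(segments, ssthresh=segments)
after_loss = rtts_to_reach(segments, start=segments // 2, ssthresh=segments // 2)

print(f"BDP: {bdp:,} bytes, about {segments} segments")
print(f"slow start only: {slow_start} RTTs")
print(f"linear recovery from half the window: {after_loss} RTTs "
      f"({after_loss * RTT_MS / 1000:.0f} s)")
```

The gap between the two counts is the point: exponential growth fills the pipe in a handful of round trips, while one segment per RTT takes thousands of them after a single loss, which is the gap CTCP tries to close.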
Best Answer
Try using ncat, I'm fairly certain it can do this. http://nmap.org/ncat/
Perhaps as a file transfer? http://nmap.org/ncat/guide/ncat-file-transfer.html
It has a --hex-dump option so you can see what's really going on.
You could also just compile a simple program to output exactly what you want and use netcat to call that and transfer its output.
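A minimal sketch of that last idea, using a short script rather than a compiled program. The payload bytes here are made up for illustration; replace them with whatever you actually need on the wire.

```python
import sys


def build_payload() -> bytes:
    """Return the exact bytes to send (hypothetical example payload)."""
    return b"\x00\x01\x02" + b"hello\r\n"


def main() -> None:
    # Write raw bytes, bypassing text encoding and newline translation.
    sys.stdout.buffer.write(build_payload())
    sys.stdout.buffer.flush()


if __name__ == "__main__":
    main()
```

Saved as, say, payload.py, its output can be piped into ncat (HOST and PORT are placeholders):
python3 payload.py | ncat HOST PORT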