R – n algorithm for fingerprinting the TCP congestion control algorithm used in a captured session

congestion-controlnetwork-programmingnetwork-protocolstcp

I would like a program for determining the TCP congestion control algorithm used in a captured TCP session.

The referenced Wikipedia article states:

TCP New Reno is the most commonly
implemented algorithm, SACK support is
very common and is an extension to
Reno/New Reno. Most others are
competing proposals which still need
evaluation. Starting with 2.6.8 the
Linux kernel switched the default
implementation from reno to BIC. The
default implementation was again
changed to CUBIC in the 2.6.19
version.

Also:

Compound TCP is a Microsoft
implementation of TCP which maintains
two different congestion windows
simultaneously, with the goal of
achieving good performance on LFNs
while not impairing fairness. It has
been widely deployed with Microsoft
Windows Vista and Windows Server 2008
and has been ported to older Microsoft
Windows versions as well as Linux.

What would be some strategies for determining which CC algorithm is in use (from a third party capturing the session)?

Update

This project has built a tool to do this:

The Internet has recently been
evolving from homogeneous congestion
control to heterogeneous congestion
control. Several years ago, Internet
traffic was mainly controlled by the
standard TCP AIMD algorithm, whereas
Internet traffic is now controlled by
many different TCP congestion control
algorithms, such as AIMD, BIC, CUBIC,
CTCP, HSTCP, HTCP, HYBLA, ILLINOIS,
LP, STCP, VEGAS, VENO, WESTWOOD+, and
YEAH. However, there is very little
work on the performance and stability
study of the Internet with
heterogeneous congestion control. One
fundamental reason is the lack of the
deployment information of different
TCP algorithms. The goals of this
project are to:

1) develop tools for identifying the TCP algorithms in the Internet,
2) conduct large-scale TCP-algorithm measurements in the Internet.

Best Answer

There are many more congestion control algorithms than you mention here, off the top of my head the list includes: FAST, Scalable, HSTCP, HTCP, Bic, Cubic, Veno, Vegas.

There are also small variations of them due to bug fixes in actual implementations and I'd guess that implementations in different OSes also behave slightly different from one another.

But if I need to try to come up with an idea it would be to estimate the RTT of the connection, you can try to look at the time it took between the third and the fourth packets, as the first and second packets may be tainted by ARPs and other discovery algorithms along the route.

After you have an estimate for RTT you could try to refine it along the way, I'm not exactly sure how you could do that though. But you don't require a full spec for the program, just ideas :-)

With the RTT figured out you can try to put the packets into RTT bins and count the number of in flight data packets in each bin. This way you'll be able to "plot" estimated-cwnd (# of packets in bin) to time and try some pattern matching there.

An alternative would be to go along the trace and try to "run" in your head the different congestion control algorithms and see if the decision at any point matches with the decision you would have done. It will require some leniency and accuracy intervals.

This definitely sounds like an interesting and challenging task!

Related Topic