Extracting TCP application data from pcap file

network-traffic networking parsing pcap tcpdump

Given a .pcap (or similar) file, I'd like to select one TCP connection and dump both application data streams (the one from the other peer and the one to the other peer) into two separate files on disk.

Let's assume that I have a .pcap file that, amongst other things, contains a full TCP stream (from SYN to final FIN+ACK/RST) of a plaintext HTTP/1.1 connection. I would like to end up with two files holding the content, i.e. one file has

GET / HTTP/1.1\r\n
host: foobar.com\r\n
\r\n

and the other file has

HTTP/1.1 200 ok\r\n
content-length: ...\r\n
... \r\n
\r\n
<html>...</html>

And I want this to be precisely the application data that would've been seen/sent in user space (from read/write/send/recv/…). I want to dump some traffic and use it to test my parsers for a certain network protocol. The parser should just be able to read one of those files and attempt to parse the data stream.


How could such a command line tool look? I'm not sure if this is super helpful but I thought it might make it clearer what I'm looking for if I also gave a usage example of an imaginary tool which can do this.
Let's call the imaginary tool (this is what I'm looking for) tcp-stream-extract. I'd like to call it with something like

### imaginary usage example of the tool that I'd like to find :)

# dump from 12345 to 23456
#   -s  source address 127.0.0.1:12345
#   -d  destination address 127.0.0.1:23456
#   -t  the TCP conn was alive at that time
tcp-stream-extract \
    -i my-captured-packets.pcap \
    -s 127.0.0.1:12345 \
    -d 127.0.0.1:23456 \
    -t '2021-01-28 09:12:00Z' \
    -w from-port-12345-to-port-23456

# dump from 23456 to 12345
#   -s  source address 127.0.0.1:23456
#   -d  destination address 127.0.0.1:12345
#   -t  the TCP conn was alive at that time
tcp-stream-extract \
    -i my-captured-packets.pcap \
    -s 127.0.0.1:23456 \
    -d 127.0.0.1:12345 \
    -t '2021-01-28 09:12:00Z' \
    -w from-port-23456-to-port-12345

Best Answer

If you must do this manually, you need to remove the headers of the encapsulating protocols. However, there are some subtleties to them and it may not be trivial:

  • Ethernet (L2): The Ethernet header (14 bytes) may or may not include an 802.1Q tag (inserted before the EtherType, adding 4 bytes; 0x0800=IPv4, 0x86dd=IPv6, 0x8100=802.1Q tag), and the Ethernet payload may or may not be followed by the FCS (4 bytes).
  • IP (L3): A basic IPv4 header is 20 bytes, IPv6 uses 40 bytes. Either may carry options or extension headers: check the IHL field for IPv4 (5 = no options), or the Next Header field for IPv6 (6 means TCP follows directly, with no extension header). Any packet may be fragmented across several L2 frames (IPv4: MF is set or Fragment Offset>0; IPv6 uses extension header 44). Only the first fragment includes the L4 header; later fragments do not.
  • TCP (L4): The basic TCP segment header is 20 bytes, but it may also include options (Data Offset>5). TCP re-sorts segments that were delivered out of order (by Sequence Number), so you might need to buffer a considerable portion of the data. Segments may also arrive in duplicate.
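To make the header arithmetic above concrete, here is a minimal Python sketch that strips the Ethernet, IP and TCP headers from one captured frame. The function name tcp_payload and its return tuple are made up for illustration. It assumes an Ethernet link type, at most one 802.1Q tag and no IPv6 extension headers, and it simply skips IPv4 fragments rather than reassembling them.

```python
import struct

ETH_IPV4, ETH_IPV6, ETH_VLAN = 0x0800, 0x86DD, 0x8100
PROTO_TCP = 6


def tcp_payload(frame: bytes):
    """Return (sport, dport, seq, flags, payload), or None if not plain TCP."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    ip = 14                                   # start of the IP header
    if ethertype == ETH_VLAN:                 # one 802.1Q tag: 4 extra bytes
        if len(frame) < 18:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, 16)
        ip = 18

    if ethertype == ETH_IPV4:
        if len(frame) < ip + 20:
            return None
        ihl = (frame[ip] & 0x0F) * 4          # header length in bytes
        total_len, _ident, frag = struct.unpack_from("!HHH", frame, ip + 2)
        # 0x3FFF covers the MF flag and the 13-bit Fragment Offset.
        if frag & 0x3FFF or frame[ip + 9] != PROTO_TCP:
            return None
        tcp, end = ip + ihl, ip + total_len
    elif ethertype == ETH_IPV6:
        if len(frame) < ip + 40:
            return None
        (payload_len,) = struct.unpack_from("!H", frame, ip + 4)
        if frame[ip + 6] != PROTO_TCP:        # extension headers not handled
            return None
        tcp, end = ip + 40, ip + 40 + payload_len
    else:
        return None

    if len(frame) < tcp + 20:
        return None
    sport, dport, seq = struct.unpack_from("!HHI", frame, tcp)
    data_offset = (frame[tcp + 12] >> 4) * 4  # TCP header length incl. options
    flags = frame[tcp + 13]
    # Slicing up to the IP length drops Ethernet padding and a trailing FCS.
    return sport, dport, seq, flags, frame[tcp + data_offset:end]
```

Cutting the payload at the length announced by the IP header, rather than at the end of the frame, is what keeps Ethernet padding and a captured FCS out of the output.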

I recommend using a proper tool like tcpflow, as @AlexD has suggested.
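If you still want to see what the manual route involves, the reordering and de-duplication step for one direction of a connection can be sketched in Python as follows. The helper name reassemble is hypothetical. It assumes you have already collected (sequence number, payload) pairs for that direction, that you know the initial sequence number from the SYN, and that the stream is shorter than 4 GiB, so the sequence number wraps at most once.

```python
import heapq


def reassemble(segments, isn):
    """Rebuild one direction's byte stream from (seq, payload) pairs.

    isn is the sequence number of that direction's SYN; the first data
    byte therefore carries sequence number isn + 1.
    """
    base = (isn + 1) & 0xFFFFFFFF
    pending = []            # min-heap of (stream offset, payload)
    out = bytearray()
    for seq, payload in segments:
        if not payload:     # pure ACKs, SYN, FIN carry no data
            continue
        # Offset of this segment within the stream, modulo 2**32.
        start = (seq - base) & 0xFFFFFFFF
        heapq.heappush(pending, (start, payload))
        # Drain every buffered segment that now touches the end of `out`.
        while pending and pending[0][0] <= len(out):
            start, data = heapq.heappop(pending)
            # Skip bytes we already have (duplicates, overlapping resends).
            out += data[len(out) - start:]
    return bytes(out)
```

Segments that arrive ahead of a gap stay in the heap until the missing bytes show up. If the capture itself dropped packets, everything after the gap is never emitted, which is one of the cases a dedicated tool handles more gracefully.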