How to export more than 1000 HTTP request bodies from a large pcap file

pcap tshark

I have a pcap file (~2.3 GB) containing HTTP requests. I need to extract the body of each request in a form that I can process further. Each request in its own file would work well, but I can be flexible on that.

I found something promising in tshark, as this command does almost what I need:

tshark -r capture.pcap --export-objects "http,data"

I get a folder with a bunch of files in it, each one containing one request body.

However, it only outputs the first 1000 requests. How can I get the rest of the requests?

Best Answer

Try running:

tshark -r events.pcap -Y "http.request" -T fields -e http.file_data

-Y "http.request" - keeps only the packets that are HTTP requests

-T fields -e http.file_data - limits the output to a single field, the request body

EDIT: With a large file, you may need to split the capture into smaller files first with a tool like editcap (its -c option sets the number of packets per output file).
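To get one file per request out of the tshark command above, its output can be piped through a small script. The following is a minimal sketch, not part of the original answer: it assumes tshark prints one body per line, that requests without a body show up as empty lines, and that the line should be stored as-is (depending on the tshark version, the field may be printed as text or as a hex string). The function name `write_bodies` and the file naming scheme are made up for illustration.

```python
from pathlib import Path


def write_bodies(lines, out_dir, prefix="request"):
    """Write each non-empty line to its own numbered file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for line in lines:
        body = line.rstrip("\r\n")
        if not body:
            # Assumption: a request with no body (most GETs) yields an empty line.
            continue
        path = out / f"{prefix}_{len(written) + 1:06d}.txt"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

One way to use it would be to call `write_bodies(sys.stdin, "bodies")` from a script that receives the tshark output on standard input. Because the lines are consumed one at a time, the whole output never has to fit in memory.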

Related Solutions

PCAP Files – How to Read PCAP Files in a Friendly Format

Wireshark is probably the best, but if you want or need to look at the payload without loading up a GUI, you can use tcpdump's -X option (hex and ASCII) or -A option (ASCII only):

tcpdump -qns 0 -X -r serverfault_request.pcap

14:28:33.800865 IP 10.2.4.243.41997 > 69.59.196.212.80: tcp 1097
        0x0000:  4500 047d b9c4 4000 4006 63b2 0a02 04f3  E..}..@.@.c.....
        0x0010:  453b c4d4 a40d 0050 f0d4 4747 f847 3ad5  E;.....P..GG.G:.
        0x0020:  8018 f8e0 1d74 0000 0101 080a 0425 4e6d  .....t.......%Nm
        0x0030:  0382 68a1 4745 5420 2f71 7565 7374 696f  ..h.GET./questio
        0x0040:  6e73 2048 5454 502f 312e 310d 0a48 6f73  ns.HTTP/1.1..Hos
        0x0050:  743a 2073 6572 7665 7266 6175 6c74 2e63  t:.serverfault.c
        0x0060:  6f6d 0d0a 5573 6572 2d41 6765 6e74 3a20  om..User-Agent:.
        0x0070:  4d6f 7a69 6c6c 612f 352e 3020 2858 3131  Mozilla/5.0.(X11
        0x0080:  3b20 553b 204c 696e 7578 2069 3638 363b  ;.U;.Linux.i686;

tcpdump -qns 0 -A -r serverfault_request.pcap

14:29:33.256929 IP 10.2.4.243.41997 > 69.59.196.212.80: tcp 1097
E..}..@.@.c.
...E;...^M.P..^w.G.......t.....
.%.}..l.GET /questions HTTP/1.1
Host: serverfault.com

There are many other tools for reading captures, getting stats, extracting payloads and so on. A quick look at the number of packages that depend on libpcap in the Debian package repository gives a list of 50+ tools that can be used to slice, dice, view, and manipulate captures in various ways.

For example.

HTTP dissector that reads from pcap

I would suspect that tcpflow would do your job well enough: it can take a pcap file and divvy it up into its component parts. For instance, I just did the following as a test:

sudo tcpdump -i eth0 -n -s 0 -w /tmp/capt -v port 80

Then reloaded your question, stopped tcpdump, and then ran:

tcpflow -r /tmp/capt

And got about 20 files, each containing a single HTTP request or response (as appropriate).

On the other hand, I usually just go the soft option and open up my capture files in Wireshark, whose "Analyze -> Follow TCP Stream" mode is freaking awesome (colour coded and everything).

Both of these tools, by the way, can do the packet capture themselves, too -- you don't have to feed them an existing packet capture via tcpdump.

If you have a specific need to parse the HTTP traffic after you've split it up, it's quite trivial: the HTTP protocol is very simple. In the trivial (non-keepalive/pipelined) case, you can use the following to get the request or response header:

sed '/^\r$/q' <connectionfile>

And this to get the body of the request/response:

sed -n '/^\r$/,$p' <connectionfile>

(You can also pipe things through those sed commands if you like).
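If the split needs to happen inside a script rather than in a shell pipeline, the same idea can be sketched in Python. This is an illustration under the same assumption as the sed commands (one message per file, headers terminated by an empty CRLF line); the function name `split_http_message` is hypothetical.

```python
def split_http_message(raw: bytes):
    """Split one HTTP message into (header_block, body) at the first blank line."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # No blank line found: treat everything as headers, with an empty body.
        return raw, b""
    return head, body
```

Unlike the sed versions, which include the blank separator line in their output, this sketch drops it from both parts. Working on bytes rather than text keeps binary bodies intact.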

On keepalive connections, you then need to start getting a little scripty, but even then it's about 20 lines of script to process the two files (A to B, B to A), extract the headers, read the Content-Length, then read the body -- and if you're doing any sort of automated processing, you'll be writing code to do that stuff anyway, so a bit of HTTP dissection doesn't add considerably to the workload.
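The keepalive case described above might look roughly like the following sketch. It handles one direction of a connection at a time (one of the per-direction files) and makes simplifying assumptions that are not from the original answer: every body is delimited by Content-Length, a missing Content-Length means an empty body, and chunked transfer encoding is not handled. The names `content_length` and `iter_http_messages` are made up for illustration.

```python
def content_length(header_block: bytes) -> int:
    """Return the Content-Length value from a header block, or 0 if absent."""
    # Skip the first line (request or status line) and scan the header fields.
    for line in header_block.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    return 0


def iter_http_messages(stream: bytes):
    """Yield (header_block, body) for each message in one direction of a connection."""
    pos = 0
    while pos < len(stream):
        end = stream.find(b"\r\n\r\n", pos)
        if end == -1:
            break  # trailing partial message with no header terminator; stop
        headers = stream[pos:end]
        body_start = end + 4
        length = content_length(headers)
        yield headers, stream[body_start:body_start + length]
        pos = body_start + length
```

Reading a request-direction file in binary mode and passing its contents to `iter_http_messages` would then give each request's headers and body in order.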
