Wireshark is probably the best option, but if you want or need to look at the payload without loading up a GUI, you can use tcpdump's -X or -A options:
tcpdump -qns 0 -X -r serverfault_request.pcap
14:28:33.800865 IP 10.2.4.243.41997 > 69.59.196.212.80: tcp 1097
0x0000: 4500 047d b9c4 4000 4006 63b2 0a02 04f3 E..}..@.@.c.....
0x0010: 453b c4d4 a40d 0050 f0d4 4747 f847 3ad5 E;.....P..GG.G:.
0x0020: 8018 f8e0 1d74 0000 0101 080a 0425 4e6d .....t.......%Nm
0x0030: 0382 68a1 4745 5420 2f71 7565 7374 696f ..h.GET./questio
0x0040: 6e73 2048 5454 502f 312e 310d 0a48 6f73 ns.HTTP/1.1..Hos
0x0050: 743a 2073 6572 7665 7266 6175 6c74 2e63 t:.serverfault.c
0x0060: 6f6d 0d0a 5573 6572 2d41 6765 6e74 3a20 om..User-Agent:.
0x0070: 4d6f 7a69 6c6c 612f 352e 3020 2858 3131 Mozilla/5.0.(X11
0x0080: 3b20 553b 204c 696e 7578 2069 3638 363b ;.U;.Linux.i686;
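If you want just the application payload out of a -X dump like the one above, the hex column can be decoded and the IP and TCP headers skipped. The following is a minimal sketch, not a general parser. It assumes IPv4 over TCP, that the dump starts at the IP header (as -X does, unlike -XX), and that the ASCII column is a single token with no spaces, as in the output shown here. The function name `payload_from_hexdump` is made up for illustration.

```python
def payload_from_hexdump(dump: str) -> bytes:
    """Recover the TCP payload from the hex lines of `tcpdump -X` output.

    Assumptions (not checked): IPv4, TCP, dump begins at the IP header.
    """
    raw = bytearray()
    for line in dump.splitlines():
        tokens = line.split()
        # Hex lines look like: "0x0030: 0382 68a1 ... ..h.GET./questio"
        if len(tokens) < 3 or not tokens[0].startswith("0x"):
            continue  # skip the packet summary line and anything else
        # tokens[0] is the offset, tokens[-1] the ASCII column,
        # everything in between is hex.
        raw += bytes.fromhex("".join(tokens[1:-1]))

    ip_header_len = (raw[0] & 0x0F) * 4                 # IHL, in 32-bit words
    tcp_header_len = (raw[ip_header_len + 12] >> 4) * 4  # TCP data offset
    return bytes(raw[ip_header_len + tcp_header_len:])
```

For the packet above, the IP header is 20 bytes and the TCP header 32 bytes, so the payload starts at offset 0x34, which is where "GET" appears in the dump.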
tcpdump -qns 0 -A -r serverfault_request.pcap
14:29:33.256929 IP 10.2.4.243.41997 > 69.59.196.212.80: tcp 1097
E..}..@.@.c.
...E;...^M.P..^w.G.......t.....
.%.}..l.GET /questions HTTP/1.1
Host: serverfault.com
There are many other tools for reading captures, getting stats, extracting payloads and so on. A quick look at the number of things that depend on libpcap in the Debian package repository gives a list of 50+ tools that can be used to slice, dice, view, and manipulate captures in various ways.
For example.
I would suspect that tcpflow would do your job well enough; it can take a pcap file and divvy it up into its component parts. For instance, I just did the following as a test:
sudo tcpdump -i eth0 -n -s 0 -w /tmp/capt -v port 80
Then reloaded your question, stopped tcpdump, and then ran:
tcpflow -r /tmp/capt
And got about 20 files, each containing a single HTTP request or response (as appropriate).
On the other hand, I usually just go the soft option and open up my capture files in Wireshark, whose "Analyze -> Follow TCP Stream" mode is freaking awesome (colour coded and everything).
Both of these tools, by the way, can do the packet capture themselves, too -- you don't have to feed them an existing packet capture via tcpdump.
If you have a specific need to parse the HTTP traffic after you've split it up, it's quite trivial: the HTTP protocol is very simple. In the trivial (non-keepalive/pipelined) case, you can use the following to get the request or response header:
sed '/^\r$/q' <connectionfile>
And this to get the body of the request/response (the output starts with the blank line that separates it from the headers):
sed -n '/^\r$/,$p' <connectionfile>
(You can also pipe things through those sed commands if you like).
On keepalive connections, you then need to start getting a little scripty, but even then it's about 20 lines of script to process the two files (A to B, B to A), extract the headers, read the Content-Length, then read the body -- and if you're doing any sort of automated processing, you'll be writing code to do that stuff anyway, so a bit of HTTP dissection doesn't add considerably to the workload.
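As a rough idea of what that script could look like, here is a minimal Python sketch that walks one direction of a keepalive connection and splits it into messages using Content-Length. The function name `split_http_messages` is made up for illustration, and the sketch makes simplifying assumptions: it ignores chunked transfer encoding, treats a missing Content-Length as an empty body, and does not special-case bodiless responses (for example to HEAD requests).

```python
import sys


def split_http_messages(stream: bytes):
    """Yield (header_lines, body) for each HTTP message in `stream`.

    `stream` is one direction of a connection (A to B, or B to A).
    Assumption: bodies are delimited by Content-Length only.
    """
    pos = 0
    while pos < len(stream):
        end = stream.find(b"\r\n\r\n", pos)
        if end == -1:
            break  # trailing data without a complete header block
        header_lines = stream[pos:end].decode("iso-8859-1").split("\r\n")

        length = 0
        for line in header_lines[1:]:  # skip the request/status line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())

        body_start = end + 4  # skip the blank line after the headers
        yield header_lines, stream[body_start:body_start + length]
        pos = body_start + length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass one of the per-direction files written by tcpflow.
    with open(sys.argv[1], "rb") as f:
        for headers, body in split_http_messages(f.read()):
            print(headers[0], len(body))
```

Run it once on each of the two files for a connection and pair the results up in order to match requests with responses.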
Best Answer
Try running
tshark -r events.pcap -Y "http.request" -T fields -e http.file_data
-Y "http.request" - filters for packets which are HTTP requests
-T fields -e http.file_data - sets the output fields to just the request body
EDIT: With a large file, you may need to split up your captures with a tool like editcap.