I have some pcap data from a local interface which I'd like to analyze. Specifically, I'd like the content of HTTP sessions. I'm aware of many HTTP header statistics tools, but I would specifically like to reassemble the content of each complete HTTP connection.
Is there any suitable layer-4 packet dumping tool for Linux, analogous to what tcpdump et al. do at layer 3: something that can understand and manipulate HTTP?
Feel free to redirect me if this has been asked before, though I haven't been able to find any answer to this yet on SF. Thanks!
Best Answer
I suspect that `tcpflow` would do the job well enough; it can take a pcap file and divvy it up into its component parts. For instance, I just did the following as a test: I started a capture with `tcpdump`, reloaded your question, stopped `tcpdump`, and then ran `tcpflow` over the resulting capture file. I got about 20 files, each containing a single HTTP request or response (as appropriate).
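The commands themselves were dropped from this copy of the answer; a plausible sketch of that test (the interface name, port, and file names here are assumptions, and the capture step needs root) is:

```shell
# Capture HTTP traffic to a pcap file (Ctrl-C to stop; needs root)
tcpdump -i eth0 -w capture.pcap port 80

# Reassemble the capture into one file per TCP flow direction; tcpflow
# names each output file after its endpoints, along the lines of
# 192.168.001.010.45678-203.000.113.009.00080 (addresses here invented)
tcpflow -r capture.pcap
```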
On the other hand, I usually just go the soft option and open up my capture files in wireshark, whose "Analyze -> Follow TCP Stream" mode is freaking awesome (colour coded and everything).
Both of these tools, by the way, can do the packet capture themselves, too -- you don't have to feed them an existing packet capture from `tcpdump`.

If you have a specific need to parse the HTTP traffic after you've split it up, it's quite trivial: the HTTP protocol is very simple. In the trivial (non-keepalive, non-pipelined) case, you can use the following to get the request or response header:
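The one-liner itself is missing from this copy; a sketch that does what the text describes, assuming GNU sed and a tcpflow-style flow file (the file name and sample content here are made up), is:

```shell
# Hypothetical sample flow file with CRLF line endings, as tcpflow would write it
printf 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\nthis-is-the-body\n' > flowfile

# Header: print everything up to (and including) the blank CR-terminated line
sed '/^\r$/q' flowfile
```

HTTP header lines end in CRLF, so the line separating header from body contains only a carriage return; `/^\r$/q` makes sed quit as soon as that line has been printed.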
And this to get the body of the request/response:
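Again the command was lost here; the complementary sketch (same assumed sample-file setup) deletes everything through the blank line, leaving only the body:

```shell
# Hypothetical sample flow file, as above
printf 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\nhello body\n' > flowfile

# Body: delete line 1 through the first blank CR-terminated line
sed '1,/^\r$/d' flowfile
```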
(You can also pipe things through those sed commands if you like).
On keepalive connections, you then need to start getting a little scripty, but even then it's about 20 lines of script to process the two files (A to B, B to A), extract the headers, read the Content-Length, then read the body -- and if you're doing any sort of automated processing, you'll be writing code to do that stuff anyway, so a bit of HTTP dissection doesn't add considerably to the workload.
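A rough sketch of such a script for one direction of a keepalive flow, in POSIX-ish shell, might look like the following (the flow file name and sample data are invented, every message is assumed to carry a Content-Length header, and chunked transfer-encoding is deliberately ignored):

```shell
#!/bin/sh
# Hypothetical one-direction flow file holding two keepalive responses
printf 'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloHTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye' > b2a.flow

f=b2a.flow
total=$(( $(wc -c < "$f") ))
off=0
n=0
while [ "$off" -lt "$total" ]; do
    n=$((n + 1))
    # Header: from the current offset up to and including the blank CRLF line
    tail -c +$((off + 1)) "$f" | sed '/^\r$/q' > "hdr.$n"
    hlen=$(( $(wc -c < "hdr.$n") ))
    [ "$hlen" -gt 0 ] || break        # truncated flow: stop rather than loop
    # Read the Content-Length value (treat a missing header as 0)
    clen=$(sed -n 's/^[Cc]ontent-[Ll]ength: *\([0-9][0-9]*\).*/\1/p' "hdr.$n")
    clen=$(( ${clen:-0} ))
    # Body: the next Content-Length bytes after the header
    tail -c +$((off + hlen + 1)) "$f" | head -c "$clen" > "body.$n"
    off=$((off + hlen + clen))
done
```

Each pass writes one `hdr.N`/`body.N` pair and advances a byte offset by header length plus Content-Length, which is the whole trick the paragraph above describes.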