How to remove responses from LiveHTTPHeaders output using awk, perl or sed

awk firefox linux perl

Let's say I have something like this (this is only an example and the actual requests will be different; I loaded StackOverflow with LiveHTTPHeaders enabled to have some samples to work on):

http://stackoverflow.com/

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Sat, 28 Nov 2009 16:04:24 GMT
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Date: Sat, 28 Nov 2009 16:04:23 GMT
Content-Length: 19015
----------------------------------------------------------
...

The full log of requests and responses is available on pastebin.

I want to remove all responses (for example, HTTP/1.x 200 OK and everything in that response) and all the one-line entries showing the page address. I would like to have only the requests left in the text file with the saved LiveHTTPHeaders output.

So, the output would be:

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

GET /so/all.css?v=5290 HTTP/1.1
Host: sstatic.net
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/css,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://stackoverflow.com/

...

Again, the full text of what I want to keep is available on pastebin.

If I save a captured LiveHTTPHeaders session to a text file and want a result like the second code sample in this question, how do I do this? Maybe with awk, sed or perl? Or something else? I'm on Linux.


Edit:
I'm trying to run Sinan's script. The script is:

#!/usr/bin/perl
local $/ = "\n\n";
while (<>) {
    print if /^GET|POST/; # Add more request types as needed
}

I tried running it this way:

./cleanup-headers.pl livehttp.txt > filtered.txt

And this way:

perl cleanup-headers.pl < livehttp.txt > filtered.txt

Either way, filtered.txt was created, but it is completely empty.

Has anyone tried it on the FULL headers I pasted into pastebin? Did it work?

Full headers

Best Answer

Looks like you're having trailing whitespace issues.

$ sed -e 's/^\s*$//' livehttp.txt | \
  perl -e '$/ = ""; while (<>) { print if /^(GET|POST)/ }'

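This works by putting Perl's readline operator into paragraph mode (via $/ = ""), which grabs records a chunk at a time, separated by two or more consecutive newlines. The parentheses in /^(GET|POST)/ matter too: /^GET|POST/ parses as "GET at the start, or POST anywhere", because alternation binds more loosely than the anchor.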
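Since the question also asks about awk, here is a minimal sketch of the same paragraph-mode idea there. Setting RS to the empty string makes awk read blank-line-separated records, much like $/ = "" in Perl. The function name requests_only is made up for illustration, and only GET and POST are matched, as in the commands above.

```shell
# Hypothetical helper (the name requests_only is illustrative, not from the answer):
# keep only the blank-line-separated blocks that begin with a request line.
requests_only() {
  # Step 1: turn whitespace-only lines (spaces, tabs, a lone CR) into empty lines.
  # Step 2: RS="" puts awk into paragraph mode; ORS="\n\n" restores the blank
  #         line after each block that gets printed.
  sed -e 's/^[[:space:]]*$//' "$@" |
    awk 'BEGIN { RS = ""; ORS = "\n\n" } /^(GET|POST) /'
}
```

Assuming the capture is saved as livehttp.txt, it would be called as: requests_only livehttp.txt > filtered.txt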

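It's nice when it works, but it's a bit brittle. Lines that look blank but still contain whitespace (spaces, tabs or a trailing carriage return) do not count as paragraph separators, so neighbouring records run together. The sed command cleans those up by turning whitespace-only lines into truly empty ones.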

Equivalent and more concise command:

$ sed -e 's/^\s*$//' livehttp.txt | perl -000 -ne 'print if /^(GET|POST)/'
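
If the dependence on clean blank lines is a concern, a line-by-line filter avoids paragraph mode entirely. This is only a sketch under the assumption that every request block starts with a GET or POST line and ends at the next blank (or whitespace-only) line, which is what the sample in the question shows. The function name requests_only_linewise is hypothetical.

```shell
# Hypothetical helper (name is illustrative): print request blocks only,
# without relying on paragraph mode or a separate sed cleanup pass.
requests_only_linewise() {
  awk '
    { sub(/[ \t\r]+$/, "") }          # strip trailing spaces, tabs and CR
    /^(GET|POST) / { keep = 1 }       # a request line opens a block to keep
    $0 == "" {                        # a blank line closes the current block
      if (keep) print ""              # keep one separator after a kept block
      keep = 0
      next
    }
    keep                              # print lines while inside a kept block
  ' "$@"
}
```

Other request methods (HEAD, PUT and so on) would need to be added to the pattern if they appear in the capture.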