Linux – How to safely capture or duplicate incoming requests to a web server

apache-2.2 linux linux-networking web-applications

We have taken over a legacy web application that we cannot modify (the source code is broken and deployment fails) and that we will eventually rewrite. Ideally we would migrate it one step at a time, but this is not possible since we cannot effectively modify the application.

I am in charge of rewriting the application, and there are some complex synchronization algorithms that I'd like to test against the data POSTed to the current API.

What's the easiest and safest way to capture incoming HTTP requests with all associated data? The solution must be transparent to API users. The server runs Ubuntu Linux and we have SSH access to it. The web app is written in Ruby and served by Apache 2.

  • Use some sort of packet sniffer to capture incoming traffic. I'd have to figure out a way to replay these requests against a server of my own.
  • Change the DNS to point to another IP that would capture and log the request data, then redirect the request to the production server. I don't think we have access to the network infrastructure, so I assume this would have to be implemented as a web service that returns a response with a redirect header pointing to the real server. This seems fragile and could raise security concerns in browsers.
  • Use some sort of Apache module to do this.
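
For the third option, something like mod_dumpio might work, although I have not tried it. A rough, untested sketch of the configuration, assuming Apache 2.2 directives and that the module can be enabled on Ubuntu with a2enmod dump_io:

# Untested sketch: mod_dumpio copies request bodies into the Apache error log.
# It requires the error log at debug level, which may produce a lot of log volume in production.
LogLevel debug
DumpIOInput On
DumpIOLogLevel debug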

How do the above solutions compare in terms of:

  • The risk of bringing the server and/or website down
  • The ease of implementation

Please feel free to suggest any further/better alternatives.

Best Answer

I would use tcpdump or ngrep. It would be nice to have the switch port connected to the web server mirrored, but in the absence of that you can run ngrep or tcpdump on the server itself.
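
For example, a capture on the server itself might look something like this (the interface name and output path are just examples; -s 0 captures whole packets rather than the default snap length), and the resulting pcap file can be opened in Wireshark later:

tcpdump -i eth0 -s 0 -w /var/tmp/http-capture.pcap 'tcp port 80'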

You will need superuser access to run either of those programs.

You're going to want to read up a bit. Since you obviously know what you are looking for in the traffic, ngrep's ability to select traffic by regex might let you pick out the relevant packets with better accuracy.

ngrep -l -q -d eth0 -O dump.file "^POST " tcp and port 80

This would get you any HTTP data POSTed to port 80 on eth0; you might be able to pick out something much more specific. If you are going to be reading the traffic directly from the file, you might want to add -W byline, which respects line breaks and makes the packets much more readable, so you can see each packet written out more logically (for humans). -O dump.file writes the output of the packet capture to a file. The output can be as detailed as you'd like. To replay the packets, have a look at tcpreplay.
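
As a rough, untested follow-up using the dump.file written above: you can read the capture back later in a human-friendly way, or hand it to tcpreplay. Note that tcpreplay resends packets at layer 2, so the destination addresses and MACs generally need rewriting (the tcpreplay suite ships tcprewrite for this) before replaying against a different test server.

# read the capture back with line breaks preserved
ngrep -W byline -I dump.file "^POST "

# replay the raw packets (addresses usually need rewriting first)
tcpreplay --intf1=eth0 dump.file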