Ntop: Which HTTP downloads (URLs) caused the traffic

http monitoring network-monitoring ntop

My server causes too much traffic, so I have installed ntop to monitor it.

On the Summary -> Traffic page in the Global TCP/UDP Protocol Distribution table I can see the traffic is periodically caused by HTTP.

On the All Protocols -> Traffic page, the first row accounts for almost all of the traffic (94,4%), but the first column (Host) shows my own server. Why is this?

When I click on that row, I can see the traffic in the Host Traffic Stats table. It is all in the Tot. Traffic Rcvd column. So I think one of my applications is periodically downloading something big, or a lot of smaller things.

But how can I find out what was downloaded? What are the downloaded URLs, or at least the hosts that caused the most traffic?

Best Answer

Fix the Systemic Issue:
Not knowing which of your applications make outbound requests, and having their logs scattered all over the place, is a problem. This is going to bite you in the ass again and again, so I would set aside some time to address it. Find some way to index or aggregate those logs. This is a larger project that you should raise.

The Problem at Hand:
For the problem at hand, I would recommend Wireshark / tcpdump. Once you have a traffic capture, you can use all sorts of techniques to track down the source of the traffic. In Wireshark you could use "Statistics / Conversations", sort by bytes, and then drill down into the capture from there. Riverbed's non-free Cascade Pilot has a "Web Bandwidth by Object" view for captures that would be good at this; you could request a trial.
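If you would rather script the "sort by bytes" step than click through the GUI, something like the following might work as a starting point. It is only a rough sketch: it assumes you have already exported three tab-separated fields per packet from a capture with tshark, and the file name `capture.pcap`, the interface `eth0`, and the function name `bytes_per_flow` are placeholders of mine, not anything ntop or Wireshark gives you. Unlike Wireshark's conversations view, it counts each direction separately.

```python
from collections import Counter

# Assumed input: one line per packet, tab-separated "src<TAB>dst<TAB>frame length",
# for example produced by (file and interface names are placeholders):
#   tcpdump -i eth0 -w capture.pcap port 80
#   tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e frame.len


def bytes_per_flow(lines):
    """Sum frame lengths per (source, destination) pair, largest first."""
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip rows that are malformed or have no IP addresses (e.g. ARP frames).
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        src, dst, length = parts
        if not src or not dst:
            continue
        totals[(src, dst)] += int(length)
    return totals.most_common()


# Made-up sample rows using documentation address ranges.
sample = [
    "203.0.113.7\t192.0.2.10\t1514",
    "203.0.113.7\t192.0.2.10\t1514",
    "192.0.2.10\t203.0.113.7\t66",
    "198.51.100.3\t192.0.2.10\t590",
    "\t\t60",
]

for (src, dst), total in bytes_per_flow(sample):
    print(f"{src} -> {dst}: {total} bytes")
```

The remote address at the top of the list is the host your server is pulling the most data from. To get from there to actual URLs, the same export could be extended with tshark's `http.host` and `http.request.uri` fields, which only helps if the traffic is plain HTTP rather than HTTPS.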

If you are not familiar with Wireshark, now is a good time to learn. It is a tool most sysadmins use on a regular basis.

If you know which server is using the bandwidth, and it is a Linux server, you might try NetHogs (nethogs) to identify the process responsible.
