Linux – Random high CPU usage on Linux using Apache

apache-2.2 cpu-usage linux

I've recently had a strange problem with my web server. Over the last day or so the site has been slowing down at seemingly random intervals. We aren't seeing any significant extra traffic, but a quick look at 'top' shows httpd jumping from between 3-10% CPU to around 99%, briefly settling in the mid-80s, and then dropping back down. For example:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
2443 apache    25   0  256m  20m 5472 R 88.2  2.1   3:22.29 httpd

This seems to happen every 30 minutes or so. The strange thing is that while this is happening I can load the Apache server-status page and get (for example):

CPU Usage: u700.5 s6.22 cu0 cs0 - 20.2% CPU load

So my question is two-fold:

  1. Does anyone know why this issue may have cropped up over the last day or so (there have been no changes made to the server)?
  2. Why would the CPU usage shown in top be vastly higher than in server-status, and which is correct?

Best Answer

The CPU usage on Apache's server-status page is the average since Apache was started, so it won't show spikes like this, whereas top shows usage over its most recent refresh interval. Both figures are correct for what they measure. When you get these load spikes, check the server-status page to see which pages and clients are being served (ExtendedStatus must be on).
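As a rough illustration of why the averaged figure hides spikes, the numbers quoted in the question can be turned back into an uptime. This is only a sketch: it assumes the percentage is total CPU seconds (u + s) divided by the seconds Apache has been running, and `implied_uptime` is a hypothetical helper name, not part of Apache.

```shell
# Estimate uptime in seconds from server-status figures.
# Assumption: "CPU load" = (user + system CPU seconds) / uptime * 100.
implied_uptime() {
    awk -v u="$1" -v s="$2" -v pct="$3" \
        'BEGIN { printf "%.0f\n", (u + s) / (pct / 100) }'
}

# Figures from the question: u700.5 s6.22, 20.2% CPU load
implied_uptime 700.5 6.22 20.2
```

Under that assumption the result is about 3499 seconds, a little under an hour. A short burst at 99% is spread over that whole window, so the averaged figure can sit far below what top reports during a spike.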

You can also use netstat to see what clients are currently accessing your machine:

 netstat -an | grep ESTABLISHED

If you run this over several hours, including during the CPU spikes, you may be able to spot a recurring IP address and potentially trace it to a specific robot/crawler. If that turns out to be the case, you can use robots.txt to limit how well-behaved robots crawl your site.

Edit: On a busy server the above netstat command should show some entries like:

tcp        0      0 10.2.212.13:80              216.146.52.21:24979         ESTABLISHED
tcp        0      0 10.2.212.13:80              86.174.113.138:54901        ESTABLISHED
tcp        0      0 10.2.212.13:80              94.1.216.253:51204          ESTABLISHED
tcp        0      0 10.2.212.13:80              24.9.61.204:62936           ESTABLISHED

The client's IP address is the one on the right. If you only see one or two lines, it just means that at that moment the only connection is your own ssh session. Check again when the load increases. You can also remove the grep to list all connections, although this will include a large number of old TIME_WAIT entries.
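To make a recurring address easier to spot, the netstat output can be tallied per client. The following is a minimal sketch, assuming the column layout shown above (remote address:port in column 5, state in column 6); `count_clients` and the `clients.log` file name are hypothetical names chosen for illustration.

```shell
# Count ESTABLISHED connections per remote IP from `netstat -an` output on stdin.
# Assumes the remote "address:port" is column 5 and the state is column 6.
count_clients() {
    awk '$6 == "ESTABLISHED" { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -rn
}

# Example (not run here): log the five busiest clients once a minute.
# while sleep 60; do date; netstat -an | count_clients | head -n 5; done >> clients.log
```

Each output line is a connection count followed by an IP address, busiest first. Comparing the entries logged around a CPU spike with those from quiet periods should show whether one client is responsible.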

I would start with the extended server-status and see whether it reveals any obvious crawlers during traffic peaks.