Linux – Random high CPU usage on Linux using Apache

Tags: apache-2.2, cpu-usage, linux

I've recently had a strange problem with my web server. Over the last day or so the site has been slowing down at seemingly random intervals. We don't appear to be getting any significant extra traffic, but a quick look at 'top' shows httpd jumping from its usual 3-10% to around 99%, then briefly sitting in the mid 80s before dropping back down. For example:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
2443 apache    25   0  256m  20m 5472 R 88.2  2.1   3:22.29 httpd

This seems to happen every 30 minutes or so. The strange thing is that while this is happening I can load the Apache server-status page and get (for example):

CPU Usage: u700.5 s6.22 cu0 cs0 - 20.2% CPU load

So my question is two-fold:

Does anyone know why this issue may have cropped up over the last day or so? (No changes have been made to the server.)
Why would the CPU usage in top be vastly higher than in server-status, and which is correct?

Best Answer

The CPU usage on Apache's server-status page is the average since Apache was started, so it won't show spikes like this. When you get these load spikes, check the server-status page to see which pages and clients are being served (ExtendedStatus must be on).
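To see why the two numbers disagree, it can help to work backwards from the server-status line. As far as I know, mod_status derives its "CPU load" figure as the total CPU seconds used (u + s + cu + cs) divided by the server uptime in seconds. The sketch below rests on that assumption; the helper name `implied_uptime` is made up for illustration. It recovers the uptime implied by the example line from the question:

```shell
# Hypothetical helper: estimate the uptime (in seconds) implied by a
# mod_status "CPU Usage" line, assuming load% = cpu_seconds / uptime * 100.
implied_uptime() {
    printf '%s\n' "$1" | awk '{
        u  = substr($3, 2); s  = substr($4, 2)   # strip the leading "u" / "s"
        cu = substr($5, 3); cs = substr($6, 3)   # strip the leading "cu" / "cs"
        pct = $8; sub(/%/, "", pct)              # "20.2%" -> "20.2"
        printf "%.0f\n", (u + s + cu + cs) / (pct / 100)
    }'
}

implied_uptime "CPU Usage: u700.5 s6.22 cu0 cs0 - 20.2% CPU load"
# prints 3499 (seconds, a little under an hour)
```

Averaged over roughly an hour, a few short bursts near 99% barely move that figure, whereas top reports usage over its most recent refresh interval only.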

You can also use netstat to see what clients are currently accessing your machine:

 netstat -an | grep ESTABLISHED

If you run this repeatedly over several hours, including during the spikes, you may be able to spot a recurring IP address and potentially trace it to a specific robot/crawler. If this does turn out to be the case, you can look into using robots.txt to limit how well-behaved robots crawl your site.

Edit: On a busy server the above netstat command should show some entries like:

tcp        0      0 10.2.212.13:80              216.146.52.21:24979         ESTABLISHED
tcp        0      0 10.2.212.13:80              86.174.113.138:54901        ESTABLISHED
tcp        0      0 10.2.212.13:80              94.1.216.253:51204          ESTABLISHED
tcp        0      0 10.2.212.13:80              24.9.61.204:62936           ESTABLISHED

The client's IP address is the one on the right. If you only see one or two lines, it just means that at that moment the only connection is your own ssh session. Check again when the load increases. You can also remove the grep to list all connections, although this will include a large number of old TIME_WAIT entries.
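Eyeballing that list gets tedious on a busy server, so here is a rough sketch that counts ESTABLISHED connections per client address. The function name `count_clients` is my own invention, and it assumes the column layout shown above (state in column 6, remote "address:port" in column 5):

```shell
# Hypothetical helper: count ESTABLISHED connections per client IP.
# Reads `netstat -an` output on stdin; prints "count address", busiest first.
count_clients() {
    awk '$6 == "ESTABLISHED" { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage: netstat -an | count_clients | head
```

Running it a few times during a spike and comparing the top entries should make a recurring crawler stand out.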

I would start with the extended server-status and see if that can reveal any obvious crawlers during traffic peaks.

Related Solutions

Linux CPU Usage – Monitoring CPU Usage and Process Execution History

There are a couple of possible ways you can do this. Note that it's entirely possible that many processes in a runaway scenario are causing this, not just one.

The first way is to set up pidstat to run in the background and produce data.

pidstat -u 600 >/var/log/pidstats.log & disown $!

This will give you a quite detailed outlook of the running of the system at ten minute intervals. I would suggest this be your first port of call since it produces the most valuable/reliable data to work with.

There is a problem with this: if the box goes into a runaway CPU loop and produces huge load, you're not guaranteed that pidstat itself will execute in a timely manner during the load (if at all), so you could miss the output entirely!
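Once the pidstat log has collected some samples, something like the following can pull out just the busy ones. This is only a sketch: `hot_samples` is a made-up name, and it assumes the log contains pidstat's usual header line with a `%CPU` column, which it uses to locate the right field instead of hard-coding a position.

```shell
# Hypothetical helper: print pidstat samples whose %CPU is at or above a
# threshold (default 50). Reads `pidstat -u` output on stdin.
hot_samples() {
    awk -v min="${1:-50}" '
        # Header line: remember which field holds %CPU, then skip it.
        /%CPU/ { for (i = 1; i <= NF; i++) if ($i == "%CPU") col = i; next }
        # Data line: print it if the %CPU field meets the threshold.
        col && NF >= col && $col + 0 >= min { print }
    '
}

# Usage: hot_samples 80 < /var/log/pidstats.log
```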

The second way to look for this is to enable process accounting, which is possibly more of a long-term option.

accton on

This will enable process accounting (if it is not already enabled). If it was not running before, it will need time to collect data.

Once it has been running for, say, 24 hours, you can run a command like this (which will produce output along these lines):

# sa --percentages --separate-times
     108  100.00%       7.84re  100.00%       0.00u  100.00%       0.00s  100.00%         0avio     19803k
       2    1.85%       0.00re    0.05%       0.00u   75.00%       0.00s    0.00%         0avio     29328k   troff
       2    1.85%       0.37re    4.73%       0.00u   25.00%       0.00s   44.44%         0avio     29632k   man
       7    6.48%       0.00re    0.01%       0.00u    0.00%       0.00s   44.44%         0avio     28400k   ps
       4    3.70%       0.00re    0.02%       0.00u    0.00%       0.00s   11.11%         0avio      9753k   ***other*
      26   24.07%       0.08re    1.01%       0.00u    0.00%       0.00s    0.00%         0avio      1130k   sa
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28544k   ksmtuned*
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28096k   awk
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     29623k   man*
       7    6.48%       7.00re   89.26%       0.00u    0.00%       0.00s

The columns are ordered as such:

Number of calls
Percentage of calls
Amount of real time spent on all the processes of this type
Percentage
User CPU time
Percentage
System CPU time
Percentage
Average I/O calls
Average memory usage (the k column)
Command name

What you'll be looking for is the process types that generate the most User/System CPU time.

This breaks down the data as the total amount of CPU time (the top row) and then how that CPU time has been split up. Process accounting only records a process properly if it is already enabled when the process spawns, so it's probably best to restart the system after enabling it to ensure all services are accounted for.

This by no means gives you a definite answer as to which process is causing the problem, but it might give you a good feel. Since it could be a 24-hour snapshot, the results may be skewed, so bear that in mind. Because it is a kernel feature, it should also always log and, unlike pidstat, will produce output even under heavy load.
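To rank the process types by combined user and system CPU time without reading the whole table, a small filter along these lines may help. It is a sketch only: `top_cpu` is a hypothetical name, and it assumes the exact column layout produced by `sa --percentages --separate-times` as shown above (user time in column 5, system time in column 7, command in column 11).

```shell
# Hypothetical helper: rank commands by user + system CPU time.
# Reads `sa --percentages --separate-times` output on stdin and prints
# "cpu_time command" for the top N entries (default 5).
top_cpu() {
    awk 'NF >= 11 {                      # skips the totals row and cut-off lines
        u = $5; s = $7
        sub(/u$/, "", u); sub(/s$/, "", s)   # "0.00u" -> "0.00", "0.00s" -> "0.00"
        printf "%.2f %s\n", u + s, $11
    }' | sort -rn | head -n "${1:-5}"
}

# Usage: sa --percentages --separate-times | top_cpu 10
```

The times are printed in whatever unit sa itself reports, so treat them as relative rankings.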

The last option also uses process accounting, so turn it on as above, then use the program "lastcomm" to list the processes executed around the time of the problem, along with CPU statistics for each one.

lastcomm | grep "May  8 22:[01234]"
kworker/1:0       F    root     __         0.00 secs Tue May  8 22:20
sleep                  root     __         0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                   X root     pts/0      0.00 secs Tue May  8 22:49
ksmtuned          F    root     __         0.00 secs Tue May  8 22:49
awk                    root     __         0.00 secs Tue May  8 22:49

This might give you some hints too as to what might be causing the problem.
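If the time window contains a lot of entries, totalling the CPU seconds per command makes the heavy hitters easier to spot. The sketch below uses a made-up function name, `cpu_by_command`, and assumes the lastcomm layout shown above, where the CPU time is the number immediately before the word "secs":

```shell
# Hypothetical helper: sum CPU seconds per command from lastcomm output.
# Reads lastcomm lines on stdin; prints "total_secs command", largest first.
cpu_by_command() {
    awk '{
        # The flags column is optional, so find "secs" rather than
        # relying on a fixed field position.
        for (i = 2; i <= NF; i++)
            if ($i == "secs") { cpu[$1] += $(i - 1); break }
    }
    END { for (c in cpu) printf "%.2f %s\n", cpu[c], c }' | sort -rn
}

# Usage: lastcomm | grep "May  8 22:[01234]" | cpu_by_command
```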

Php – Apache high CPU usage, setting a overload limit

What are the typical request rates for PHP generated and static content? Have you checked the hit rate in APC for caching and for opcode? What version of PHP?

show a custom message

The sensible place to do this would be on a reverse proxy, but you say you've not got this in place yet. Another approach would be to run a minimal web server and load balancer on the current box (in addition to the main content) and redirect that way, but that's even more work than getting Varnish set up.

Similarly wrapping the front end in a proxy script would have the desired effect - but again, the effort is more than implementing Varnish.
