Sudden high CPU, causing server to be unresponsive

apache-2.2 performance

I run a reasonably busy site (700,000 page views/day, PHP/MySQL) that gets steady traffic, normally with no spikes. On each of the last two days, around peak usage time, the site suddenly went from very fast to unresponsive for about an hour, and then back to being very fast.

The load average jumps dramatically at 2:10 AM:

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
12:10:01 AM         1       270      2.54      3.56      4.00
12:20:01 AM        10       270      5.58      5.09      4.61
12:30:01 AM         9       297     10.06      9.63      7.22
12:40:01 AM         7       296      3.42      5.17      6.15
12:50:02 AM         8       291      4.36      4.57      5.43
01:00:02 AM        20       297      9.38      7.57      6.49
01:10:01 AM         6       279      5.83      6.86      6.90
01:20:01 AM        11       263      5.77      5.43      5.98
01:30:01 AM         2       291      6.70      5.56      5.66
01:40:01 AM         2       285      3.73      5.09      5.37
01:50:01 AM         6       285      3.84      4.65      5.11
02:00:01 AM         8       283      2.56      3.72      4.45
02:10:01 AM         2       431     14.67     10.88      7.34
02:20:01 AM         1       425      7.10     11.48      9.73
02:30:01 AM         4       453     10.30     12.79     11.23
02:40:01 AM         2       440     14.12     16.13     13.41
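In the table above, plist-sz (the task list size) goes from 283 to 431 between 02:00 and 02:10 while runq-sz stays small. A minimal sketch for spotting that kind of jump automatically, assuming the text has the same column layout as the `sar -q` output pasted here; the helper names and the 1.3 growth ratio are hypothetical choices, not anything sar provides:

```python
from typing import List, NamedTuple


class QueueSample(NamedTuple):
    time: str
    runq_sz: int
    plist_sz: int
    ldavg_1: float
    ldavg_5: float
    ldavg_15: float


def parse_sar_q(text: str) -> List[QueueSample]:
    """Parse rows shaped like: HH:MM:SS AM|PM runq-sz plist-sz ld1 ld5 ld15."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its third field is not a number).
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        samples.append(QueueSample(
            time=fields[0] + " " + fields[1],
            runq_sz=int(fields[2]),
            plist_sz=int(fields[3]),
            ldavg_1=float(fields[4]),
            ldavg_5=float(fields[5]),
            ldavg_15=float(fields[6]),
        ))
    return samples


def plist_jumps(samples: List[QueueSample], ratio: float = 1.3) -> List[QueueSample]:
    """Return samples whose plist-sz exceeds the previous sample's by `ratio`."""
    return [cur for prev, cur in zip(samples, samples[1:])
            if cur.plist_sz > prev.plist_sz * ratio]


# A few rows copied from the table above.
SAMPLE = """\
12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
01:50:01 AM         6       285      3.84      4.65      5.11
02:00:01 AM         8       283      2.56      3.72      4.45
02:10:01 AM         2       431     14.67     10.88      7.34
02:20:01 AM         1       425      7.10     11.48      9.73
"""

samples = parse_sar_q(SAMPLE)
for s in plist_jumps(samples):
    print(s.time, s.plist_sz, s.ldavg_1)
```

On the sample rows this prints only the 02:10:01 AM row. It narrows down when the process count changed, not why.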

Here are my server specs:

Hostgator VPS Level 7, 2 x 2GHz CPU, 3.2G RAM, CentOS 5.9, Apache 2.2.19, MySQL

  • MySQL did not show any abnormal load during this time
  • Apache was showing all workers in "W" state
  • Rebooting, restarting MySQL, and restarting Apache all failed to resolve the issue
  • Nothing abnormal in the Apache error log (except lots of 503 errors during this time)
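The "W" state mentioned above comes from the mod_status scoreboard, where W means a worker is sending a reply. A rough sketch for counting worker states over time, assuming mod_status is enabled and its machine-readable page is reachable at `http://localhost/server-status?auto` (that URL is an assumption about the setup, and the function names are hypothetical):

```python
from collections import Counter
from urllib.request import urlopen

# Scoreboard keys as listed in the mod_status documentation.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def scoreboard_counts(status_auto_text):
    """Count worker states from the 'Scoreboard:' line of ?auto output."""
    for line in status_auto_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(board)
    return Counter()


def fetch_status(url="http://localhost/server-status?auto"):
    """Fetch the status page. The default URL is an assumed location."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("ascii", "replace")


# Illustrative input only, not captured from the server in question.
example = "BusyWorkers: 3\nIdleWorkers: 1\nScoreboard: WWW_..\n"
counts = scoreboard_counts(example)
for key, n in sorted(counts.items()):
    print(key, n, STATE_NAMES.get(key, "unknown"))
```

Logging these counts every minute or so (via `fetch_status`) would show whether the W count climbs gradually or all at once before the 503s start.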

I'm really not sure where to start investigating this issue. I'd appreciate any pointers on:

1 – how to fully diagnose this issue now
2 – or what tools to install/ commands to run to capture extra data when it happens again.

Thanks in advance.

Best Answer

How to diagnose:

  • Plot graphs. Use Munin, Cacti, or another external monitoring system to find out exactly which resource was exhausted.
  • Use atop or sar to get detailed, time-stamped information about process activity. When your server goes down, review the recorded data, working backward from the moment of the outage.
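If installing atop is not possible right away, a small stand-in can record a process listing whenever the load crosses a threshold, so there is something to read backward through after the next incident. This is only a sketch, not a replacement for atop or sar. It assumes Python 3 and the procps `ps` are available on the server; the threshold, interval, and log path are hypothetical values to adjust:

```python
import os
import subprocess
import time


def should_capture(load_1min, threshold):
    """Decide whether the 1-minute load average warrants a snapshot."""
    return load_1min >= threshold


def snapshot():
    """Return a timestamped process listing, sorted by CPU usage (Linux ps)."""
    ps = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,pcpu,pmem,etime,args", "--sort=-pcpu"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "=== {} load={} ===\n{}\n".format(stamp, os.getloadavg(), ps.stdout)


def watch(threshold=8.0, interval=30, log_path="/tmp/load-snapshots.log"):
    """Poll the load average forever, appending a snapshot when it is high.

    All three defaults are placeholder values, not recommendations.
    """
    while True:
        if should_capture(os.getloadavg()[0], threshold):
            with open(log_path, "a") as fh:
                fh.write(snapshot())
        time.sleep(interval)
```

Calling `watch()` from a script started under nohup or screen keeps it running between incidents. The STAT column in the listing shows which processes were running (R) and which were in uninterruptible sleep (D), which helps separate CPU pressure from I/O waits.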
