CentOS server suddenly crashes every 2 days or so. I'm a programmer and have no idea why; please help find the cause. Here is the top output


Every couple of days my server suddenly crashes and I have to request a hardware reset at the data center to get it running again.

Today I came back to my shell and saw that the server was dead, with "top" still running in the session. Below is the "top" output from right before the crash.

I opened /var/log/messages and scrolled to the reboot time, but I see nothing there: no errors prior to the hard reboot. (I checked /etc/syslog.conf and I see "*.info;mail.none;authpriv.none;cron.none /var/log/messages". Isn't this good enough to log all problems?)

Usually when I look at top, the swap is never used up like this! I also don't know why mysqld is at 323% CPU (the server only runs Drupal, and it's never slow or overloaded). "solver" is my application. I don't know what 'sh' and 'dovecot' are doing.

It's been driving me crazy for the last month. Please help me solve this mystery and stop my downtime.

top - 01:10:06 up 6 days, 5 min,  3 users,  load average: 34.87, 18.68, 9.03
Tasks: 500 total,  19 running, 481 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 96.6%sy,  0.0%ni,  1.7%id,  1.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8165600k total,  8139764k used,    25836k free,      428k buffers
Swap:  2104496k total,  2104496k used,        0k free,     8236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                            
 4421 mysql     15   0  571m 105m  976 S 323.5  1.3   9:08.00 mysqld                                                                                                                                                                                            
  564 root      20  -5     0    0    0 R 99.5  0.0   2:49.16 kswapd1                                                                                                                                                                                            
25767 apache    19   0  399m 8060  888 D 79.3  0.1   0:06.64 httpd                                                                                                                                                                                              
25781 apache    19   0  398m 5648  492 R 79.0  0.1   0:08.21 httpd                                                                                                                                                                                              
25961 apache    25   0  398m 5700  560 R 76.7  0.1   0:17.81 httpd                                                                                                                                                                                              
25980 apache    25   0 10816  668  520 R 75.0  0.0   0:46.95 sh                                                                                                                                                                                                 
  563 root      20  -5     0    0    0 D 71.4  0.0   3:12.37 kswapd0                                                                                                                                                                                            
25766 apache    25   0  399m 7256  756 R 69.7  0.1   0:39.83 httpd                                                                                                                                                                                              
25911 apache    25   0  398m 5612  480 R 58.8  0.1   0:17.63 httpd                                                                                                                                                                                              
25782 apache    25   0  440m  38m  648 R 55.2  0.5   0:18.94 httpd                                                                                                                                                                                              
25966 apache    25   0  398m 5640  556 R 55.2  0.1   0:48.84 httpd                                                                                                                                                                                              
 4588 root      25   0 74860  596  476 R 53.9  0.0   0:37.90 crond                                                                                                                                                                                              
25939 apache    25   0  2776  172   84 R 48.9  0.0   0:59.46 solver                                                                                                                                                                                             
 4575 root      25   0  397m 6004 1144 R 48.6  0.1   1:00.43 httpd                                                                                                                                                                                              
25962 apache    25   0  398m 5628  492 R 47.9  0.1   0:14.58 httpd                                                                                                                                                                                              
25824 apache    25   0  440m  39m  680 D 47.3  0.5   0:57.85 httpd                                                                                                                                                                                              
25968 apache    25   0  398m 5612  528 R 46.6  0.1   0:42.73 httpd                                                                                                                                                                                              
 4477 root      25   0  6084  396  280 R 46.3  0.0   0:59.53 dovecot                                                                                                                                                                                            
25982 root      25   0  397m 5108  240 R 45.9  0.1   0:18.01 httpd                                                                                                                                                                                              
25943 apache    25   0  2916  172    8 R 44.0  0.0   0:53.54 solver                                                                                                                                                                                             
30687 apache    25   0  468m  63m 1124 D 42.3  0.8   0:45.02 httpd                                                                                                                                                                                              
25978 apache    25   0  398m 5688  600 R 23.8  0.1   0:40.99 httpd                                                                                                                                                                                              
25983 root      25   0  397m 5272  384 D 14.9  0.1   0:18.99 httpd                                                                                                                                                                                              
  935 root      10  -5     0    0    0 D 14.2  0.0   1:54.60 kjournald                                                                                                                                                                                          
25986 root      25   0  397m 5308  420 D  8.9  0.1   0:04.75 httpd                                                                                                                                                                                              
 4011 haldaemo  25   0 31568 1476  716 S  5.6  0.0   0:24.36 hald                                                                                                                                                                                               
25956 apache    23   0  398m 5872  644 S  5.6  0.1   0:13.85 httpd                                                                                                                                                                                              
18336 root      18   0 13004 1332  724 R  0.3  0.0   1:46.66 top                                                                                                                                                                                                
    1 root      18   0 10372  212  180 S  0.0  0.0   0:05.99 init                                                                                                                                                                                               
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.95 migration/0                                                                                                                                                                                        
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0                                                                                                                                                                                        
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0                                                                                                                                                                                         
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.15 migration/1                                                                                                                                                                                        
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/1

Here is a normal top, when the server is working fine:

top - 01:50:41 up 21 min,  1 user,  load average: 2.98, 2.70, 1.68
Tasks: 271 total,   2 running, 269 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.0%us,  1.1%sy,  0.0%ni, 81.4%id,  2.4%wa,  0.1%hi,  0.0%si,  0.0%st
Mem:   8165600k total,  2035856k used,  6129744k free,    60840k buffers
Swap:  2104496k total,        0k used,  2104496k free,   283744k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                            
 2204 apache    17   0  466m  83m  19m S 25.9  1.0   0:22.16 httpd                                                                                                                                                                                              
11347 apache    15   0  466m  83m  19m S 25.9  1.0   0:26.10 httpd                                                                                                                                                                                              
18204 apache    18   0  481m  97m  19m D 25.2  1.2   0:13.99 httpd                                                                                                                                                                                              
 4644 apache    18   0  481m 100m  19m D 24.6  1.3   1:17.12 httpd                                                                                                                                                                                              
 4727 apache    17   0  481m  99m  19m S 24.3  1.2   1:10.77 httpd                                                                                                                                                                                              
 4777 apache    17   0  482m 102m  21m S 23.6  1.3   1:38.27 httpd                                                                                                                                                                                              
 8924 apache    15   0  483m  99m  19m S 22.3  1.3   1:13.41 httpd                                                                                                                                                                                              
 9390 apache    18   0  483m  99m  19m S 18.9  1.2   1:05.35 httpd                                                                                                                                                                                              
 4728 apache    16   0  481m 101m  19m S 14.3  1.3   1:12.50 httpd                                                                                                                                                                                              
 4648 apache    15   0  481m 107m  27m S 12.6  1.4   1:18.62 httpd                                                                                                                                                                                              
24955 apache    15   0  467m  82m  19m S  3.3  1.0   0:21.80 httpd                                                                                                                                                                                              
 4722 apache    15   0  503m 118m  19m R  1.7  1.5   1:17.79 httpd                                                                                                                                                                                              
 4647 apache    15   0  484m 105m  20m S  1.3  1.3   1:40.73 httpd                                                                                                                                                                                              
 4643 apache    16   0  481m 100m  20m S  0.7  1.3   1:11.80 httpd                                                                                                                                                                                              
 1561 root      15   0 12900 1264  828 R  0.3  0.0   0:00.54 top                                                                                                                                                                                                
 4434 mysql     15   0  496m  55m 4812 S  0.3  0.7   0:06.69 mysqld                                                                                                                                                                                             
 4646 apache    15   0  481m 100m  19m S  0.3  1.3   1:25.51 httpd                                                                                                                                                                                              
    1 root      18   0 10372  692  580 S  0.0  0.0   0:02.09 init                                                                                                                                                                                               
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.03 migration/0                                                                                                                                                                                        
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0                                                                                                                                                                                        
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0                                                                                                                                                                                         
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/1                                                                                                                                                                                        
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1                                                                                                                                                                                        
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1                                                                                                                                                                                         
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/2                                                                                                                                                                                        
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/2                                                                                                                                                                                        
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2                                                                                                                                                                                         
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/3                                                                                                                                                                                        
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/3                                                                                                                                                                                        
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3                                                                                                                                                                                         
   14 root      RT  -5     0    0    0 S  0.0  0.0   0:00.03 migration/4                                                                                                                                                                                        
   15 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/4                                                                                                                                                                                        
   16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4                                                                                                                                                                                         
   17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.02 migration/5                                                                                                                                                                                        
   18 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/5                                                                                                                                                                                        
   19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/5                                                                                                                                                                                         
   20 root      RT  -5     0    0    0 S  0.0  0.0   0:00.01 migration/6                                                                                                                                                                                        
   21 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/6                                                                                                                                                                                        
   22 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/6                                                                                                                                                                                         
   23 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/7    

Best Answer

My guess is that your system is swapping itself to death because web requests pile up waiting whenever the database locks. You probably have one or two queries that run sporadically (possibly from a cron job) and lock a frequently used database table. Once that happens, all the other queries back up behind it, the waiting httpd processes accumulate, and the system starts swapping. Once it is swapping that hard, it does not recover. Your crash-time top fits this: all 2104496k of swap is used, kswapd0 and kswapd1 are busy, and the CPU is at 96.6% system time with 0.0% user time.
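To confirm the swapping theory, it may help to record memory and swap usage at short intervals, so the last samples before a crash are on disk after the reset. Below is a minimal sketch, assuming a Linux /proc/meminfo with the usual MemFree, SwapTotal and SwapFree fields. The function names, the log path and the 10-second interval are made up for illustration, and under heavy thrashing the final samples may still be lost.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_percent(info):
    """Return the percentage of swap in use (0.0 if there is no swap)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


def log_forever(path="/var/log/memwatch.log", interval=10):
    """Append one sample per interval; path and interval are assumptions."""
    while True:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(path, "a") as out:
            out.write("%s MemFree=%dkB swap_used=%.1f%%\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                info.get("MemFree", 0),
                swap_used_percent(info),
            ))
            out.flush()
            os.fsync(out.fileno())  # push the sample to disk right away
        time.sleep(interval)
```

Calling log_forever() from a root shell (or an init script) and reading the tail of the log after the next reset should show whether swap fills up gradually or within minutes.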

Check your MySQL slow query log and look for periodic queries that run within a few hours of when the crashes usually occur.
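If the slow query log is large, a small script can pull out the longest-running entries. This is only a sketch: it assumes the log contains the usual "# Query_time: ... Lock_time: ..." header lines followed by the SQL text, and the function name and the sample log path in the usage note are hypothetical.

```python
import re

# Matches e.g. "# Query_time: 45.5  Lock_time: 0.2  Rows_sent: 0 ..."
QUERY_TIME_RE = re.compile(
    r"^# Query_time:\s*([\d.]+)\s+Lock_time:\s*([\d.]+)"
)


def slowest_queries(lines, top=10):
    """Return up to `top` (query_time, lock_time, sql) tuples, slowest first."""
    entries = []
    current = None
    for line in lines:
        match = QUERY_TIME_RE.match(line)
        if match:
            # A new entry starts; its SQL text follows on later lines.
            current = [float(match.group(1)), float(match.group(2)), []]
            entries.append(current)
        elif current is not None and not line.startswith("#"):
            text = line.strip()
            if text:
                current[2].append(text)
    result = [(q, l, " ".join(sql)) for q, l, sql in entries]
    result.sort(key=lambda entry: entry[0], reverse=True)
    return result[:top]
```

For example, `slowest_queries(open("/var/log/mysql-slow.log"))` would list the ten slowest logged statements; the path depends on how the slow log is configured on your server. The "# Time:" lines in the log can then be compared with the times of the crashes.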
