CentOS dedicated server unresponsive for the first time

centos dedicated-server

The server was unresponsive for an hour, so I rebooted it and checked /var/log/messages, where I found the following. Can anybody point out what's wrong?

Sep 28 07:39:35 www kernel: INFO: task mysqld:22749 blocked for more than 120 seconds.
Sep 28 07:39:35 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:39:35 www kernel: mysqld        D ffff810001015120     0 22749   3266         22792 22659 (NOTLB)
Sep 28 07:39:35 www kernel:  ffff810139d21e58 0000000000000086 ffff810036217000 ffffffff8000f758
Sep 28 07:39:35 www kernel:  ffff81020dfd1408 0000000000000007 ffff8101cfbaf7e0 ffff81020fca5080
Sep 28 07:39:35 www kernel:  00017a451524782a 00000000000043b2 ffff8101cfbaf9c8 0000000280009a22
Sep 28 07:39:35 www kernel: Call Trace:
Sep 28 07:39:35 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:39:35 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:39:35 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:39:35 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:39:35 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:39:35 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:39:57 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:39:58 www kernel: 
Sep 28 07:39:59 www kernel: INFO: task httpd:22679 blocked for more than 120 seconds.
Sep 28 07:40:04 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:40:08 www kernel: httpd         D ffff81000100caa0     0 22679  22413         22680 22678 (NOTLB)
Sep 28 07:40:51 www kernel:  ffff81018b0dbc78 0000000000000086 ffff81018b0dbc88 0000004480063002
Sep 28 07:41:52 www kernel:  ffff81000001cc00 0000000000000007 ffff81013ac5e860 ffff81020fc96100
Sep 28 07:43:10 www kernel:  00017a44de6376c8 000000000000a89f ffff81013ac5ea48 000000010001cc00
Sep 28 07:43:38 www kernel: Call Trace:
Sep 28 07:44:06 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:44:09 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:44:10 www kernel:  [<ffffffff8000d0b2>] do_lookup+0x90/0x1e6
Sep 28 07:44:13 www kernel:  [<ffffffff8000a2e9>] __link_path_walk+0xa3a/0xfd1
Sep 28 07:44:16 www kernel:  [<ffffffff8000eb8e>] link_path_walk+0x45/0xb8
Sep 28 07:44:16 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:44:29 www kernel:  [<ffffffff800129ad>] getname+0x15b/0x1c2
Sep 28 07:44:38 www kernel:  [<ffffffff80023b60>] __user_walk_fd+0x37/0x4c
Sep 28 07:44:42 www kernel:  [<ffffffff80028ada>] vfs_stat_fd+0x1b/0x4a
Sep 28 07:44:43 www kernel:  [<ffffffff8003c69a>] do_unlinkat+0xe8/0x141
Sep 28 07:45:02 www kernel:  [<ffffffff80023890>] sys_newstat+0x19/0x31
Sep 28 07:46:18 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:46:43 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:46:55 www kernel: 
Sep 28 07:46:58 www kernel: INFO: task php:28906 blocked for more than 120 seconds.
Sep 28 07:46:59 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:47:00 www kernel: php           D ffff810165127000     0 28906  28905                     (NOTLB)
Sep 28 07:47:37 www kernel:  ffff810078431e58 0000000000000082 ffff810165127000 ffffffff8000f758
Sep 28 07:48:29 www kernel:  ffff81020dfd1408 0000000000000007 ffff8101247b9860 ffff810207d0e100
Sep 28 07:48:36 www kernel:  00017a4218932fae 0000000000377111 ffff8101247b9a48 0000000280009a22
Sep 28 07:48:37 www kernel: Call Trace:
Sep 28 07:48:37 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:48:37 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:48:37 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:48:41 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:48:41 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:48:42 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:48:42 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:48:42 www kernel: 
Sep 28 07:48:43 www kernel: INFO: task php:29032 blocked for more than 120 seconds.
Sep 28 07:48:45 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:48:46 www kernel: php           D 0000000000000004     0 29032      1         29050 29024 (NOTLB)
Sep 28 07:48:46 www kernel:  ffff81006b465dc8 0000000000000086 ffff81020dfd1408 ffffffff80009a22
Sep 28 07:48:46 www kernel:  0000000000000000 0000000000000007 ffff81002946e860 ffff81003c943100
Sep 28 07:48:46 www kernel:  00017a4211450766 000000000024be3d ffff81002946ea48 000000020e42b300
Sep 28 07:48:52 www kernel: Call Trace:
Sep 28 07:48:54 www kernel:  [<ffffffff80009a22>] __link_path_walk+0x173/0xfd1
Sep 28 07:48:54 www kernel:  [<ffffffff8002cc58>] mntput_no_expire+0x19/0x89
Sep 28 07:48:55 www kernel:  [<ffffffff8000ebf5>] link_path_walk+0xac/0xb8
Sep 28 07:48:55 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:48:55 www kernel:  [<ffffffff80023974>] __path_lookup_intent_open+0x56/0x97
Sep 28 07:48:55 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:48:55 www kernel:  [<ffffffff8001b260>] open_namei+0xea/0x718
Sep 28 07:48:59 www kernel:  [<ffffffff80067235>] do_page_fault+0x4cc/0x842
Sep 28 07:49:01 www kernel:  [<ffffffff80027726>] do_filp_open+0x1c/0x38
Sep 28 07:49:01 www kernel:  [<ffffffff8001a09c>] do_sys_open+0x44/0xbe
Sep 28 07:49:02 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:49:03 www kernel: 
Sep 28 07:49:07 www kernel: INFO: task mysqld:22749 blocked for more than 120 seconds.
Sep 28 07:49:09 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:49:09 www kernel: mysqld        D ffff810001015120     0 22749   3266         22792 22659 (NOTLB)
Sep 28 07:49:14 www kernel:  ffff810139d21e58 0000000000000086 ffff810036217000 ffffffff8000f758
Sep 28 07:49:14 www kernel:  ffff81020dfd1408 0000000000000007 ffff8101cfbaf7e0 ffff81020fca5080
Sep 28 07:49:15 www kernel:  00017a451524782a 00000000000043b2 ffff8101cfbaf9c8 0000000280009a22
Sep 28 07:49:15 www kernel: Call Trace:
Sep 28 07:49:22 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:49:23 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:49:23 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:49:23 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:49:23 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:49:23 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:49:23 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:49:23 www kernel: 
Sep 28 07:49:23 www kernel: INFO: task php:29024 blocked for more than 120 seconds.
Sep 28 07:49:23 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:49:24 www kernel: php           D ffff8101920a0000     0 29024      1         29032 29001 (NOTLB)
Sep 28 07:49:26 www kernel:  ffff8101cca8fe58 0000000000000086 ffff8101920a0000 ffffffff8000f758
Sep 28 07:49:26 www kernel:  ffff81020dfd1408 0000000000000007 ffff81000b64b040 ffff8101e05337e0
Sep 28 07:49:26 www kernel:  00017a552aef9f35 0000000000009513 ffff81000b64b228 0000000180009a22
Sep 28 07:49:27 www kernel: Call Trace:
Sep 28 07:49:27 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:49:27 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:49:27 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:49:27 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:49:27 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:49:27 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:49:27 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:49:27 www kernel: 
Sep 28 07:49:27 www kernel: INFO: task php:29050 blocked for more than 120 seconds.
Sep 28 07:49:28 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:49:28 www kernel: php           D ffff810201d95000     0 29050      1               29032 (NOTLB)
Sep 28 07:49:28 www kernel:  ffff810051e45e58 0000000000000086 ffff810201d95000 ffffffff8000f758
Sep 28 07:49:28 www kernel:  ffff81020dfd1408 0000000000000007 ffff81001c23f080 ffff81020f5e2080
Sep 28 07:49:29 www kernel:  00017a5d0bc2aa75 0000000000d0ecfe ffff81001c23f268 0000000280009a22
Sep 28 07:49:29 www kernel: Call Trace:
Sep 28 07:49:29 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:49:29 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:49:29 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:49:34 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:49:35 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:49:37 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:49:37 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:49:37 www kernel: 
Sep 28 07:49:37 www kernel: INFO: task php:29064 blocked for more than 120 seconds.
Sep 28 07:49:37 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:49:37 www kernel: php           D ffff81009c231000     0 29064  29057                     (NOTLB)
Sep 28 07:49:38 www kernel:  ffff8100a5dc7e58 0000000000000086 ffff81009c231000 ffffffff8000f758
Sep 28 07:49:38 www kernel:  ffff81020dfd1408 0000000000000007 ffff81000a850820 ffff8102038037a0
Sep 28 07:49:38 www kernel:  00017a5bb5c6846e 000000000000861a ffff81000a850a08 0000000080009a22
Sep 28 07:49:38 www kernel: Call Trace:
Sep 28 07:49:38 www kernel:  [<ffffffff8000f758>] generic_permission+0x52/0xca
Sep 28 07:49:38 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:49:38 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:49:38 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:49:38 www kernel:  [<ffffffff8003c618>] do_unlinkat+0x66/0x141
Sep 28 07:49:38 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:49:40 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:49:42 www kernel: 
Sep 28 07:49:42 www kernel: INFO: task mysqld:24612 blocked for more than 120 seconds.
Sep 28 07:49:43 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:49:46 www kernel: mysqld        D ffff81020dfd14c0     0 24612   3266         19643  3599 (NOTLB)
Sep 28 07:49:46 www kernel:  ffff81019e517c78 0000000000000086 ffff81019e517c88 ffffffff80063002
Sep 28 07:49:47 www kernel:  ffff810201966558 0000000000000009 ffff81015fa560c0 ffff8101c263b860
Sep 28 07:49:51 www kernel:  00017a9d113e27fe 0000000000008d5a ffff81015fa562a8 000000018006ec9f
Sep 28 07:49:52 www kernel: Call Trace:
Sep 28 07:49:52 www kernel:  [<ffffffff80063002>] thread_return+0x62/0xfe
Sep 28 07:49:52 www kernel:  [<ffffffff8005a46a>] getnstimeofday+0x10/0x29
Sep 28 07:49:53 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:49:54 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:49:54 www kernel:  [<ffffffff8000d0b2>] do_lookup+0x90/0x1e6
Sep 28 07:49:56 www kernel:  [<ffffffff8000a2e9>] __link_path_walk+0xa3a/0xfd1
Sep 28 07:50:00 www kernel:  [<ffffffff8000eb8e>] link_path_walk+0x45/0xb8
Sep 28 07:50:03 www kernel:  [<ffffffff8000cea2>] do_path_lookup+0x294/0x310
Sep 28 07:50:04 www kernel:  [<ffffffff800129ad>] getname+0x15b/0x1c2
Sep 28 07:50:06 www kernel:  [<ffffffff80023b60>] __user_walk_fd+0x37/0x4c
Sep 28 07:50:06 www kernel:  [<ffffffff8003f013>] vfs_lstat_fd+0x18/0x47
Sep 28 07:50:08 www kernel:  [<ffffffff8002ad91>] sys_newlstat+0x19/0x31
Sep 28 07:50:10 www kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 28 07:50:15 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:50:19 www kernel: 
Sep 28 07:50:19 www kernel: INFO: task php:29178 blocked for more than 120 seconds.
Sep 28 07:50:23 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 07:50:23 www kernel: php           D 0000000000000003     0 29178  29123                     (NOTLB)
Sep 28 07:50:23 www kernel:  ffff81004a95bdc8 0000000000000086 ffff81020dfd1408 ffffffff80009a22
Sep 28 07:50:24 www kernel:  ffffffff800a2fd0 0000000000000007 ffff8101937a4040 ffff81010bde27a0
Sep 28 07:50:26 www kernel:  00017aa3a1d89c9b 000000000000d66e ffff8101937a4228 000000020e42b300
Sep 28 07:50:26 www kernel: Call Trace:
Sep 28 07:50:26 www kernel:  [<ffffffff80009a22>] __link_path_walk+0x173/0xfd1
Sep 28 07:50:27 www kernel:  [<ffffffff800a2fd0>] wake_bit_function+0x0/0x23
Sep 28 07:50:27 www kernel:  [<ffffffff8002cc58>] mntput_no_expire+0x19/0x89
Sep 28 07:50:27 www kernel:  [<ffffffff8000ebf5>] link_path_walk+0xac/0xb8
Sep 28 07:50:28 www kernel:  [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Sep 28 07:50:32 www kernel:  [<ffffffff80023974>] __path_lookup_intent_open+0x56/0x97
Sep 28 07:50:32 www kernel:  [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Sep 28 07:50:34 www kernel:  [<ffffffff8001b260>] open_namei+0xea/0x718
Sep 28 07:50:34 www kernel:  [<ffffffff80067235>] do_page_fault+0x4cc/0x842
Sep 28 07:50:35 www kernel:  [<ffffffff80027726>] do_filp_open+0x1c/0x38
Sep 28 07:50:35 www kernel:  [<ffffffff8001a09c>] do_sys_open+0x44/0xbe
Sep 28 07:50:35 www kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 28 07:50:35 www kernel: 
Sep 28 07:56:41 www kernel: proftpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Sep 28 07:56:41 www kernel: 
Sep 28 07:56:41 www kernel: Call Trace:
Sep 28 07:56:41 www kernel:  [<ffffffff800c9f35>] out_of_memory+0x8e/0x2f3
Sep 28 07:56:41 www kernel:  [<ffffffff800a2fa2>] autoremove_wake_function+0x0/0x2e
Sep 28 07:56:41 www kernel:  [<ffffffff8000f67d>] __alloc_pages+0x27f/0x308
Sep 28 07:56:41 www kernel:  [<ffffffff80013047>] __do_page_cache_readahead+0x96/0x17b
Sep 28 07:56:41 www kernel:  [<ffffffff80013984>] filemap_nopage+0x14c/0x360
Sep 28 07:56:41 www kernel:  [<ffffffff80008972>] __handle_mm_fault+0x1fd/0x103b
Sep 28 07:56:41 www kernel:  [<ffffffff800a4fe1>] ktime_get_ts+0x1a/0x4e
Sep 28 07:56:41 www kernel:  [<ffffffff80067202>] do_page_fault+0x499/0x842
Sep 28 07:56:41 www kernel:  [<ffffffff8003ad91>] hrtimer_try_to_cancel+0x4a/0x53
Sep 28 07:58:10 www kernel:  [<ffffffff80033541>] do_setitimer+0xd0/0x689
Sep 28 08:26:22 www syslogd 1.4.1: restart.
Sep 28 08:26:22 www kernel: klogd 1.4.1, log source = /proc/kmsg started.
Sep 28 08:26:22 www kernel: Linux version 2.6.18-274.17.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Jan 10 17:25:58 EST 2012
Sep 28 08:26:22 www kernel: Command line: ro root=LABEL=/

Best Answer

As Safado says, you ran out of memory, which resulted in the oom-killer kicking in and closing things down. I had this problem too.
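To see that pattern in a log like the one in the question, a small script can count the hung-task reports per process and list which processes triggered the oom-killer. This is only a rough sketch: the regular expressions are written against the message formats visible in the log above, the function name `summarize` is made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the messages shown in the question's log (an assumption
# about format; adjust them if your kernel words the messages differently).
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
OOM_RE = re.compile(r"kernel: (\S+) invoked oom-killer")


def summarize(lines):
    """Count hung-task reports per process name and collect oom-killer triggers."""
    hung = Counter()
    oom = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hung[match.group(1)] += 1
            continue
        match = OOM_RE.search(line)
        if match:
            oom.append(match.group(1))
    return hung, oom


if __name__ == "__main__":
    # Three lines copied from the log in the question.
    sample = [
        "Sep 28 07:39:35 www kernel: INFO: task mysqld:22749 blocked for more than 120 seconds.",
        "Sep 28 07:39:59 www kernel: INFO: task httpd:22679 blocked for more than 120 seconds.",
        "Sep 28 07:56:41 www kernel: proftpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0",
    ]
    hung, oom = summarize(sample)
    print(dict(hung), oom)
```

To run it over the real file, pass an open file object instead of the sample list, for example `summarize(open("/var/log/messages"))`.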

I took the following actions:

  • Increased the amount of swap available, so the oom-killer would not be invoked as quickly
  • Set up monit to alert me when memory started to run out
  • Set up munin so I could check on memory use and see any trends
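The kind of check such monitoring performs can be sketched in a few lines of Python. This is not how monit or munin work internally; it is a minimal illustration that reads `/proc/meminfo`, treats free memory plus buffers plus page cache as reclaimable (an approximation), and flags trouble past thresholds. The function names and the 90% RAM and 50% swap limits are assumptions chosen for the example, and the script is written for a current Python 3 rather than the interpreter shipped with an old CentOS release.

```python
import os

# Made-up sample in /proc/meminfo format, used when the real file is absent.
SAMPLE = """MemTotal:      8000000 kB
MemFree:        200000 kB
Buffers:        100000 kB
Cached:         300000 kB
SwapTotal:     2000000 kB
SwapFree:       500000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_pressure(info):
    """Return (ram_used_fraction, swap_used_fraction)."""
    reclaimable = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    ram_used = 1.0 - reclaimable / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = 0.0
    if swap_total:
        swap_used = 1.0 - info.get("SwapFree", 0) / swap_total
    return ram_used, swap_used


def should_alert(info, ram_limit=0.90, swap_limit=0.50):
    """True when either RAM or swap use reaches its (hypothetical) limit."""
    ram_used, swap_used = memory_pressure(info)
    return ram_used >= ram_limit or swap_used >= swap_limit


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE
    info = parse_meminfo(text)
    ram_used, swap_used = memory_pressure(info)
    print("RAM used: %.0f%%, swap used: %.0f%%" % (ram_used * 100, swap_used * 100))
    if should_alert(info):
        print("WARNING: memory is running low")
```

Run from cron every few minutes, a script like this gives an early warning; a dedicated tool adds the alert delivery and the trend graphs.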

This enabled me to log in to the server when things were starting to look shaky, and check what was using all the memory.

In my case it was Apache. I reconfigured it to reduce the number of spare threads and servers, and the problems went away.
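A back-of-the-envelope way to pick a limit for Apache is to divide the memory you are willing to give it by the typical size of one worker process. The sketch below does that arithmetic. All the figures are hypothetical (they are not taken from the server in the question), the function name is made up, and the per-worker size has to be measured on your own system, for example from the resident size that `ps` or `top` reports for the httpd processes.

```python
def estimate_max_workers(total_mb, reserved_mb, per_worker_mb):
    """Estimate how many Apache workers fit in RAM without swapping.

    total_mb      -- physical memory in MB
    reserved_mb   -- memory kept back for everything else (MySQL, OS, cache)
    per_worker_mb -- measured resident size of one Apache worker in MB
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    budget = total_mb - reserved_mb
    if budget <= 0:
        return 0
    return budget // per_worker_mb


if __name__ == "__main__":
    # Hypothetical numbers: 8 GB of RAM, 2 GB reserved, 60 MB per worker.
    workers = estimate_max_workers(8192, 2048, 60)
    print("Upper bound on concurrent workers:", workers)
```

With the prefork MPM in Apache 2.2, the result is a rough ceiling for `MaxClients`; the spare-server settings then need to stay below it. Treat the number as a starting point and confirm it against the memory graphs rather than as an exact answer.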

The main point, though, is that when something like this happens to you, monitoring will really help.
