Linux – Cannot remove or list directory (ls or rm) Linux – Debian

bashcommand-line-interfacelinuxlsrm

I have a log file directory at /var/log/apache2/vhosts on a Debian dedicated server. Every day I split the main access_log into respective vhostname.com_access_log files in /var/log/apache2/vhosts

In the last week or so I have noticed the server has become unresponsive and every time this happens there seems to be multiple log rotate instances running.

After some investigation I decided to check the contents of /var/log/apache2/vhosts to see if there was some sort of problem. I can 'cd' into the directory but cannot list its contents. The 'ls' command does not die it just doesn't do anything.

After trying various different ways to list the contents of the file I decided to just delete it and create a new file. Running rm on the directory from the parent directory didn't seem to work either however: rm -rf vhosts. From a view of 'top' I can see 'rm' is running and using a small amount of cpu… It has now been running for about an hour.

Any ideas what is going on and how I can get rid of that directory?

FURTHER INFORMATION:

Seeing a lot of this type error in /var/log/messages don't know if it is related:

    Oct 15 12:22:39 sp5059b kernel: [2257339.255530] apache2       D a9b86504     0 32217   9125
    Oct 15 12:22:39 sp5059b kernel: [2257339.255533]        dea15620 00000086 00000000 a9b86504 0007f05b dea157ac c1fdbfc0 00000001 
    Oct 15 12:22:39 sp5059b kernel: [2257339.255538]        00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 
    Oct 15 12:22:39 sp5059b kernel: [2257339.255542]        f7246ec0 f7246ec8 f7246ec4 dea15620 c02b95c6 c2483e80 d7085e80 dea15620 
    Oct 15 12:22:39 sp5059b kernel: [2257339.255547] Call Trace:
    Oct 15 12:22:39 sp5059b kernel: [2257339.255553]  [<c02b95c6>] __mutex_lock_slowpath+0x50/0x7b
    Oct 15 12:22:39 sp5059b kernel: [2257339.255557]  [<c02b945c>] mutex_lock+0xa/0xb
    Oct 15 12:22:39 sp5059b kernel: [2257339.255561]  [<c015810c>] generic_file_aio_write+0x41/0xa9
    Oct 15 12:22:39 sp5059b kernel: [2257339.255565]  [<f892ef99>] ext3_file_write+0x19/0x83 [ext3]
    Oct 15 12:22:39 sp5059b kernel: [2257339.255575]  [<c017460e>] do_sync_write+0xbf/0x100
    Oct 15 12:22:39 sp5059b kernel: [2257339.255581]  [<c0131a98>] autoremove_wake_function+0x0/0x2d
    Oct 15 12:22:39 sp5059b kernel: [2257339.255585]  [<c0132368>] set_process_cpu_timer+0x27/0xae
    Oct 15 12:22:39 sp5059b kernel: [2257339.255590]  [<c0125be2>] do_setitimer+0x2aa/0x31a
    Oct 15 12:22:39 sp5059b kernel: [2257339.255594]  [<c017e0a0>] fasync_helper+0x3c/0xb7
    Oct 15 12:22:39 sp5059b kernel: [2257339.255597]  [<c01bafe5>] security_file_permission+0xc/0xd
    Oct 15 12:22:39 sp5059b kernel: [2257339.255601]  [<c017454f>] do_sync_write+0x0/0x100
    Oct 15 12:22:39 sp5059b kernel: [2257339.255605]  [<c0174d80>] vfs_write+0x83/0x120
    Oct 15 12:22:39 sp5059b kernel: [2257339.255608]  [<c0175352>] sys_write+0x3c/0x63
    Oct 15 12:22:39 sp5059b kernel: [2257339.255612]  [<c01038d2>] syscall_call+0x7/0xb
    Oct 15 12:22:39 sp5059b kernel: [2257339.255618]  =======================

And some more errors:

    Oct 15 12:22:39 sp5059b kernel: [2257339.255618]  =======================
    Oct 15 13:52:55 sp5059b kernel: [2262820.703285] INFO: task kswapd0:173 blocked for more than 120 seconds.
    Oct 15 13:52:55 sp5059b kernel: [2262820.703310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Oct 15 13:52:55 sp5059b kernel: [2262820.703342] kswapd0       D 9527f7ea     0   173      2
    Oct 15 13:52:55 sp5059b kernel: [2262820.703346]        f747c4e0 00000046 c036f160 9527f7ea 0007f529 f747c66c c1fdbfc0 00000001 
    Oct 15 13:52:55 sp5059b kernel: [2262820.703352]        00000000 00000001 0051b47e 00000000 00000000 00000000 00000246 c0131ba7 
    Oct 15 13:52:55 sp5059b kernel: [2262820.703357]        f77dc400 f7731dc4 f77dc450 00396fe5 f8899e28 f77dc414 00000000 f747c4e0 
    Oct 15 13:52:55 sp5059b kernel: [2262820.703362] Call Trace:
    Oct 15 13:52:55 sp5059b kernel: [2262820.703374]  [<c0131ba7>] prepare_to_wait+0x12/0x4d
    Oct 15 13:52:55 sp5059b kernel: [2262820.703383]  [<f8899e28>] log_wait_commit+0x8b/0xd1 [jbd]
    Oct 15 13:52:55 sp5059b kernel: [2262820.703393]  [<c0131a98>] autoremove_wake_function+0x0/0x2d
    Oct 15 13:52:55 sp5059b kernel: [2262820.703398]  [<f88975dd>] journal_try_to_free_buffers+0x123/0x13b [jbd]
    Oct 15 13:52:55 sp5059b kernel: [2262820.703407]  [<f89320f7>] ext3_releasepage+0x0/0x57 [ext3]
    Oct 15 13:52:55 sp5059b kernel: [2262820.703420]  [<c0156409>] try_to_release_page+0x33/0x45
    Oct 15 13:52:55 sp5059b kernel: [2262820.703424]  [<c015ee80>] shrink_page_list+0x3c7/0x4a8
    Oct 15 13:52:55 sp5059b kernel: [2262820.703431]  [<c015e349>] isolate_lru_pages+0x44/0x17f
    Oct 15 13:52:55 sp5059b kernel: [2262820.703436]  [<c015e349>] isolate_lru_pages+0x44/0x17f
    Oct 15 13:52:55 sp5059b kernel: [2262820.703441]  [<c015f04f>] shrink_inactive_list+0xee/0x2fd
    Oct 15 13:52:55 sp5059b kernel: [2262820.703450]  [<c015f30e>] shrink_zone+0xb0/0xcd
    Oct 15 13:52:55 sp5059b kernel: [2262820.703454]  [<c015fa64>] kswapd+0x27b/0x3ed
    Oct 15 13:52:55 sp5059b kernel: [2262820.703459]  [<c015e484>] isolate_pages_global+0x0/0x42
    Oct 15 13:52:55 sp5059b kernel: [2262820.703463]  [<c0131a98>] autoremove_wake_function+0x0/0x2d
    Oct 15 13:52:55 sp5059b kernel: [2262820.703468]  [<c015f7e9>] kswapd+0x0/0x3ed
    Oct 15 13:52:55 sp5059b kernel: [2262820.703471]  [<c01319d7>] kthread+0x38/0x5d
    Oct 15 13:52:55 sp5059b kernel: [2262820.703474]  [<c013199f>] kthread+0x0/0x5d
    Oct 15 13:52:55 sp5059b kernel: [2262820.703477]  [<c01044f7>] kernel_thread_helper+0x7/0x10
    Oct 15 13:52:55 sp5059b kernel: [2262820.703482]  =======================
    Oct 17 16:39:21 sp5059b kernel: [2448749.835890] INFO: task exim4:15225 blocked for more than 120 seconds.
    Oct 17 16:39:21 sp5059b kernel: [2448749.835915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Oct 17 16:39:21 sp5059b kernel: [2448749.835947] exim4         D 0f35bdb1     0 15225  15223
    Oct 17 16:39:21 sp5059b kernel: [2448749.835951]        c24032c0 00000082 c1fdbfc0 0f35bdb1 00089b8e c240344c c1fdbfc0 00000001 
    Oct 17 16:39:21 sp5059b kernel: [2448749.835956]        00000000 00000001 000013fa 00000000 00000000 00000000 00000246 c0131ba7 
    Oct 17 16:39:21 sp5059b kernel: [2448749.835962]        f77dc400 e6185f20 f77dc450 0039e41c f8899e28 f77dc414 00000000 c24032c0 
    Oct 17 16:39:21 sp5059b kernel: [2448749.835967] Call Trace:
    Oct 17 16:39:21 sp5059b kernel: [2448749.835978]  [<c0131ba7>] prepare_to_wait+0x12/0x4d
    Oct 17 16:39:21 sp5059b kernel: [2448749.835986]  [<f8899e28>] log_wait_commit+0x8b/0xd1 [jbd]
    Oct 17 16:39:21 sp5059b kernel: [2448749.835996]  [<c0131a98>] autoremove_wake_function+0x0/0x2d
    Oct 17 16:39:21 sp5059b kernel: [2448749.836001]  [<f8896451>] journal_stop+0x12f/0x151 [jbd]
    Oct 17 16:39:21 sp5059b kernel: [2448749.836009]  [<f892f0b7>] ext3_sync_file+0x57/0x9c [ext3]
    Oct 17 16:39:21 sp5059b kernel: [2448749.836022]  [<c015744c>] filemap_fdatawrite+0x12/0x16
    Oct 17 16:39:21 sp5059b kernel: [2448749.836027]  [<c018f5b5>] do_fsync+0x41/0x83
    Oct 17 16:39:21 sp5059b kernel: [2448749.836031]  [<c018f614>] __do_fsync+0x1d/0x2b
    Oct 17 16:39:21 sp5059b kernel: [2448749.836034]  [<c01038d2>] syscall_call+0x7/0xb
    Oct 17 16:39:21 sp5059b kernel: [2448749.836041]  =======================

Best Answer

It sounds to me like you may have had an enormous number of files in the directory. I've seen very similar symptoms to what you describe with very large directories.

There may have been some sort of issue (possibly a configuration issue with logrotate) where you ended up with millions of files. Once you get into the millions, most filesystems start having issues. Just doing an ls can take an enormous amount of time, and may even die if you wait long enough. Commands like rm will similarly have issues due to the sheer number of files. Similarly, things like logrotate will appear to hang because doing simple operations on the directory will take a really long time to complete.

One way to verify this is to do an ls -l on the directory above (/var/log/apache2/ in this case), and view the size of the directory entry (not the contents). For example (from a server I have, where there are a few directories with many files):

$ ls -lh
rwxrwxr-x 2 ironport ironport  11M Apr 21 19:37 oma01syslog01/
drwxrwxr-x 2 ironport ironport  12M Apr 21 19:37 oma01syslog02/
drwxrwxr-x 2 ironport ironport 5.8M Mar 17 12:30 sat01syslog01/
drwxrwxr-x 2 ironport ironport 4.0K Jun 29  2011 swn01syslog01/

$ for DIR in * ; do echo -n "$DIR:  " ; find $DIR -type f -print | wc -l ; done
oma01syslog01:  204332
oma01syslog02:  195500
sat01syslog01:  70960
swn01syslog01:  0

Notice the size of the directory entry as compared to the number of files in it.

Your best bet is to try to view the files with something like find. It doesn't try to pull the whole list before displaying them, and can allow you to query and operate on a directory without the issues you see in ls and rm. Some useful examples are below.

Number of files:

find ./directory/name/here -type f -print | wc -l

View a sample of files in the directory:

find ./directory/name/here -type f -print | head -20 | xargs ls -lh

Remove all files matching a pattern from the directory:

find ./directory/name/here -type f -iname '*.log' -print | xargs rm -rf

Note: I just noticed your comment, where you showed an ls from one directory up:

drwxr-xr-x 2 root root 126406656 2011-10-17 18:51 vhosts

This confirms my suspicions. 126406656 bytes is 120MB. That's a pretty huge directory entry (from my example above, I had an 11MB directory entry with 200k files), and suggests that you had millions of files in that directory.

Related Topic