CentOS – loop0 command eating 100% of the CPU

apache-2.2 centos

Today I noticed that my server was becoming very slow.
I checked it with the top command and got:

top - 21:49:32 up 25 days,  9:13,  1 user,  load average: 1238.23, 825.34, 502.3
Tasks: 1815 total, 145 running, 1666 sleeping,   0 stopped,   4 zombie
Cpu(s):  1.3%us, 98.0%sy,  0.0%ni,  0.0%id,  0.4%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  12290984k total, 12252988k used,    37996k free,    30756k buffers
Swap:  1052248k total,   428116k used,   624132k free,   981528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3129 root       5 -20     0    0    0 R 77.8  0.0  34:10.25 loop0              
 2281 nobody    18   0  163m  11m 3128 R 55.6  0.1   0:02.93 httpd              
 2021 nobody    19   0  162m  11m 3552 R 44.9  0.1   0:03.07 httpd              
  561 nobody    18   0  163m  11m 3172 R 44.4  0.1   0:02.03 httpd              
 2085 nobody    17   0  163m  11m 3176 R 41.4  0.1   0:03.22 httpd              
 1116 nobody    18   0  162m  11m 3168 R 37.2  0.1   0:02.38 httpd              
31809 nobody    18   0  163m  12m 3500 R 36.2  0.1   0:02.10 httpd              
 1906 nobody    17   0  161m 9364 1936 R 35.7  0.1   0:13.15 httpd              
31979 nobody    17   0  162m  11m 3404 R 30.7  0.1   0:04.41 httpd              
32610 nobody    18   0  161m 9688 2344 R 29.9  0.1   0:11.07 httpd              
 2326 nobody    17   0  162m  11m 3428 R 28.7  0.1   0:02.18 httpd              
  565 root      20  -5     0    0    0 R 27.4  0.0   4:29.02 kswapd0            
 2183 nobody    19   0  162m  11m 3100 R 26.4  0.1   0:02.55 httpd              
 1998 nobody    17   0  162m  10m 2484 R 24.7  0.1   0:10.76 httpd              
28515 nobody    16   0  169m  16m 5416 R 23.4  0.1   0:02.75 httpd              
 2056 nobody    19   0  166m  14m 5776 R 22.2  0.1   0:02.95 httpd              
32379 nobody    16   0  164m  12m 4376 R 20.7  0.1   0:01.52 httpd 

I'd like to know what is wrong. I think it's related to the /tmp directory:

root@server [~]# mount
/dev/sda2 on / type ext3 (rw,usrquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)


root@server [~]# losetup -a
/dev/loop0: [0802]:103095300 (/usr/tmpDSK)

Best Answer

This line from the mount output is relevant:

/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)

This shows that your /tmp file system is a loopback mount, which is why loop0 shows up in top. That is an unusual configuration and probably not an ideal one. It means that any access to /tmp that cannot be served from cache has to be handled by the loop0 kernel thread.
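To spot loop-backed mounts without reading the whole list by eye, something like the following could be used. It is only a sketch, and it assumes the older mount output format shown in the question, where the loop device appears as a loop= option; loop_mounts is a name made up for this example.

```shell
# Print every mount point whose options contain loop=, with its backing file.
# Assumes lines of the form: SOURCE on TARGET type FSTYPE (OPTIONS)
loop_mounts() {
    awk '$6 ~ /loop=/ { print $3, "is backed by", $1 }'
}

# Sample line copied from the mount output in the question.
echo '/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)' | loop_mounts
```

On the live server the same function would be fed with `mount | loop_mounts`.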

The output from top shows an excessively high load average of 1238.23, but only 145 processes are in the running state. On Linux the load average counts processes in uninterruptible sleep as well as runnable ones, so if those two numbers are stable, it would indicate that you have more than 1000 processes blocked waiting for I/O. How many of those blocked processes are waiting for loop0 to do some work cannot be determined from the shown output alone.
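One way to check that split on the running system is to tally processes by state letter. This is a rough sketch rather than a diagnostic tool; count_states is a hypothetical helper name, and the sample input is invented purely to show the output shape.

```shell
# Tally processes by the first letter of their state:
# R = runnable, D = uninterruptible sleep (usually waiting for I/O),
# S = interruptible sleep. Reads one state string per line on stdin.
count_states() {
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# Invented sample input, one state per line.
printf 'R\nD\nD\nS\nD\n' | count_states
```

On the server itself the input would come from `ps -eo state= | count_states`. A D count close to the gap between the load average and the number of running processes would support the blocked-on-I/O reading.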

Given the large amount of used memory and the small numbers for free, buffers, and cached, I would conclude that the system is under significant memory pressure. The kswapd0 thread using 27.4% CPU in the same output points the same way. It is a surprise that it hasn't used all of the swap space yet.
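To put a number on that, the free, buffers, and cached figures can be added up and compared with total RAM. The snippet below is a back-of-the-envelope sketch using the values from the top output in the question; reclaimable_pct is a name invented for this example, and treating all of cached as reclaimable is a simplifying assumption.

```shell
# Share of RAM that is free or cheaply reclaimable, as a percentage.
# Arguments are in kB: total free buffers cached.
reclaimable_pct() {
    awk -v total="$1" -v free="$2" -v buffers="$3" -v cached="$4" \
        'BEGIN { printf "%.1f\n", (free + buffers + cached) * 100 / total }'
}

# Figures copied from the Mem and Swap lines of the top output.
reclaimable_pct 12290984 37996 30756 981528
```

With those figures it prints 8.5, so well under a tenth of the 12 GB is available without swapping. On a live system the same four values can be read from MemTotal, MemFree, Buffers, and Cached in /proc/meminfo.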

I would add more RAM to that server and stop using loopback for /tmp. If the loopback device was set up because / was running out of disk space and /usr had space to spare, there is a better way to use some of the space in /usr for /tmp: create a /usr/local/tmp directory and bind mount it on /tmp. A bind mount does not have the overhead of a loopback device.
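The switch to a bind mount could look roughly like this. The sketch only prints the commands instead of running them, because they need root and will fail while anything still has files open under /tmp, so Apache and other users of /tmp would have to be stopped first. plan_bind_tmp is a hypothetical helper, and the default path is the /usr/local/tmp suggested above.

```shell
# Print (do not run) the steps for replacing the loop-mounted /tmp with a
# bind mount. The source directory defaults to /usr/local/tmp.
plan_bind_tmp() {
    src="${1:-/usr/local/tmp}"
    cat <<EOF
mkdir -p $src
chmod 1777 $src
umount /var/tmp
umount /tmp
mount --bind $src /tmp
mount --bind /tmp /var/tmp
EOF
}

plan_bind_tmp
```

/var/tmp is unmounted first because it is itself a bind mount of /tmp, and mode 1777 gives the new directory the usual world-writable, sticky-bit permissions of a temp directory. To make the change survive a reboot, the loop entry for /tmp in /etc/fstab would be replaced with a line such as `/usr/local/tmp /tmp none bind 0 0`. Whatever originally created /usr/tmpDSK may need to be disabled as well, which the output in the question does not show.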