CPU 100% idle but still showing load average

Tags: central-processing-unit, load-average

I have a Blade Server with CentOS 6.4.

When idle, it shows a constant load average of more than 1. However, I prepared another machine with the same hardware and CentOS version, and its load average stays around 0 when idle.

The output of top is as follows:

top - 10:23:04 up 156 days, 18:15,  1 user,  load average: 1.08, 1.35, 1.31
Tasks: 534 total,   1 running, 533 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65959040k total, 10021484k used, 55937556k free,   167092k buffers
Swap: 32767992k total,    13884k used, 32754108k free,  7084024k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20951 root      20   0 15396 1608  952 R  0.3  0.0   0:01.52 top
    1 root      20   0 19352  684  472 S  0.0  0.0   0:01.64 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.03 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:15.31 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:12.32 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:17.45 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:16.26 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S  0.0  0.0   0:18.51 ksoftirqd/1

Which process is causing the load average to stay above 1 while the system is totally idle?

Best Answer

Load average doesn't mean what you think it means. It is not a measure of instantaneous CPU usage, but of how many processes are running or waiting to run. Usually a high value means lots of things want CPU, but not always: on Linux, processes in uninterruptible sleep are counted too, so a common culprit is a process waiting for I/O (disk or network).
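To see the raw numbers behind the figures in top, you can read /proc/loadavg directly. The following is a minimal Python sketch, assuming a Linux system; the function name parse_loadavg is made up for illustration. The file holds the 1, 5 and 15 minute averages, then the number of currently runnable scheduling entities over the total, then the most recently assigned PID.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict (illustrative helper)."""
    one, five, fifteen, entities, last_pid = text.split()
    # The fourth field looks like "1/534": runnable entities / total entities.
    runnable, total = entities.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(last_pid),
    }


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

If the averages stay above 1 while the runnable count is consistently low, that hints at something blocked in uninterruptible sleep rather than something using CPU.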

Try running ps -e v and looking for process state flags.

state    The state is given by a sequence of characters, for example, "RWNA". The first character indicates the run state of the process:
D    Marks a process in disk (or other short term, uninterruptible) wait.
I    Marks a process that is idle (sleeping for longer than about 20 seconds).  
L    Marks a process that is waiting to acquire a lock.
R    Marks a runnable process.
S    Marks a process that is sleeping for less than about 20 seconds.
T    Marks a stopped process.
W    Marks an idle interrupt thread.
Z    Marks a dead process (a "zombie").

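This is from the ps manpage, so you can find more detail there. The exact set of state letters varies between ps implementations, so check the manpage on your own system. R and D processes are probably of particular interest.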
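If you would rather filter for those two states than scan the ps output by eye, something like the following Python sketch may help. It assumes a Linux /proc filesystem, and the helper names parse_stat and busy_processes are invented for this example. It only looks at one entry per process, so a thread stuck in D state inside a multithreaded process may not show up, and the script will usually list itself as R.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' to find where the remaining fields start.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid), comm, state


def busy_processes(proc_root="/proc"):
    """List processes currently in state R (runnable) or D (uninterruptible)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited (or was unreadable) during the scan
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, state in busy_processes():
        print(pid, state, comm)
```

Because these states are often short-lived, it is worth running the scan several times; a process that shows up in D on most runs is a good candidate for whatever keeps the load average above 1.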

Your top output contains:

Tasks: 534 total,   1 running, 533 sleeping,   0 stopped,   0 zombie

That 1 running process is the cause of your load average. Find it, and figure out what it's up to. (Edit: as mentioned in the comments, that running process is probably top itself, which is the one entry in state R in your listing, so ignore that.)