VPS – High load average despite low CPU and memory usage and minimal I/O wait (5%)


I have a VPS running CentOS 5.7, and it is behaving very strangely. The VPS is hosted on a 2-core machine.

For a 2-core machine the load average is very high, as the top command shows:

top - 04:04:40 up 1 day, 22:43,  1 user,  load average: 6.23, 5.19, 4.72
Tasks:  59 total,   1 running,  58 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4%us,  3.4%sy,  0.0%ni, 85.4%id,  5.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1376256k total,   755908k used,   620348k free,        0k buffers
Swap:        0k total,        0k used,        0k free,        0k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      15   0  2172  664  572 S  0.0  0.0   0:02.59 init
 1135 root      18  -4  2276  556  344 S  0.0  0.0   0:00.00 udevd
 1231 root      19   0 32716  564  460 S  0.0  0.0   0:00.00 brcm_iscsiuio
 1542 root      16   0  1828  580  488 S  0.0  0.0   0:03.24 syslogd
 1599 named     23   0 50596 3984 2024 S  0.0  0.3   0:01.26 named
 1615 root      18   0  7228 1044  644 S  0.0  0.1   0:00.00 sshd
 1626 root      15   0  2848  844  676 S  0.0  0.1   0:00.00 xinetd
 1638 root      18   0  3728 1316 1144 S  0.0  0.1   0:00.00 mysqld_safe
 1662 mysql     15   0  252m  99m 4876 S  0.0  7.4   9:21.01 mysqld
 1738 postgres  15   0 20348 3412 2900 S  0.0  0.2   0:00.26 postmaster
 1740 postgres  15   0 10128  904  388 S  0.0  0.1   0:01.42 postmaster
 1742 postgres  15   0 20348  984  468 S  0.0  0.1   0:05.20 postmaster
 1743 postgres  18   0 11128  812  292 S  0.0  0.1   0:00.13 postmaster
 1744 postgres  15   0 10308 1060  440 S  0.0  0.1   0:00.00 postmaster
 1757 mailnull  15   0  9524 2328 1836 S  0.0  0.2   0:00.99 exim
 1786 root      18   0  2172  720  552 S  0.0  0.1   0:02.58 dovecot
 1787 root      18   0  2648 1040  832 S  0.0  0.1   0:02.04 dovecot-auth

As you can see, the load is about 6 on a 2-core machine, yet when I add up the usage of all the processes top lists, the total CPU and memory consumption is minimal!
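To put the load in per-core terms and double-check the "added together" claim, here is a small Python sketch. The helper names are my own (hypothetical, not part of top); the sample is an excerpt of the header and three process rows copied from the output above, and the core count of 2 comes from the question.

```python
import re

# Excerpt of the top output from the question (header plus three process rows).
TOP_SAMPLE = """\
top - 04:04:40 up 1 day, 22:43,  1 user,  load average: 6.23, 5.19, 4.72
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1599 named     23   0 50596 3984 2024 S  0.0  0.3   0:01.26 named
 1662 mysql     15   0  252m  99m 4876 S  0.0  7.4   9:21.01 mysqld
 1757 mailnull  15   0  9524 2328 1836 S  0.0  0.2   0:00.99 exim
"""

CORES = 2  # assumption taken from the question: a 2-core host


def load_averages(text):
    """Return the 1-, 5- and 15-minute load averages from top's header line."""
    m = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    if m is None:
        raise ValueError("no 'load average:' field found")
    return tuple(float(x) for x in m.groups())


def process_totals(text):
    """Sum the %CPU and %MEM columns (9th and 10th fields) of top's process table."""
    cpu = mem = 0.0
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["PID"]:
            in_table = True  # rows after the column header are processes
        elif in_table and len(fields) >= 12:
            cpu += float(fields[8])
            mem += float(fields[9])
    return cpu, mem


if __name__ == "__main__":
    loads = load_averages(TOP_SAMPLE)
    per_core = [round(load / CORES, 2) for load in loads]
    cpu, mem = process_totals(TOP_SAMPLE)
    print("load per core:", per_core)
    print(f"summed %CPU={cpu:.1f} %MEM={mem:.1f}")
```

A per-core value well above 1.0 means that, on average, more tasks were being counted toward the load than there were cores to serve them, even though the summed %CPU stays near zero.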

I suspected an I/O wait problem, so I ran iostat -cx 30 to check:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.43    0.02    3.36    5.80    0.00   85.39

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.79    0.00    0.33    2.09    0.00   93.79

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.61    0.00    0.30    5.67    0.00   90.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    0.22    1.04    0.00   96.83


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.47    0.00    0.28    0.75    0.00   95.49


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.93    0.00    0.44    2.62    0.00   93.01

As you can see, %iowait is only around 5% at most, meaning the CPUs spend only about 5% of their time waiting for I/O. That suggests the disk is not busy, so the high load average can't be caused by processes waiting for the disk, right?
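A quick way to summarise these samples is sketched below in Python (helper names are hypothetical). Two things worth keeping in mind: iostat's first report covers the whole time since boot rather than the 30-second interval, so it is skipped here; and %iowait is a share of CPU time (time the CPUs were idle while I/O was outstanding), not a count of processes waiting on the disk.

```python
# iostat -c output from the question; the first report covers the time since boot.
IOSTAT_SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.43    0.02    3.36    5.80    0.00   85.39

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.79    0.00    0.33    2.09    0.00   93.79

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.61    0.00    0.30    5.67    0.00   90.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    0.22    1.04    0.00   96.83

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.47    0.00    0.28    0.75    0.00   95.49

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.93    0.00    0.44    2.62    0.00   93.01
"""


def cpu_reports(text):
    """Yield one {column: value} dict per 'avg-cpu:' block."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:") and i + 1 < len(lines):
            names = line.split()[1:]  # drop the 'avg-cpu:' label
            values = [float(v) for v in lines[i + 1].split()]
            yield dict(zip(names, values))


def iowait_summary(text, skip_first=True):
    """Return (mean, max) %iowait, optionally ignoring the since-boot report."""
    reports = list(cpu_reports(text))
    if skip_first:
        reports = reports[1:]
    waits = [r["%iowait"] for r in reports]
    if not waits:
        raise ValueError("need at least one interval report")
    return sum(waits) / len(waits), max(waits)


if __name__ == "__main__":
    mean, peak = iowait_summary(IOSTAT_SAMPLE)
    print(f"interval %iowait: mean {mean:.2f}, max {peak:.2f}")
```

For the five interval reports above this gives a mean of about 2.43% and a peak of 5.67%.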

Finally, to confirm this, I ran vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 751928      0      0    0    0   120    99    0  105  5  3 85  6  0

As you can see, only one process is runnable (r = 1), and the b column is 0, meaning no processes are in uninterruptible sleep. Furthermore, bi (blocks received from a block device per second) is only 120, which is not very high, and si (memory swapped in from disk per second) is 0. Finally, under the cpu header, the wa column shows that the CPU spends only 6% of its time waiting for I/O to complete.
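One caveat from the vmstat man page: in the first report the io, swap, system and cpu figures are averages since the last reboot, while the procs (r, b) and memory columns are instantaneous. A single run therefore shows r and b for one moment only; running vmstat with an interval (for example vmstat 5) samples them repeatedly. The Python sketch below labels each data row with its column names so such runs are easy to scan; the helper names and the "crowded" threshold are hypothetical.

```python
# vmstat output from the question (a single run, no interval).
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 751928      0      0    0    0   120    99    0  105  5  3 85  6  0
"""


def vmstat_rows(text):
    """Turn vmstat output into a list of {column: int} dicts.

    Handles the header being repeated when vmstat runs with an interval.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("procs"):
            continue  # blank line or the group-title line
        if fields[0] == "r":
            header = fields  # column-name line
        elif header is not None:
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def crowded_samples(rows, cores):
    """Hypothetical check: samples where runnable + blocked tasks exceed the core count."""
    return [row for row in rows if row["r"] + row["b"] > cores]


if __name__ == "__main__":
    rows = vmstat_rows(VMSTAT_SAMPLE)
    print(rows[0]["r"], rows[0]["b"], rows[0]["bi"], rows[0]["wa"])
    print(len(crowded_samples(rows, cores=2)), "crowded sample(s)")
```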

All of this seems to rule out I/O as the bottleneck.

So, to conclude: the load average is very high and it degrades the performance of my website, yet it is not caused by either of the following:

  1. High CPU or memory usage by my processes
  2. I/O operations

What can cause the high load average?

Best Answer

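CPU load is the average number of processes that are ready to run. On Linux it also counts processes in uninterruptible sleep (state D), which are usually waiting for disk I/O, so I/O waits can raise it even when the CPU looks idle.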
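One way to see what is feeding the load is to count tasks by scheduler state. On Linux the load average is computed from tasks that are runnable (R) plus tasks in uninterruptible sleep (D), and it counts threads, not just processes. The Python sketch below reads /proc/&lt;pid&gt;/task/&lt;tid&gt;/stat (Linux-only; the helper names are hypothetical). Because D states are often short-lived, running it in a loop tells you more than a single snapshot.

```python
import glob
from collections import Counter


def task_state(stat_text):
    """Return the one-letter state field from a /proc/.../stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the last ')' and take the next field.
    return stat_text[stat_text.rfind(")") + 1:].split()[0]


def load_contributors(stat_texts):
    """Count tasks in state R (runnable) and D (uninterruptible sleep)."""
    counts = Counter(task_state(t) for t in stat_texts)
    return counts["R"], counts["D"]


def read_task_stats(pattern="/proc/[0-9]*/task/[0-9]*/stat"):
    """Read the stat file of every thread currently visible in /proc."""
    texts = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            pass  # the task exited between glob() and open(); skip it
    return texts


if __name__ == "__main__":
    # Note: this script itself shows up as one R task while it runs.
    running, blocked = load_contributors(read_task_stats())
    print(f"R={running} D={blocked} total={running + blocked}")
```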

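The numbers are certainly weird: with a load average of 6 I'd expect much higher CPU utilization than 5 to 6%. Then again, the 1-minute average (6.23) is higher than the 5- and 15-minute ones (5.19 and 4.72), so the load has been rising; perhaps there is a recent spike that top's snapshot doesn't show? Is there anything special about the workload?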
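The three figures top prints are exponentially damped moving averages. As a rough model of what the Linux kernel does (it recomputes them about every 5 seconds, in fixed-point arithmetic), each update multiplies the old value by exp(-5/window) and mixes in the current task count. The float sketch below, fed with made-up task counts, shows how the ordering of the 1-, 5- and 15-minute values reveals whether the load is climbing or easing off.

```python
import math

TICK = 5.0                      # seconds between kernel load updates (about 5 s)
WINDOWS = (60.0, 300.0, 900.0)  # 1-, 5- and 15-minute averages


def simulate(task_counts, start=(0.0, 0.0, 0.0)):
    """Apply one exponential-decay update per 5-second tick.

    task_counts holds the number of counted tasks at each tick; the kernel
    uses fixed-point arithmetic, so this float version only approximates it.
    """
    decays = [math.exp(-TICK / w) for w in WINDOWS]
    loads = list(start)
    for n in task_counts:
        loads = [load * d + n * (1.0 - d) for load, d in zip(loads, decays)]
    return tuple(loads)


def trend(loads):
    """Classify a (1, 5, 15)-minute triple by its ordering."""
    one, five, fifteen = loads
    if one > five > fifteen:
        return "rising"
    if one < five < fifteen:
        return "falling"
    return "mixed"


if __name__ == "__main__":
    print(trend((6.23, 5.19, 4.72)))               # rising (figures from the question)
    print(trend(simulate([6] * 12)))               # rising (1 minute of 6 tasks after idle)
    print(trend(simulate([6] * 1000 + [0] * 24)))  # falling (long busy spell, then 2 idle minutes)
```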

Install sysstat, learn how to use it (it isn't simple, mind you) and milk it for insight...
