VPS – High load average, low CPU and memory usage, and minimal I/O wait time (5%)

centos5 vps

I have a VPS running CentOS 5.7, and it is behaving very oddly. The VPS is hosted on a 2-core machine.

For a 2-core machine, the load average I can see is very high, as evident when I use the top command to investigate:

top - 04:04:40 up 1 day, 22:43,  1 user,  load average: 6.23, 5.19, 4.72
Tasks:  59 total,   1 running,  58 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4%us,  3.4%sy,  0.0%ni, 85.4%id,  5.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1376256k total,   755908k used,   620348k free,        0k buffers
Swap:        0k total,        0k used,        0k free,        0k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      15   0  2172  664  572 S  0.0  0.0   0:02.59 init
 1135 root      18  -4  2276  556  344 S  0.0  0.0   0:00.00 udevd
 1231 root      19   0 32716  564  460 S  0.0  0.0   0:00.00 brcm_iscsiuio
 1542 root      16   0  1828  580  488 S  0.0  0.0   0:03.24 syslogd
 1599 named     23   0 50596 3984 2024 S  0.0  0.3   0:01.26 named
 1615 root      18   0  7228 1044  644 S  0.0  0.1   0:00.00 sshd
 1626 root      15   0  2848  844  676 S  0.0  0.1   0:00.00 xinetd
 1638 root      18   0  3728 1316 1144 S  0.0  0.1   0:00.00 mysqld_safe
 1662 mysql     15   0  252m  99m 4876 S  0.0  7.4   9:21.01 mysqld
 1738 postgres  15   0 20348 3412 2900 S  0.0  0.2   0:00.26 postmaster
 1740 postgres  15   0 10128  904  388 S  0.0  0.1   0:01.42 postmaster
 1742 postgres  15   0 20348  984  468 S  0.0  0.1   0:05.20 postmaster
 1743 postgres  18   0 11128  812  292 S  0.0  0.1   0:00.13 postmaster
 1744 postgres  15   0 10308 1060  440 S  0.0  0.1   0:00.00 postmaster
 1757 mailnull  15   0  9524 2328 1836 S  0.0  0.2   0:00.99 exim
 1786 root      18   0  2172  720  552 S  0.0  0.1   0:02.58 dovecot
 1787 root      18   0  2648 1040  832 S  0.0  0.1   0:02.04 dovecot-auth

As you can see, the load is 6 (for a 2-core machine), but when all the top processes are added together, their memory and CPU consumption is minimal!
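To put the "6 on a 2-core machine" comparison in numbers, here is a minimal Python sketch that divides the three load averages by the core count. It takes a line in /proc/loadavg layout; the first three figures are the ones from the top output above, while the trailing fields of the sample line (including the last PID) are placeholders for illustration.

```python
def load_per_core(loadavg_line, cores):
    """Return the 1-, 5- and 15-minute load averages divided by the core count."""
    one, five, fifteen = (float(x) for x in loadavg_line.split()[:3])
    return (one / cores, five / cores, fifteen / cores)


# Load figures taken from the top output above; the last two fields are
# placeholders that only mimic the /proc/loadavg layout.
sample = "6.23 5.19 4.72 1/59 1787"

per_core = load_per_core(sample, cores=2)
print("load per core (1, 5, 15 min):", per_core)
```

On a live Linux system the same function could be fed the contents of /proc/loadavg, with os.cpu_count() as the core count. A per-core value well above 1 means more tasks are queued than the cores can service at once.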

I thought this was an IO wait issue, so I used iostat -cx 30 to check:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.43    0.02    3.36    5.80    0.00   85.39

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.79    0.00    0.33    2.09    0.00   93.79

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.61    0.00    0.30    5.67    0.00   90.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    0.22    1.04    0.00   96.83


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.47    0.00    0.28    0.75    0.00   95.49


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.93    0.00    0.44    2.62    0.00   93.01

As you can see, %iowait is only about 5%, which means that my processes spend only about 5% of the time waiting for I/O. That suggests the disk is not busy, so there is no possibility that the high load average is caused by processes waiting for the disk, right?
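For reference, the percentages that iostat and top print are shares of CPU time, computed from the cumulative counters on the "cpu" line of /proc/stat. The sketch below shows that calculation in Python, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The two sample lines are made up, chosen so that the interval reproduces the figures in the top output above.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def cpu_percentages(before, after):
    """Share of CPU time per category between two 'cpu' lines of /proc/stat."""
    first = [int(x) for x in before.split()[1:1 + len(FIELDS)]]
    second = [int(x) for x in after.split()[1:1 + len(FIELDS)]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}


# Made-up counter samples (in clock ticks); the interval spans 1000 ticks.
before = "cpu 1000 0 500 8000 400 0 0 0"
after = "cpu 1054 0 534 8854 458 0 0 0"

for name, value in cpu_percentages(before, after).items():
    print(f"{name:8s} {value:5.1f}%")
```

Because iowait is a slice of CPU time (time a CPU was idle while I/O was outstanding), it describes the CPUs rather than how many processes are blocked, so it is worth reading alongside per-process state counts.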

Finally, to further confirm my point, I type in vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 751928      0      0    0    0   120    99    0  105  5  3 85  6  0

As you can see, the number of running processes is minimal, and the b column is 0, indicating that the number of processes in UNINTERRUPTIBLE_SLEEP is 0. Furthermore, the bi column (blocks read from a block device) is only 120, which is not so high, right? The si column (memory swapped in from disk) is 0. Finally, under the cpu header, the wa column shows that the CPU spends only 6% of its time waiting for I/O to complete.
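The r and b columns of vmstat count processes that are runnable and in uninterruptible sleep, which are states R and D in /proc/<pid>/stat. A rough way to cross-check them is to tally the state field yourself, as in this Python sketch. The sample lines are shortened, hypothetical stat entries, and read_stat_lines is a helper name introduced here for illustration.

```python
import glob
from collections import Counter


def process_state(stat_line):
    """Extract the state letter from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' instead of on whitespace alone.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Tally state letters (R, S, D, ...) over a collection of stat lines."""
    return Counter(process_state(line) for line in stat_lines)


def read_stat_lines():
    """Read every /proc/<pid>/stat on a Linux host (empty list elsewhere)."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                lines.append(handle.read())
        except OSError:
            continue  # the process exited between listing and reading
    return lines


# Shortened, hypothetical stat lines used only to exercise the parser.
samples = [
    "1 (init) S 0 1 1 0 -1",
    "1662 (mysqld) S 1638 1638 1638 0 -1",
    "2001 (my worker) D 1 2001 2001 0 -1",
    "2002 (top) R 1900 2002 1900 34816 2002",
]
counts = count_states(samples)
print("runnable:", counts["R"], "uninterruptible:", counts["D"])
```

A single snapshot can easily miss short bursts, so calling count_states(read_stat_lines()) every second or so for a few minutes gives a better picture than one reading.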

All of this rules out I/O as the bottleneck.

So the conclusion is that the load average is very high and it degrades the performance of my website. However, this high load average is not caused by either of the following:

High CPU or memory usage by my processes
I/O operations

What can cause the high load average?

Best Answer

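Load average is the average number of processes that are running or ready to run. On Linux it also counts processes in uninterruptible sleep (typically blocked on disk I/O), so processes waiting for I/O do add to it there.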

The numbers are certainly weird; with a load average of 6 I'd expect much higher CPU utilization than 5 to 6%. Note that the 1-, 5- and 15-minute figures (6.23, 5.19, 4.72) rise toward the present, so whatever is driving the load is still active rather than left over from an earlier spike. Anything special about the workload?

Install sysstat, learn how to use it (it isn't simple, mind you) and milk it for insight...

Related Solutions

Centos – Why does ‘top’ indicate low memory usage, whilst ‘free’ indicates high memory usage

top (that is, the figure in its %MEM column) counts RSS memory (Resident Set Size, basically pages physically in memory that have real data on them) as a percentage of the total physical memory in your machine or VPS.
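As a worked example of that percentage, here is a small Python sketch using the mysqld row from the top output earlier on this page (RES 99m, 1376256k total). Since top rounds RES to whole megabytes, the result is approximate.

```python
def mem_percent(rss_kib, total_kib):
    """RSS as a percentage of total physical memory, as in top's %MEM column."""
    return 100.0 * rss_kib / total_kib


# mysqld: RES 99m (converted to KiB), Mem total 1376256k.
mysqld_percent = mem_percent(99 * 1024, 1376256)
print(round(mysqld_percent, 1))  # prints 7.4, matching the %MEM column
```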

On the other hand, free is counting just that: the amount of physical memory pages that have no data on them and have not been assigned to buffers, cache or the kernel. In a Unix-like operating system, the OS tries hard to keep that number as low as possible by using free pages for disk cache. The only time you'll likely see a high value of free memory is just after your machine boots, or after you quit a program that was itself consuming a large amount of physical memory.
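To see how much of the "used" memory is really disk cache, the figures in /proc/meminfo can be combined as in the following Python sketch. The sample text uses made-up numbers, and summarize is a name introduced here for illustration; MemFree, Buffers and Cached are the standard /proc/meminfo keys.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def summarize(info):
    """Separate truly free memory from memory held as buffers and page cache."""
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {"free_kb": free, "cache_kb": cache, "free_plus_cache_kb": free + cache}


# Made-up figures: little free memory, but most of the rest is cache.
sample = """MemTotal:      2048000 kB
MemFree:        102400 kB
Buffers:         51200 kB
Cached:        1024000 kB"""

print(summarize(parse_meminfo(sample)))
```

In this made-up case only about 100 MB is free, yet well over 1 GB could be handed back to applications if they asked for it, which is why a low "free" figure on its own is not alarming.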

Is this memory usage normal? The short answer is yes. It is typical for Unix programs to allocate (that is, ask the OS for) significantly more memory than they will use. If you look at the VSS column, the total for the processes listed is over 463 MB. That is because:

A lot of the memory accounted against each process will be physically mapped to the same library, say glibc
The OS generally overcommits memory to the application, on the basis that most applications never come to collect on what they have asked for.

Figuring out process memory usage is more an art than a science IMHO; see the discussions on http://lwn.net. My advice is to keep a close eye on iostat -xm and ensure that your machine is not swapping heavily.

Nginx + php-fpm – Each php-fpm process 70-100% cpu when running

A couple of things to consider (apologies in advance if you have already considered these): First of all, make sure to optimize your nginx config and invoke php-fpm only when absolutely necessary. The last thing you want to do is let php handle things like static HTML pages (which it will happily do).

Secondly, since you're using php-fpm, I suggest being more aggressive with how long php-fpm's children are allowed to live. You need to find the sweet spot between short-lived threads/children and stability. The php-fpm defaults are way too generous for any production system, IMHO. The longer a worker is allowed to serve requests, the more unstable it will get, and the higher the risk of memory leaks. If the framework you refer to has bugs like infinite loops, which may be causing you grief with CPU load, recycling workers sooner shouldn't hurt.

I'd reduce the number for pm.max_requests for your production pools. I think the default is 200. I'd start from 50 and see where that takes you.

Failing/complementary to that, you could also try these global options (AFAIK they are all disabled by default):

emergency_restart_threshold 3
emergency_restart_interval 1m
process_control_timeout 5s

What does this mean? If 3 PHP-FPM child processes exit with SIGSEGV or SIGBUS (i.e. crash) within 1 minute, then PHP-FPM is supposed to restart automatically. Child processes wait up to 5s for a reaction to signals from the master.
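The "3 crashes within 1 minute" rule can be pictured as a sliding-window check. The Python sketch below only illustrates that idea under the assumption that the window is inclusive; it is not PHP-FPM's actual implementation, and should_restart is a name made up for this example.

```python
def should_restart(crash_times, threshold=3, interval=60.0):
    """True if `threshold` crashes fall within any window of `interval` seconds.

    crash_times holds crash timestamps in seconds. The defaults mirror
    emergency_restart_threshold 3 and emergency_restart_interval 1m.
    """
    times = sorted(crash_times)
    for i in range(len(times) - threshold + 1):
        # Compare the first and last crash of each run of `threshold` crashes.
        if times[i + threshold - 1] - times[i] <= interval:
            return True
    return False


print(should_restart([0, 20, 50]))   # three crashes inside 60 seconds
print(should_restart([0, 40, 90]))   # same count, spread over 90 seconds
```

A lower threshold or a longer interval makes the restart trigger more eagerly, which is the trade-off to weigh when tuning these two options.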

Here's a nice overview of all the config options I mentioned here, as well as others: http://myjeeva.com/php-fpm-configuration-101.html

Hope these tips help you! Remember to tweak and observe. Unfortunately there doesn't seem to be a rule of thumb for all this; as you observed, there are too many variables that affect PHP's behaviour and stability.

Finally, the CPU limiting facility you inquired about is documented here, but I'd only resort to it if you exhaust every other option. If you do choose this path, I'd definitely watch out for possible interactions between PHP-FPM tweaks and your limits.conf configuration. At that point etckeeper may be a lifesaver! :)

Good luck!

Rouben
