(1) I see that each of the running processes occupies a very small percentage of memory (%MEM is no more than 0.2%, and mostly just 0.0%), so how is it that total memory is almost entirely used, as in the fourth line of output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The sum of the memory percentages over all processes seems unlikely to reach almost 100%, doesn't it?
To see how much memory you are currently using, run free -m. It will provide output like:
             total       used       free     shared    buffers     cached
Mem:          2012       1923         88          0         91        515
-/+ buffers/cache:       1316        695
Swap:         3153        256       2896
The top row 'used' value (1923) will almost always nearly match the top row total value (2012), because Linux likes to use any spare memory to cache disk blocks (the 515 in the 'cached' column).
The key figure to look at is the used value in the -/+ buffers/cache row (1316). This is how much memory your applications are actually using. For best performance, this number should be less than your total memory (2012). To prevent out-of-memory errors, it needs to be less than the total memory (2012) plus swap space (3153).
If you wish to quickly see how much memory is free, look at the free value in the -/+ buffers/cache row (695). This is the total memory (2012) minus the actual used (1316). (2012 - 1316 = 696, not 695; the difference is just rounding.)
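If you would rather compute these figures than read them off the table, they come straight from /proc/meminfo. A minimal sketch (field names as on 2.6-era kernels; newer kernels also expose a MemAvailable line, which is the more accurate figure where present):

awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2} /^Buffers:/ {b=$2} /^Cached:/ {c=$2} END {printf "apps: %d MiB, effectively free: %d MiB\n", (t-f-b-c)/1024, (f+b+c)/1024}' /proc/meminfo

Here 'apps' corresponds to the -/+ buffers/cache used value and 'effectively free' to its free value; /proc/meminfo reports KiB, hence the division by 1024.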
(2) How should I interpret the load average on the first line ("load average: 14.04, 14.02, 14.00")?
This article on load average uses a nice traffic analogy and is the best one I've found so far: Understanding Linux CPU Load - when should you be worried?. In your case, as people pointed out:
On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.
So, with a load average of 14.00 and 24 cores, your server is far from being overloaded.
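A quick way to put the two numbers side by side (a sketch; nproc is from GNU coreutils, so on older systems count the processor lines in /proc/cpuinfo instead):

cores=$(nproc)
load=$(cut -d ' ' -f1 /proc/loadavg)
echo "1-minute load: $load across $cores cores"

By the rule above, the "100% utilization" mark on your machine would be 24.00.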
The CPU can and will be used for other processes, provided there is at least one process that is ready to receive CPU time. There's the rub - you can have an I/O-bound system with every process waiting for I/O to complete, and as there is nothing waiting for CPU time, there is no reason to schedule (and utilize) CPU for anything other than the kernel's activities...hence the term, I/O wait.
Try running vmstat 1 and see if there are numbers greater than 0 in the "b" column (2nd column) on a regular basis. If so, you're probably I/O-bound. Seeing it occasionally isn't a big deal; seeing it all the time with numbers in the 2-3 range is tolerable but not desirable; and seeing 5 or more means you're probably way too busy (although that depends on how much I/O your system can accommodate, so it can be more or less). The "b" stands for "blocked": the number of processes that were scheduled to run, but are blocked pending the completion of an I/O.
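To watch for this without eyeballing every sample, you can filter the output; a minimal sketch that prints only the lines where the "b" column is non-zero (the numeric test also skips vmstat's header lines):

vmstat 1 | awk '$2 ~ /^[0-9]+$/ && $2 > 0'

Cross-checking the "wa" (I/O wait) column near the right-hand end of the same output helps confirm the diagnosis.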
Follow-up:
There is a known bug with heavy I/O and the newer schedulers on the 2.6 series of kernels. Try changing your scheduler to see if it has an impact.
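The active I/O scheduler is set per block device and can be switched at runtime through sysfs. A sketch, assuming the disk is sda (substitute your own device; this needs root, and on 2.6 kernels the usual choices are noop, anticipatory, deadline and cfq):

cat /sys/block/sda/queue/scheduler        # the current scheduler is shown in [brackets]
echo deadline > /sys/block/sda/queue/scheduler

The change takes effect immediately but does not survive a reboot; to make it permanent, add an elevator= parameter to the kernel boot line.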
Best Answer
The Processor Queue Length counter from the System performance counter object is the number of threads that are ready to run but waiting for a processor. This value is available in WMI via Win32_PerfFormattedData_PerfOS_System.
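For a quick read of that value from a command prompt (a sketch; wmic is present on most Windows installs, though deprecated in favor of PowerShell's Get-CimInstance on recent releases):

wmic path Win32_PerfFormattedData_PerfOS_System get ProcessorQueueLength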