(1) I see that each of the running processes occupies a very small percentage of memory (%MEM is no more than 0.2%, and mostly just 0.0%), yet the total memory is almost entirely used, as the fourth line of output shows ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers"). The sum of the memory percentages over all processes seems unlikely to reach almost 100%, doesn't it?
To see how much memory you are currently using, run free -m. It will provide output like:
             total       used       free     shared    buffers     cached
Mem:          2012       1923         88          0         91        515
-/+ buffers/cache:       1316        695
Swap:         3153        256       2896
The top row's used value (1923) will almost always nearly match the top row's total (2012), because Linux likes to use any spare memory to cache disk blocks (the 515 in the cached column).
The key figure to look at is the buffers/cache row's used value (1316). This is how much memory your applications are currently using. For best performance, this number should be less than your total memory (2012). To prevent out-of-memory errors, it needs to be less than the combined total of memory (2012) and swap space (3153).
If you wish to quickly see how much memory is free, look at the buffers/cache row's free value (695). This is the total memory (2012) minus the actual used (1316). (2012 - 1316 = 696, not 695; the one-off difference is just a rounding issue.)
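If you want these numbers on the command line, here is a minimal sketch using awk. It assumes the older free output shown above, with its -/+ buffers/cache row; newer procps-ng releases replace that row with an "available" column.

$ free -m | awk '/buffers\/cache/ {print "apps used: " $3 " MB, effectively free: " $4 " MB"}'
apps used: 1316 MB, effectively free: 695 MB

And to check the intuition from question (1), you can sum %MEM over all processes and see that it stays far below 100%, because buffers and cache are not charged to any process:

$ ps -eo pmem --no-headers | awk '{sum += $1} END {print "sum of %MEM: " sum "%"}'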
(2) How should I understand the load average on the first line ("load average: 14.04, 14.02, 14.00")?
This article on load average uses a nice traffic analogy and is the best one I've found so far: Understanding Linux CPU Load - when should you be worried?. In your case, as people pointed out:
On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.
So, with a load average of 14.00 and 24 cores, your server is far from being overloaded.
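To run the same comparison on your own box, a quick sketch using the standard /proc/loadavg file and nproc from coreutils:

$ read one five fifteen rest < /proc/loadavg
$ awk -v load="$one" -v cores="$(nproc)" 'BEGIN {printf "1-min load %.2f over %d cores = %.2f per core\n", load, cores, load / cores}'

With the numbers from the question (load 14.04, 24 cores), this prints a per-core load of 0.59.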
You can try using ltrace with the -c switch (it is very similar to strace, but traces library calls instead of system calls). This won't be as complete as actually profiling the code and might not be the CPU-time breakdown you are looking for, but it might just be the quick sysadmin-level tool you need.
kbrandt@kbrandt-acer:~$ ltrace -c xcalc
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 66.83    0.222693        4453        50 XtCreateManagedWidget
 28.52    0.095048       95048         1 XtAppInitialize
  0.85    0.002837        2837         1 XtRealizeWidget
  0.83    0.002764        2764         1 XSetWMProtocols
  0.77    0.002581        2581         1 XtGetApplicationResources
  0.42    0.001383          53        26 XtWindow
  0.41    0.001371          54        25 XtDisplay
...
------ ----------- ----------- --------- --------------------
100.00    0.333219                   168 total
strace with the -c switch will give you similar output, but for system calls (the calls the libraries themselves are making -- so a level deeper).
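For example (xcalc is just the same toy target as above; the PID is a made-up placeholder):

$ strace -c xcalc        # summarize system calls of a fresh process
$ strace -c -f xcalc     # -f also follows forked children
$ strace -c -p 1234      # or attach to an already-running process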
The caveat with both of these breakdowns is that they report wall-clock time spent in each call, and don't show whether that time was active or idle.
If you have the code and want to go all out, you want code profiling. Stack Overflow's "What can I use to profile my C++ code in Linux?" should get you started. I have used Valgrind with C code and liked it.
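As a minimal sketch of the Valgrind route (callgrind is one of its bundled tools; ./myprog stands in for your own binary):

$ valgrind --tool=callgrind ./myprog
$ callgrind_annotate callgrind.out.<pid>    # per-function cost summary; callgrind names its output file after the process ID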
Best Answer
The top program has columns of information that can be added and removed while it is running by pressing 'f'. There is a column ('d') that will display the numeric user ID, which may be more useful in your case.
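If you'd rather skip the interactive fields screen entirely, a non-interactive alternative is ps, which can print the numeric UID directly (a sketch, sorted by memory use):

$ ps -eo uid,pid,pmem,pcpu,comm --sort=-pmem | head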