Linux Process Accounting – Difference between ‘cp’ and ‘cpu’ fields

linuxmonitoringprocess-accounting

I've been looking at using process accounting for keeping track of various users and sites running scripts that are problematic in our environment instead of trying to do something like scraping top at regular intervals.

The one that isn't particularly clear is which fields really denotes the cpu seconds/minutes used. The manual pages I have read say the 'cpu' column is for seconds burned, however there is alo the 'cp' column displayed with -m – and they can show different totals. For example:

When I use the -m flag, I get

 $sa -m | grep username
 username 14944      65.53re      29.90cp     5308k

When I use the -u flag and total up the column for 'cpu', I get the following:

 sa -u |grep username|awk 'BEGIN{TOTAL=0}{TOTAL=TOTAL+$2}END{print TOTAL}'
 1032.86

Can anybody help me understand the difference between the 'cp' and 'cpu' columns in these two different modes?

Thanks!

Best Answer

Let me use an example to help explain what your results above show:

First: I created a bash script that I ran as the user patrickr which was meant to put enough load on the system to be noticeable.

#!/bin/bash
#this file is named loop_script.sh
for i in {1..5000}
do
   echo "Welcome $i times"
done

Second: I uninstalled and then reinstalled acct so that my files in /var/log/acct would be fresh. Create a copy of the /var/log/acct/pacct file so that in the future you can easily truncate the file with a properly formatted file (you can't just delete and recreate the file - sa will stop working if you do that). Note that this file is a log of all commands on the system and as far as I can tell there is no way to pull parts on the log based on time periods.

Third: I then ran this script twice as patrickr

patrickr@hostname:~$ bash loop_script.sh

I'll give you the results and then I'll explain them:

Ran as root (or any user other that patrickr) After first loop as patrickr:

**sa -m**
                            24       0.09re       0.03cp         0avio       894k
root                        22       0.07re       0.02cp         0avio       853k
patrickr                    2       0.02re       0.01cp         0avio      1336k

**sa -u |grep patrickr**

patrickr   0.38 cpu     1336k mem      0 io bash

After second loop as patrickr:

**sa -m**
                            30       0.09re       0.03cp         0avio       850k
root                        27       0.07re       0.02cp         0avio       814k
patrickr                    3       0.02re       0.01cp         0avio      1178k

**sa -u |grep patrickr**

patrickr   0.38 cpu     1336k mem      0 io bash            
patrickr   0.35 cpu     1336k mem      0 io bash            
patrickr   0.00 cpu      863k mem      0 io ls  (I happened to also run ls at patrickr)

**sa -u**

The results returned 106 results for a total of 2.86cpu that averaged to 0.03cp

Here's what you're seeing:

sa -m is showing averages for the all the activity for this server overtime. This file grows larger over time as more commands run.

sa -u | grep patrickr is showing the sum of the system and user time in cpu minutes for specific commands.

Running: sa -u |grep patrickr|awk 'BEGIN{TOTAL=0}{TOTAL=TOTAL+$2}END{print TOTAL}'

Will give you a combined total for user patrick but the sa -m command is actually giving you averages. Take a look at the memory values is you need a second example. They're averaged too.

If I add the three results listed above for patrickr, .35 + .37 + .0 and divide by 106 and round to the to the nearest hundredth I'll get 0.01cp.

The result of 0.01cp is the average load of user patrickr on the system in comparison to all load on the system from the time that the acct application was installed (ie since the file /var/log/acct/pacct started keeping track).

Output Fields
cpu   -  sum of system and user time in cpu minutes
re    -  actual time in minutes
k     -  cpu-time averaged core usage, in 1k units
k*sec -  cpu storage integral (kilo-core seconds)
u     -  user cpu time in cpu minutes
s     -  system time in cpu minutes

A good resource that will help you is at beginlinux.com (original link found here).

Related Solutions

Linux – Functional Differences Between .profile, .bash_profile, and .bashrc

.bash_profile and .bashrc are specific to bash, whereas .profile is read by many shells in the absence of their own shell-specific config files. (.profile was used by the original Bourne shell.) .bash_profile or .profile is read by login shells, along with .bashrc; subshells read only .bashrc. (Between job control and modern windowing systems, .bashrc by itself doesn't get used much. If you use screen or tmux, screens/windows usually run subshells instead of login shells.)

The idea behind this was that one-time setup was done by .profile (or shell-specific version thereof), and per-shell stuff by .bashrc. For example, you generally only want to load environment variables once per session instead of getting them whacked any time you launch a subshell within a session, whereas you always want your aliases (which aren't propagated automatically like environment variables are).

Other notable shell config files:

/etc/bash_profile (fallback /etc/profile) is read before the user's .profile for system-wide configuration, and likewise /etc/bashrc in subshells (no fallback for this one). Many systems including Ubuntu also use an /etc/profile.d directory containing shell scriptlets, which are . (source)-ed from /etc/profile; the fragments here are per-shell, with *.sh applying to all Bourne/POSIX compatible shells and other extensions applying to that particular shell.

Linux CPU Usage – Monitoring CPU Usage and Process Execution History

There are a couple of possible ways you can do this. Note that its entirely possible its many processes in a runaway scenario causing this, not just one.

The first way is to setup pidstat to run in the background and produce data.

pidstat -u 600 >/var/log/pidstats.log & disown $!

This will give you a quite detailed outlook of the running of the system at ten minute intervals. I would suggest this be your first port of call since it produces the most valuable/reliable data to work with.

There is a problem with this, primarily if the box goes into a runaway cpu loop and produces huge load -- your not guaranteed that your actual process will execute in a timely manner during load (if at all) so you could actually miss the output!

The second way to look for this is to enable process accounting. Possibly more of a long term option.

accton on

This will enable process accounting (if not already added). If it was not running before this will need time to run.

Having been ran, for say 24 hours - you can then run such a command (which will produce output like this)

# sa --percentages --separate-times
     108  100.00%       7.84re  100.00%       0.00u  100.00%       0.00s  100.00%         0avio     19803k
       2    1.85%       0.00re    0.05%       0.00u   75.00%       0.00s    0.00%         0avio     29328k   troff
       2    1.85%       0.37re    4.73%       0.00u   25.00%       0.00s   44.44%         0avio     29632k   man
       7    6.48%       0.00re    0.01%       0.00u    0.00%       0.00s   44.44%         0avio     28400k   ps
       4    3.70%       0.00re    0.02%       0.00u    0.00%       0.00s   11.11%         0avio      9753k   ***other*
      26   24.07%       0.08re    1.01%       0.00u    0.00%       0.00s    0.00%         0avio      1130k   sa
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28544k   ksmtuned*
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28096k   awk
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     29623k   man*
       7    6.48%       7.00re   89.26%       0.00u    0.00%       0.00s

The columns are ordered as such:

Number of calls
Percentage of calls
Amount of real time spent on all the processes of this type.
Percentage.
User CPU time
Percentage
System CPU time.
Average IO calls.
Percentage
Command name

What you'll be looking for is the process types that generate the most User/System CPU time.

This breaks down the data as the total amount of CPU time (the top row) and then how that CPU time has been split up. Process accounting only accounts properly when its on when processes spawn, so its probably best to restart the system after enabling it to ensure all services are being accounted for.

This, by no means actually gives you a definite idea what process it might be that is the cause of this problem, but might give you good feel. As it could be a 24 hour snapshot theres a possibility of skewed results so bear that in mind. It also should always log since its a kernel feature and unlike pidstat will always produce output even during heavy load.

The last option available also uses process accounting so you can turn it on as above, but then use the program "lastcomm" to produce some statistics of processes executed around the time of the problem along with cpu statistics for each process.

lastcomm | grep "May  8 22:[01234]"
kworker/1:0       F    root     __         0.00 secs Tue May  8 22:20
sleep                  root     __         0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                   X root     pts/0      0.00 secs Tue May  8 22:49
ksmtuned          F    root     __         0.00 secs Tue May  8 22:49
awk                    root     __         0.00 secs Tue May  8 22:49

This might give you some hints too as to what might be causing the problem.

Best Answer

Related Solutions

Linux – Functional Differences Between .profile, .bash_profile, and .bashrc

Linux CPU Usage – Monitoring CPU Usage and Process Execution History

Related Topic