Linux – Why might ‘mpstat’ and ‘pidstat’ disagree about CPU load levels

cpu-usagelinuxperformance

I'm monitoring a mysql process in ubuntu 12.04 using mpstat and pidstat, simultaneously. While debugging its CPU usage, I noticed something unusual. While both tools agree that CPU load is occurring, they disagree about what kind of load it is. Here's an exemplar time slice:

$ pidstat -p 2763 1
01:36:36 PM       PID    %usr %system  %guest    %CPU   CPU  Command
01:36:36 PM      2763    0.00    0.00    0.00    0.00     0  mysqld
01:36:37 PM      2763    0.00    1.00    0.00    1.00     0  mysqld
01:36:38 PM      2763    0.00    5.00    0.00    5.00     0  mysqld
01:36:39 PM      2763    0.00   11.00    0.00   11.00     0  mysqld
01:36:40 PM      2763    0.00   12.00    0.00   12.00     0  mysqld
01:36:41 PM      2763    0.00    0.00    0.00    0.00     0  mysqld
01:36:42 PM      2763    0.00    0.00    0.00    0.00     0  mysqld
01:36:43 PM      2763    0.00    5.00    0.00    5.00     0  mysqld
01:36:44 PM      2763    0.00  170.00    0.00  170.00     0  mysqld
01:36:45 PM      2763    0.00  150.00    0.00  150.00     0  mysqld
01:36:46 PM      2763    0.00    4.00    0.00    4.00     0  mysqld
01:36:47 PM      2763    0.00    7.00    0.00    7.00     0  mysqld
01:36:48 PM      2763    0.00    7.00    0.00    7.00     0  mysqld
01:36:49 PM      2763    0.00   59.00    0.00   59.00     0  mysqld
01:36:50 PM      2763    0.00   85.00    0.00   85.00     0  mysqld

Note that pidstat is reporting heavy system-level usage at various points in time. Here's mpstat for roughly the same time period:

$ mpstat -P ALL 1
01:36:37 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:38 PM  all    0.38    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.62
01:36:38 PM    0    2.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.98
01:36:38 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:38 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:39 PM  all    0.88    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.12
01:36:39 PM    0    4.08    0.00    0.00    0.00    0.00    0.00    0.00    0.00   95.92
01:36:39 PM    1    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
01:36:39 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:39 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:39 PM    4    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00   98.02
01:36:39 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:39 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:39 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:39 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:40 PM  all    0.62    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.38
01:36:40 PM    0    5.05    0.00    0.00    0.00    0.00    0.00    0.00    0.00   94.95
01:36:40 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:40 PM    7    0.99    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.01

01:36:40 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:41 PM  all    1.25    0.00    0.25    0.00    0.00    0.12    0.00    0.00   98.38
01:36:41 PM    0   10.00    0.00    1.00    0.00    0.00    1.00    0.00    0.00   88.00
01:36:41 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:41 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:42 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:42 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:43 PM  all    0.12    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.88
01:36:43 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:43 PM    7    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00

01:36:43 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:44 PM  all    2.88    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.12
01:36:44 PM    0   22.77    0.00    0.99    0.00    0.00    0.00    0.00    0.00   76.24
01:36:44 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:44 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:44 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:45 PM  all   23.72    0.00    0.12    0.00    0.00    0.12    0.00    0.00   76.03
01:36:45 PM    0   99.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00
01:36:45 PM    1   89.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   11.00
01:36:45 PM    2    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
01:36:45 PM    3    0.00    0.00    0.99    0.00    0.00    0.00    0.00    0.00   99.01
01:36:45 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:45 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:45 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:45 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:45 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:46 PM  all   13.89    0.00    0.00    0.00    0.00    0.00    0.13    0.00   85.98
01:36:46 PM    0   34.34    0.00    0.00    0.00    0.00    1.01    0.00    0.00   64.65
01:36:46 PM    1   69.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   31.00
01:36:46 PM    2    5.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   95.00
01:36:46 PM    3    3.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
01:36:46 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:46 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:46 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:46 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:46 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:47 PM  all    0.50    0.00    0.00    0.00    0.00    0.13    0.00    0.00   99.37
01:36:47 PM    0    3.96    0.00    0.00    0.00    0.00    0.00    0.00    0.00   96.04
01:36:47 PM    1    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
01:36:47 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:47 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:47 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:47 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:47 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:47 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:36:47 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:36:48 PM  all    0.63    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.37
01:36:48 PM    0    5.10    0.00    0.00    0.00    0.00    0.00    0.00    0.00   94.90
01:36:48 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:36:48 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

I'm not surprised to see them disagree about overall percentages, but what I can't figure out is why pidstat and mpstat disagree about the type of CPU load. mpstat indicates the load is happening at the application level. pidstat indicates it's happening at the kernel level. What's happening here? I assumed both were views into /proc, but I'm not versed enough in their internals to know where they're getting different signals.

EDIT: Best I can tell this results from discrepancies between the /proc calculation mechanism used by each program.

pidstat is using /proc/<pid>/stat: https://github.com/sysstat/sysstat/blob/master/pidstat.c#L373-L382

mpstat is using /proc/uptime and /proc/stat: https://github.com/sysstat/sysstat/blob/master/rd_stats.c#L59-L163

Best Answer

So, I've got a cause, but it's not really a root cause:

mpstat relies on /proc/stat for its cpu metrics. pidstat relies on /proc/<pid>/stat. These files at times disagree about the number of kernel jiffies observed in a particular length of time, which I suppose makes sense as we're talking about such a increments of time resolution:

17:06:41 /proc/stat 158257197
17:06:46 /proc/stat 158257230
17:06:51 /proc/stat 158257236
17:06:56 /proc/stat 158257279
# t3 - t0 = 82 jiffies

versus...

17:06:41 /proc/2763/stat 208862696
17:06:46 /proc/2763/stat 208862971
17:06:51 /proc/2763/stat 208863114
17:06:56 /proc/2763/stat 208863143
# t3 - t0 = 447 jiffies

Each difference is, of course, subject to derivative calculation depending on mpstat and pidstat's source code, massaging into percentages, etc.

If you dig further into /proc and sum /proc/<pid>/task/**/stat, the result makes more sense. That number grows more slowly than /proc/stat as a whole, and significantly more slowly than /proc/<pid>/stat.

It seems like if I'm going to take this any further I'd need to better understand how the kernel is writing these files, for my particular version. It's clear that /proc/<pid>/stat does not agree with /proc/<pid>/task/**/stat.