I'm monitoring a mysql process in ubuntu 12.04 using mpstat
and pidstat
, simultaneously. While debugging its CPU usage, I noticed something unusual. While both tools agree that CPU load is occurring, they disagree about what kind of load it is. Here's an exemplar time slice:
$ pidstat -p 2763 1
01:36:36 PM PID %usr %system %guest %CPU CPU Command
01:36:36 PM 2763 0.00 0.00 0.00 0.00 0 mysqld
01:36:37 PM 2763 0.00 1.00 0.00 1.00 0 mysqld
01:36:38 PM 2763 0.00 5.00 0.00 5.00 0 mysqld
01:36:39 PM 2763 0.00 11.00 0.00 11.00 0 mysqld
01:36:40 PM 2763 0.00 12.00 0.00 12.00 0 mysqld
01:36:41 PM 2763 0.00 0.00 0.00 0.00 0 mysqld
01:36:42 PM 2763 0.00 0.00 0.00 0.00 0 mysqld
01:36:43 PM 2763 0.00 5.00 0.00 5.00 0 mysqld
01:36:44 PM 2763 0.00 170.00 0.00 170.00 0 mysqld
01:36:45 PM 2763 0.00 150.00 0.00 150.00 0 mysqld
01:36:46 PM 2763 0.00 4.00 0.00 4.00 0 mysqld
01:36:47 PM 2763 0.00 7.00 0.00 7.00 0 mysqld
01:36:48 PM 2763 0.00 7.00 0.00 7.00 0 mysqld
01:36:49 PM 2763 0.00 59.00 0.00 59.00 0 mysqld
01:36:50 PM 2763 0.00 85.00 0.00 85.00 0 mysqld
Note that pidstat
is reporting heavy system-level usage at various points in time. Here's mpstat
for roughly the same time period:
$ mpstat -P ALL 1
01:36:37 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:38 PM all 0.38 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.62
01:36:38 PM 0 2.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.98
01:36:38 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:38 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:39 PM all 0.88 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.12
01:36:39 PM 0 4.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 95.92
01:36:39 PM 1 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00
01:36:39 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:39 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:39 PM 4 0.99 0.00 0.99 0.00 0.00 0.00 0.00 0.00 98.02
01:36:39 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:39 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:39 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:39 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:40 PM all 0.62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.38
01:36:40 PM 0 5.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 94.95
01:36:40 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:40 PM 7 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.01
01:36:40 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:41 PM all 1.25 0.00 0.25 0.00 0.00 0.12 0.00 0.00 98.38
01:36:41 PM 0 10.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 88.00
01:36:41 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:41 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:42 PM all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:42 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:43 PM all 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.88
01:36:43 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:43 PM 7 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
01:36:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:44 PM all 2.88 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.12
01:36:44 PM 0 22.77 0.00 0.99 0.00 0.00 0.00 0.00 0.00 76.24
01:36:44 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:44 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:45 PM all 23.72 0.00 0.12 0.00 0.00 0.12 0.00 0.00 76.03
01:36:45 PM 0 99.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
01:36:45 PM 1 89.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.00
01:36:45 PM 2 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
01:36:45 PM 3 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 99.01
01:36:45 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:45 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:45 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:45 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:45 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:46 PM all 13.89 0.00 0.00 0.00 0.00 0.00 0.13 0.00 85.98
01:36:46 PM 0 34.34 0.00 0.00 0.00 0.00 1.01 0.00 0.00 64.65
01:36:46 PM 1 69.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 31.00
01:36:46 PM 2 5.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 95.00
01:36:46 PM 3 3.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.00
01:36:46 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:46 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:46 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:46 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:46 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:47 PM all 0.50 0.00 0.00 0.00 0.00 0.13 0.00 0.00 99.37
01:36:47 PM 0 3.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 96.04
01:36:47 PM 1 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
01:36:47 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:47 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:36:48 PM all 0.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.37
01:36:48 PM 0 5.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 94.90
01:36:48 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:36:48 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
I'm not surprised to see them disagree about overall percentages, but what I can't figure out is why pidstat
and mpstat
disagree about the type of CPU load. mpstat
indicates the load is happening at the application level. pidstat
indicates it's happening at the kernel level. What's happening here? I assumed both were views into /proc
, but I'm not versed enough in their internals to know where they're getting different signals.
EDIT: Best I can tell this results from discrepancies between the /proc
calculation mechanism used by each program.
pidstat
is using /proc/<pid>/stat
: https://github.com/sysstat/sysstat/blob/master/pidstat.c#L373-L382
mpstat
is using /proc/uptime
and /proc/stat
: https://github.com/sysstat/sysstat/blob/master/rd_stats.c#L59-L163
Best Answer
So, I've got a cause, but it's not really a root cause:
mpstat
relies on/proc/stat
for its cpu metrics.pidstat
relies on/proc/<pid>/stat
. These files at times disagree about the number of kernel jiffies observed in a particular length of time, which I suppose makes sense as we're talking about such a increments of time resolution:versus...
Each difference is, of course, subject to derivative calculation depending on
mpstat
andpidstat
's source code, massaging into percentages, etc.If you dig further into
/proc
and sum/proc/<pid>/task/**/stat
, the result makes more sense. That number grows more slowly than/proc/stat
as a whole, and significantly more slowly than/proc/<pid>/stat
.It seems like if I'm going to take this any further I'd need to better understand how the kernel is writing these files, for my particular version. It's clear that
/proc/<pid>/stat
does not agree with/proc/<pid>/task/**/stat
.