How to calculate CPU % based on raw CPU ticks in SNMP

Tags: central-processing-unit, monitoring, nagios, snmp

According to http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html#scalar_notcurrent ssCpuUser, ssCpuSystem, ssCpuIdle, etc are deprecated in favor of the raw variants (ssCpuRawUser, etc).

The former values (which don't cover things like nice, wait, kernel, interrupt, etc) returned a percentage value:

The percentage of CPU time spent processing
user-level code, calculated over the last minute.

This object has been deprecated in favour of
'ssCpuRawUser(50)', which can be used to calculate
the same metric, but over any desired time period.

The raw values return the "raw" number of ticks the CPU spent:

The number of 'ticks' (typically 1/100s) spent
processing user-level code.

On a multi-processor system, the 'ssCpuRaw*'
counters are cumulative over all CPUs, so their
sum will typically be N*100 (for N processors).

My question is: how do you turn the number of ticks into percentage?

That is, how do you know how many ticks there are per second? The description says a tick is "typically" 1/100s, which implies not always, and which could mean either one tick every 100 seconds or that a tick represents 1/100th of a second.

I imagine you also need to know how many CPUs there are or you need to fetch all the CPU values to add them all together. I can't seem to find a MIB that gives you an integer value for # of CPUs which makes the former route awkward. The latter route seems unreliable because some of the numbers overlap (sometimes). For example, ssCpuRawWait has the following warning:

This object will not be implemented on hosts where
the underlying operating system does not measure
this particular CPU metric. This time may also be
included within the 'ssCpuRawSystem(52)' counter.

Some help would be appreciated. Everywhere seems to just say that % is deprecated because it can be derived, but I haven't found anywhere that shows the official standard way to perform this derivation.

The second component is that these "ticks" seem to be cumulative instead of over some time period. How do I sample values over some time period?

The ultimate information I want is: % of user, system, idle, nice (and ideally steal, though there doesn't seem to be a standard MIB for this) "currently" (over the last 1-60s would probably be sufficient, with a preference for smaller time spans).

Best Answer

Since these are cumulative counters, you have to retrieve them at regular intervals and do the calculation yourself. So, if you want the figure for a one-minute window, you read the counters, wait a minute, and read them again. The SNMP agent does not refresh these counters very often, so you may not be able to get meaningful readings every second anyway.
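As a rough illustration of the "read, wait, read again" pattern, here is a minimal Python sketch. The name `sample_twice` and the `fetch` callable are hypothetical: `fetch` stands for whatever you use to read the ssCpuRaw* counters (for example a wrapper around `snmpget`) and is assumed to return a dict mapping counter names to integer tick counts.

```python
import time
from typing import Callable, Dict, Tuple

# A sample maps a counter name (e.g. "user", "idle") to its raw tick count.
Sample = Dict[str, int]


def sample_twice(
    fetch: Callable[[], Sample],
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Sample, Sample]:
    """Read the counters, wait interval_s seconds, then read them again.

    `fetch` is a hypothetical caller-supplied function that performs the
    actual SNMP query; `sleep` is injectable so the wait can be skipped
    when testing.
    """
    first = fetch()
    sleep(interval_s)
    second = fetch()
    return first, second
```

Passing the query function in as a parameter keeps the sketch independent of any particular SNMP library; the interval is whatever window you want the percentage to cover.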

Once you have the raw user, nice, system, idle, and interrupt counters, you can get the total number of ticks by summing them. The MIB description anticipates that you will add them up, although it warns that some counters can overlap:

$ snmptranslate -Td .1.3.6.1.4.1.2021.11.52
UCD-SNMP-MIB::ssCpuRawSystem
...
    This object may sometimes be implemented as the
    combination of the 'ssCpuRawWait(54)' and
    'ssCpuRawKernel(55)' counters, so care must be
    taken when summing the overall raw counters."

Then, regardless of how long it has been between the two measurements, the total number of ticks over that period is total1 - total0, and the idle percentage is 100 * (idle1 - idle0) / (total1 - total0). The same formula works for user, system, nice, and so on.
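The calculation above can be sketched in Python as follows. This is a sketch rather than an official method: the function names and dict keys are made up for illustration, and it assumes you pass in only counters that do not overlap on your agent (see the ssCpuRawSystem warning quoted above). It also assumes the counters are 32-bit (the ssCpuRaw* objects are declared as Counter32), so a single wrap between the two samples is handled with modular arithmetic.

```python
from typing import Dict

# Assumption: the ssCpuRaw* objects are Counter32, so they wrap at 2**32.
COUNTER32_MODULUS = 2 ** 32


def counter_delta(old: int, new: int) -> int:
    """Ticks elapsed between two readings, tolerating a single wrap."""
    return (new - old) % COUNTER32_MODULUS


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Turn two samples of raw tick counters into percentages.

    Both dicts map a counter name to its raw value. The total is the sum
    of the per-counter deltas, so the length of a tick never matters.
    """
    deltas = {name: counter_delta(before[name], after[name]) for name in before}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("no ticks elapsed between the two samples")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up sample values, purely for illustration.
t0 = {"user": 1000, "nice": 0, "system": 500, "idle": 8000}
t1 = {"user": 1300, "nice": 0, "system": 600, "idle": 8600}
print(cpu_percentages(t0, t1))
# {'user': 30.0, 'nice': 0.0, 'system': 10.0, 'idle': 60.0}
```

Because every counter is divided by the sum of all the deltas, the result is already averaged across CPUs, so you do not need to know the number of processors either.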

You are asking how to know how many ticks there are per second, but as you can see, you don't need to know that: the length of a tick cancels out when you divide one tick delta by another.