If possible, I'd suggest selecting a microcontroller that supports a counter operation using the timer inputs; rather than manually incrementing a counter inside an ISR (which at high frequencies quickly saturates the microcontroller), you let the hardware handle the counting. At that point your code simply becomes a matter of waiting for your periodic interrupt and then calculating the frequency.
To extend the range and make the frequency counter more generalised (removing the need for multiple ranges at the expense of a little more work for the MCU) you could use the following technique.
Select a periodic interrupt rate that allows for measurement accuracy at the highest input frequency; this should take into account your counter size (you need to select the timer period such that the timer counter will not overflow at the maximum input frequency). For this example I'll assume that the input counter value can be read from the variable "timer_input_ctr".
Include a variable for counting periodic interrupts (should be initialised to 0 at startup); for this example I'll refer to this variable as "isr_count". The interrupt period is contained in the constant "isr_period".
Your periodic interrupt should be implemented as (C pseudo-code):
void timer_isr()
{
    /* timer_input_ctr and isr_count should be declared volatile */
    isr_count++;
    if (timer_input_ctr > 0)
    {
        frequency = timer_input_ctr / (isr_count * isr_period);
        timer_input_ctr = 0;
        isr_count = 0;
    }
}
Obviously this rough example relies on some floating-point math that may not be suitable for low-end microcontrollers; there are techniques to overcome this, but they are mostly outside the scope of this answer.
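For what it's worth, one common integer-only technique is to express the interrupt period in fixed units (milliseconds here, which is my assumption, as are all the names) and scale the dividend, so the whole calculation stays in integer math. A minimal sketch:

```c
#include <stdint.h>

/* Assumption: the periodic interrupt fires every 10 ms. */
#define ISR_PERIOD_MS 10u

/* Frequency in Hz with no floating point: scale the count by 1000
 * before dividing; a 64-bit intermediate avoids overflow. */
static uint32_t freq_hz(uint32_t input_count, uint32_t isr_count)
{
    uint64_t elapsed_ms = (uint64_t)isr_count * ISR_PERIOD_MS;
    if (elapsed_ms == 0)
        return 0;
    return (uint32_t)(((uint64_t)input_count * 1000u) / elapsed_ms);
}
```

So 1000 input counts over 100 interrupt periods (one second at 10 ms each) comes out as 1000 Hz, with only integer operations involved.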
Comms stacks :(
Comms stacks in plain C :((
This is a summary of how I do it, though it's surely not the only way:
The app starts by creating a 'generalPool' array of buffer structs (BS) of a fixed size. No more buffers are ever allocated, and no buffers are ever freed, during the run. The BS has space for data, a data length, next/prev index bytes, and a 'command' enum that describes what the buffer is (and other stuff, but that clouds the issue). Indexes into this array are used for all inter-thread and driver comms (I use byte-size indexes rather than pointers because there are fewer than 256 BS and I have RAM constraints). The next/prev bytes are initialized to form a doubly-linked list, and the calls to get/put an index are protected by a mutex.
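A minimal sketch of such a pool; the sizes, field names and the omitted mutex are all assumptions on my part:

```c
#include <stdint.h>

#define NUM_BS  96    /* hypothetical pool size, < 256 so a byte index fits */
#define BS_DATA 64    /* hypothetical data space per buffer */
#define NIL     0xFF  /* "no index" sentinel */

enum bs_cmd { CMD_FREE, CMD_RX0, CMD_TX_USED };  /* invented names */

typedef struct {
    uint8_t     data[BS_DATA];
    uint8_t     len;
    uint8_t     next, prev;   /* intrusive links: indexes, not pointers */
    enum bs_cmd cmd;
} BS;

static BS      pool[NUM_BS];
static uint8_t pool_head;     /* index of the first free BS */
static uint8_t pool_count;

/* Link every BS into one free list at startup; nothing is ever malloc'd. */
void pool_init(void)
{
    for (uint8_t i = 0; i < NUM_BS; i++) {
        pool[i].next = (uint8_t)(i + 1);
        pool[i].prev = (uint8_t)(i - 1);
    }
    pool[0].prev = NIL;
    pool[NUM_BS - 1].next = NIL;
    pool_head  = 0;
    pool_count = NUM_BS;
}

/* get/put would be wrapped in a mutex in the real system (omitted here). */
uint8_t pool_get(void)
{
    if (pool_head == NIL) return NIL;          /* pool exhausted */
    uint8_t idx = pool_head;
    pool_head = pool[idx].next;
    if (pool_head != NIL) pool[pool_head].prev = NIL;
    pool_count--;
    return idx;
}

void pool_put(uint8_t idx)
{
    pool[idx].next = pool_head;
    pool[idx].prev = NIL;
    if (pool_head != NIL) pool[pool_head].prev = idx;
    pool_head = idx;
    pool_count++;
}
```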
Inter-thread comms are performed by getting a BS index from the generalPool, loading it up as required, setting the enum and then pushing the index onto a producer-consumer queue. The thread at the other end dequeues the index and, typically, switches on the enum to handle the BS message. Once handled, the consumer thread can repool the BS or queue it on somewhere else for further handling, (logger, say).
Because the BS has those next/prev bytes, the producer-consumer queue class does not need any storage space of its own - it has first and last bytes and so can link together the BS in a similar manner to the pool.
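Using those intrusive links, the queue itself needs only two index bytes. A sketch (names are mine, and the real thing would add a mutex/semaphore for blocking producer-consumer behaviour):

```c
#include <stdint.h>

#define NIL 0xFF

/* Minimal BS carrying just the intrusive links (the real struct also
 * holds data, length and the command enum). */
typedef struct { uint8_t next, prev; } BS;
static BS bs[32];   /* hypothetical pool */

/* The queue stores only two index bytes; the linked BS supply all the
 * storage, so any number of queues costs two bytes each. */
typedef struct { uint8_t first, last; } Queue;

void q_init(Queue *q) { q->first = q->last = NIL; }

void q_push(Queue *q, uint8_t idx)          /* append at the tail */
{
    bs[idx].next = NIL;
    bs[idx].prev = q->last;
    if (q->last != NIL) bs[q->last].next = idx;
    else                q->first = idx;
    q->last = idx;
}

uint8_t q_pop(Queue *q)                     /* remove from the head */
{
    uint8_t idx = q->first;
    if (idx == NIL) return NIL;             /* empty */
    q->first = bs[idx].next;
    if (q->first != NIL) bs[q->first].prev = NIL;
    else                 q->last = NIL;
    return idx;
}
```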
OK, now drivers:
I have interrupt nesting disabled so that only one interrupt can run at a time. This lets me use a simple 'DriverQueue' of BS indexes. The DriverQueue has actual storage space for the index bytes - it does not use the next/prev links. This allows BS indexes to be safely added at one end, and removed at the other, by any one interrupt and any one thread.
I have one 'CommsPool' DriverQueue. This is pre-filled on startup with some BS extracted from the generalPool. These BS are used for received data.
I have one 'commsTx' DriverQueue for each tx interrupt. Outgoing data is queued on them.
I have one 'commsRx' DriverQueue for all rx interrupts. Incoming data is queued on it.
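A DriverQueue along these lines can be sketched as a small ring of index bytes. With interrupt nesting disabled, the producer writes only the tail and the consumer writes only the head, so no lock is needed (capacity and names are assumptions):

```c
#include <stdint.h>

#define DQ_SIZE 16   /* hypothetical capacity */

/* Unlike the linked producer-consumer queues, a DriverQueue owns real
 * storage for the index bytes, so an ISR can push while a thread pops
 * without ever touching the shared next/prev links. */
typedef struct {
    volatile uint8_t slots[DQ_SIZE];
    volatile uint8_t head;   /* written only by the consumer */
    volatile uint8_t tail;   /* written only by the producer */
} DriverQueue;

static DriverQueue rxq;      /* example instance */

int dq_push(DriverQueue *q, uint8_t idx)   /* producer side (ISR) */
{
    uint8_t next = (uint8_t)((q->tail + 1) % DQ_SIZE);
    if (next == q->head) return 0;          /* full */
    q->slots[q->tail] = idx;
    q->tail = next;
    return 1;
}

int dq_pop(DriverQueue *q, uint8_t *idx)   /* consumer side (thread) */
{
    if (q->head == q->tail) return 0;       /* empty */
    *idx = q->slots[q->head];
    q->head = (uint8_t)((q->head + 1) % DQ_SIZE);
    return 1;
}
```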
One 'commsThread' handles the higher-level comms by initializing and operating a state machine, similar to your idea. When idle, it waits on a 'CommsEvent' semaphore.
The rx interrupts get BS from the CommsPool, load them up with data from the hardware, set the command enum to 'RxX', (X is the comms channel/interrupt ID number), push the BS index onto the common commsRx queue and signal CommsEvent.
The tx interrupts get BS from their own, private commsTx, load the data into the hardware, set the command enum to 'TxUsed', push the BS index onto the common commsRx queue and signal CommsEvent.
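To make the rx flow concrete, here is a host-testable simulation of that interrupt path; the hardware read and the queues are stubbed stand-ins of my own invention, not real driver code:

```c
#include <stdint.h>

enum cmd { CMD_NONE, CMD_RX0, CMD_TX_USED };

typedef struct { uint8_t data[8]; uint8_t len; enum cmd cmd; } BS;
static BS pool[4];

static uint8_t comms_pool[4] = { 0, 1, 2, 3 };    /* pre-filled CommsPool */
static uint8_t comms_pool_n  = 4;
static uint8_t comms_rx[4];                       /* common rx DriverQueue */
static uint8_t comms_rx_n    = 0;
static int     comms_event   = 0;                 /* stand-in for CommsEvent */

static uint8_t hw_read_byte(void) { return 0x55; } /* stubbed UART register */

void rx_isr(void)
{
    if (comms_pool_n == 0) return;                /* overrun: no free BS */
    uint8_t idx = comms_pool[--comms_pool_n];     /* get a BS from CommsPool */
    pool[idx].data[0] = hw_read_byte();           /* load hardware data */
    pool[idx].len = 1;
    pool[idx].cmd = CMD_RX0;                      /* tag with channel ID */
    comms_rx[comms_rx_n++] = idx;                 /* queue for commsThread */
    comms_event++;                                /* signal CommsEvent */
}
```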
The commsThread is responsible for managing all the I/O. It has a 'commsRq' input queue for comms-request BS from other threads. This is not, however, a blocking queue - just a thread-safe one. It is not blocking because the commsThread has to handle the CommsEvent signals from the interrupt handlers as well.
Any thread that wants to communicate stuff loads up a BS with appropriate data and command, queues it to commsRq and signals CommsEvent, so waking the commsThread.
The commsThread does not know why it has been woken, so it polls the commsRx queue first to see if there is a BS in it. If there is, it handles it: an 'RxX' is processed through the state-engine code/data; a 'TxUsed' is pushed onto the CommsPool if it needs 'topping up', else back onto the generalPool for re-use elsewhere.
Once the commsThread has handled the driver queues appropriately, it polls the commsRq queue to see if there are any new comms requests from other threads. If there are, it dequeues and handles each request through its state-machine code/data.
After that, the commsThread checks again to see if any CommsPool 'topping up' is required and, if the CommsPool is not full, tops it off with more BS from the generalPool.
The commsThread then loops back to wait on the semaphore again. The semaphore ensures that the commsThread runs exactly as many times as are required to handle all input from other threads and the interrupt handlers, no more, no less. If the thread ever wakes up and finds nothing to do, it's an error.
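One wake-up of that loop might be skeletonised like this; the queues are trivial array stand-ins, the counters stand in for real work, and in the actual thread the state machine replaces them (everything here is my own invention):

```c
#include <stdint.h>

enum cmd { CMD_RX0, CMD_TX_USED };

#define POOL_TARGET 2   /* hypothetical desired CommsPool fill level */

/* Trivial stand-in queues: arrays of BS indexes plus a count. */
static uint8_t comms_rx[8];   static int comms_rx_n;
static uint8_t comms_rq[8];   static int comms_rq_n;
static uint8_t comms_pool[8]; static int comms_pool_n;
static uint8_t general[8];    static int general_n;
static enum cmd bs_cmd[16];   /* command enum per BS index */

static int  pop(uint8_t *q, int *n)            { return (*n > 0) ? q[--*n] : -1; }
static void push(uint8_t *q, int *n, uint8_t i){ q[(*n)++] = i; }

static int rx_handled, rq_handled;   /* stand-ins for state-machine work */

/* One wake-up: drain the driver queue, then the request queue, then top up. */
void comms_thread_once(void)
{
    int idx;
    while ((idx = pop(comms_rx, &comms_rx_n)) >= 0) {
        if (bs_cmd[idx] == CMD_TX_USED) {
            /* returned tx buffer: refill CommsPool first, else generalPool */
            if (comms_pool_n < POOL_TARGET) push(comms_pool, &comms_pool_n, idx);
            else                            push(general, &general_n, idx);
        } else {
            rx_handled++;                    /* state machine would run here */
            push(general, &general_n, idx);
        }
    }
    while ((idx = pop(comms_rq, &comms_rq_n)) >= 0)
        rq_handled++;                        /* handle request via state machine */

    /* Final CommsPool top-up from the generalPool. */
    while (comms_pool_n < POOL_TARGET && general_n > 0)
        push(comms_pool, &comms_pool_n, pop(general, &general_n));
}
```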
That's how I do it, anyway :) It provides good throughput and efficient use of RAM. Inter-thread producer-consumer queues need no internal storage. Only one thread (and so only one RAM-consuming stack :) is required for all interrupt management and Tx/Rx data handling. No mallocs/frees are required after initialization. There is no busy-waiting, nor any need for periodic checking of flags. No copying of the data is required (except in/out of hardware - unavoidable). Timeout actions can be handled either by a timed wait on the semaphore (preferable, if your OS supports it) or by the periodic 'injection' of a 'TimeTick' BS on the inputQueue from some other thread. Returned BS can easily be 'diverted' to, say, a logger or terminal for debug display before returning them to the generalPool.
However you do this, you should consider moving to C++. C just gets messy for anything other than simple straight-line code. C++ allows, for instance, the BS to be implemented as class instances, with methods for streaming in data and for 'auto-extending' a BS (by getting and linking another BS when one fills up), so generating a 'compound' data message.
I've left some stuff out. For example, perhaps you already know the misery of tx interrupts - after the tx has been idle, they often have to be 'primed' by having the first bytes loaded into a FIFO to get the TX interrupt to start again :(
Also hint: my UART debug terminal prompt looks like 'A:96>'. The number, (96 here), is the current count of BS in the general pool. If this number starts dropping, I know I have a leak:)
Best Answer
CPU utilization is really only a crude measurement of the overall resiliency of a real-time system. Therefore, the answer to your question is that it is generally a long-term average value.
The real criterion is whether all of the software tasks meet their completion deadlines. Note that this includes both tasks triggered by interrupts and tasks triggered by other kinds of events. When CPU utilization begins to approach 100%, then the completion time of lower-priority tasks tends to become arbitrarily large.
Using GPIO pins to indicate the run time of individual tasks is one good way to check whether those deadlines are ever exceeded.
Another approach is to instrument the code itself. If you have access to a free-running counter (a spare hardware counter/timer module, perhaps), then you can take a snapshot of its value at the beginning of each task, and then at the end of the task, take another snapshot and compute the difference. If this ever exceeds the required value for that task, indicate an error.
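A sketch of that instrumentation; read_cycle_counter() stands in for whatever spare free-running counter you have (DWT_CYCCNT on a Cortex-M, say), and the budget value and task names are invented for the example:

```c
#include <stdint.h>

/* Fake free-running counter so the sketch runs on a host; on target,
 * read_cycle_counter() would read the real hardware register. */
static uint32_t fake_counter;
static uint32_t read_cycle_counter(void) { return fake_counter; }

#define TASK_BUDGET 1000u   /* hypothetical deadline, in counter ticks */
static int deadline_missed;

void run_task(void (*task)(void))
{
    uint32_t start = read_cycle_counter();
    task();
    /* Unsigned subtraction handles counter wrap-around correctly. */
    uint32_t elapsed = read_cycle_counter() - start;
    if (elapsed > TASK_BUDGET)
        deadline_missed = 1;    /* indicate the error however suits you */
}

/* Sample tasks that just advance the fake counter. */
static void quick_task(void) { fake_counter += 500; }   /* within budget */
static void slow_task(void)  { fake_counter += 2000; }  /* exceeds budget */
```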
A slightly different question would be to compute the expected CPU utilization of a system, before it is implemented.
In this case, you consider each task individually, coming up with estimates of how long it runs when triggered and how often it is triggered. The run time divided by the trigger period gives the CPU utilization for that task by itself.
If you add up all of the individual utilization values and get a value that approaches or exceeds 100%, then you need to think about ways to redistribute the work — faster CPU, more CPUs, dedicated hardware for some tasks, etc.
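As a worked example, the per-task arithmetic can stay in integer math; the task set here is invented:

```c
#include <stdint.h>

/* Hypothetical task set: run time and trigger period in microseconds. */
typedef struct { uint32_t run_us, period_us; } Task;

/* Total utilization in tenths of a percent (per-mille), integer math only. */
uint32_t utilization_permille(const Task *tasks, int n)
{
    uint32_t total = 0;
    for (int i = 0; i < n; i++)
        total += (uint32_t)((uint64_t)tasks[i].run_us * 1000u
                            / tasks[i].period_us);
    return total;
}
```

For instance, a task running 200 µs every 1000 µs contributes 20%; add one at 100 µs/500 µs (20%) and one at 1000 µs/10000 µs (10%) and the estimated total is 50%.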