Electronic – Detecting DMA overflow in arbitrary waveform generation

debugging, microcontroller, stm32, stm32f10x, timer

I'm generating a complex series of pulses on an STM32F103, essentially as described in the ST app note AN4776 "General-purpose timer cookbook", section 5.3. As a quick summary, that means I'm using the timer's DMA burst mode to transfer new values to the ARR, RCR and CCR1 registers after every update event of the timer. A more detailed description is provided at the end of the question, in case you're not familiar with the app note.

My problem is related to DMA bandwidth: the pulses I generate can in principle be arbitrarily closely spaced, down to just 1 clock cycle apart (the timer has a prescaler of 1). Of course, in that case the DMA cannot possibly make it in time to transfer the data for the next pulse, and there will be a glitch in the output. Since my application can tolerate small, infrequent timing errors, I preprocess my pulses so that the smallest pulse interval is some fixed minimum (96 clock ticks at the moment) to give the DMA a chance. However, I'm still getting glitches that I think are due to the DMA, and increasing the minimum even to very large values doesn't seem to help.
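
The preprocessing itself is conceptually just a clamp on the pulse interval. A minimal sketch (MIN_PULSE_TICKS and the plain array of intervals are simplifications of what my real code does on the pulse structures shown further down):

#include <cstddef>
#include <cstdint>

static const uint32_t MIN_PULSE_TICKS = 96; // current minimum spacing

// Stretch any interval shorter than the minimum; the resulting small shift
// of the following pulses is the kind of timing error I can tolerate.
void enforceMinimumSpacing(uint32_t *intervalTicks, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (intervalTicks[i] < MIN_PULSE_TICKS) {
            intervalTicks[i] = MIN_PULSE_TICKS;
        }
    }
}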

The key part of the previous sentence is "…I think…". So I'd like to find a way to know for sure whether the DMA has missed its transfer or not, preferably something I can leave running in the code at all times, so that I'll catch even very infrequent glitches.

What I've thought of/tried so far is:

  1. Looking at the DMA error registers. However, there doesn't seem to be a "transfer missed" flag or anything similar, which makes sense: most of the time it's not an error if a transfer stays pending for a while, and even in my case a pending transfer isn't necessarily an error. So I don't think this is going to help me.
  2. I'm running this on two different timers, TIM8 and TIM1, outputting pulses on two different pins. The sequences of pulses I prepare are always 0.5ms long in total (so that I can react to external events within 1.5ms), i.e. there's an idle pulse if necessary so that one of the timer updates lands exactly on the 0.5ms boundary. TIM8 is the master, and I actually time my main loop by waiting for its DMA queue to be at the 0.5ms boundary, that is, waiting for a specific value of the DMA CNDTR register (I've checked with an oscilloscope, toggling a pin in the main loop, that this works and is accurate). Now, when I've reached the right time according to TIM8, I know exactly where the TIM1 queue should be, so I can check its CNDTR (a simplified sketch of this check follows the list). The reason I think there are glitches is that this check sometimes fails (once every few minutes, so often on human time scales, very rarely in terms of the number of pulses).
  3. What I'm working on now: I set up another timer as a reference, and use its capture channel to grab the value of the reference timer whenever TIM8's update signal is asserted. The captured value should always be exactly one of the pulse instants, and in particular I can check the latest captured value in the main loop at the 0.5ms boundaries. Using yet another timer, I can do the same check on TIM1. The downside is that this doesn't quite guarantee that the transfers have completed, since the registers in the main timer are updated in the order ARR, RCR, CCR1. So if ARR has been updated but CCR1 was missed, the pulse instants themselves would be correct, even though the output pulse lengths wouldn't be.
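
For item 2, the check boils down to something like the following sketch (simplified: I believe TIM8_UP is served by DMA2 channel 1 and TIM1_UP by DMA1 channel 5 on the F103; the boundary/expected CNDTR values are placeholders for whatever my buffer layout dictates at each 0.5ms boundary):

#include "stm32f10x.h"

static volatile uint32_t glitchCount; // incremented whenever the check fails

void waitForBoundaryAndCheckTim1(uint32_t tim8BoundaryCndtr, uint32_t tim1ExpectedCndtr)
{
    // Time the main loop off TIM8's DMA position (DMA2 channel 1 = TIM8_UP).
    while (DMA2_Channel1->CNDTR != tim8BoundaryCndtr) {
        // busy-wait until the TIM8 queue reaches the 0.5ms boundary
    }

    // At this instant the TIM1 queue (DMA1 channel 5 = TIM1_UP) should be at
    // a known position as well.
    if (DMA1_Channel5->CNDTR != tim1ExpectedCndtr) {
        ++glitchCount; // TIM1 has fallen behind or run ahead
    }
}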

Of course, the glitches I'm seeing could be due to a bug in the code that's generating the buffers the DMA is sending to the timers. But that's exactly why I'd like to know for sure whether or not the DMA is missing some transfers, so I'd know if I'm on the hunt for a bug in my code or not. The code itself does pass a reasonably comprehensive set of unit tests, so the bug would be a subtle one if it's there.

So, any ideas on checking if my problem is due to DMA misses or not?

Details of what exactly I'm doing:

I'm generating a series of pulses, each determined by the length of the "on" phase (pulsewidth) and the time (in clock ticks) until the next pulse. In terms of data structures

#include <cstdint>
#include <type_traits>

struct pulse {
    /*
     * These are an image of the timer registers.
     * We don't use repeats, but it must be there
     * since the DMA transfers it anyway.
     *
     * TODO: support other capture/compare channels than 1
     */
    uint32_t length;     // ARR
    uint32_t repeats;    // RCR, we don't use this
    uint32_t pulsewidth; // CCR1
};

// Check that pulse has the layout the DMA stream expects
static_assert(std::is_pod<pulse>::value, "pulse must be POD");
static_assert(sizeof(pulse) == 12, "pulse must not be padded");

pulse pulseArray[MAX_PULSES];

Then, the burst mode of TIM8 (and correspondingly, TIM1) allows us to set up the following scheme:

  1. The DMA channel of the timer is programmed to move data from pulseArray to the TIM8_DMAR "DMA address for full transfer" register, in circular mode (so of course I have to keep filling the buffer with new data in a similarly circular fashion).
  2. The TIM8_DCR "DMA control register" is programmed with burst length 3 and base address TIM8_ARR (see the reference manual RM0008, pages 360-361, for a detailed description of these registers), and the DMA request on update is enabled via TIM8_DIER bit 8 (UDE).
  3. Now, on each update of the timer, thanks to the burst mode programmed above, the timer raises the DMA request 3 times, transferring, in this order, the fields length, repeats and pulsewidth, which the burst mode directs to the TIM8_ARR, TIM8_RCR and TIM8_CCR1 registers. Since preload is enabled both on the timer itself (TIM8_CR1 bit 7, ARPE) and on the compare channel (TIM8_CCMR1 bit 3, OC1PE), the DMA writes land in the preload registers and become active at the next update (i.e. when the current pulse completes).

Figures 30 and 33 in the app note are very enlightening for understanding the above.
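
In code, the setup above looks roughly like the following sketch (clock enables, GPIO and TIM8's BDTR/MOE output enable omitted; register names are from the CMSIS stm32f10x.h header; I'm assuming DMA2 channel 1 as the channel mapped to TIM8_UP, and 32-bit transfer sizes to match struct pulse):

#include "stm32f10x.h"

void setupTim8BurstDma(void)
{
    // 1. DMA: circular transfer from pulseArray to TIM8_DMAR.
    DMA2_Channel1->CPAR  = (uint32_t)&TIM8->DMAR;
    DMA2_Channel1->CMAR  = (uint32_t)pulseArray;
    DMA2_Channel1->CNDTR = MAX_PULSES * 3;   // 3 words per pulse
    DMA2_Channel1->CCR   = DMA_CCR1_DIR      // memory -> peripheral
                         | DMA_CCR1_MINC     // increment memory address
                         | DMA_CCR1_CIRC     // circular mode
                         | DMA_CCR1_PSIZE_1  // 32-bit peripheral size
                         | DMA_CCR1_MSIZE_1  // 32-bit memory size
                         | DMA_CCR1_EN;

    // 2. Timer burst: base address ARR (offset 0x2C / 4 = 0x0B), burst
    //    length 3 (DBL = 3 - 1 = 2), DMA request on update.
    TIM8->DCR   = (2u << 8) | 0x0Bu;
    TIM8->DIER |= TIM_DIER_UDE;

    // 3. Preload, so new values take effect only at the next update.
    TIM8->CR1   |= TIM_CR1_ARPE;
    TIM8->CCMR1 |= TIM_CCMR1_OC1PE;
}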

And now I can state the problem in a bit more detail: assume length is, for example, 1. Then the number of clock cycles available for the DMA to transfer the next pulse (which is actually 2 positions ahead in the buffer due to the preload registers, but that's not important here) is one (assuming the timer prescaler is 1, which it is in this case). Obviously it is impossible for the DMA to transfer 3 16-bit words in one clock cycle, so the values from the previous cycle get repeated. Let's call that a "DMA miss", for want of a better term.

On the other hand, there must be some minimum length such that, during any pulse longer than that, the DMA will have time to transfer all the data. Unfortunately, this minimum length depends on the exact bus timings, other DMA traffic and its priorities, so it's really difficult to determine with pen and paper.

So I'd like to find a way to detect, with as much certainty as I can, that a "DMA miss" has not happened, so I could fine-tune my minimum length and, at the same time, be sure that some other glitches I'm seeing are not due to a "DMA miss".

Best Answer

You use the DMA in circular mode; how do you determine when to update the buffer contents? As you use only one buffer, there is no simple timing scheme that is guaranteed not to cause glitches. By making your timing more precise you only get closer to being glitch-free, but in probabilistic terms you never get there exactly.
The STM32F4 DMA has an easy-to-use double-buffering feature. On the STM32F1 you have to build it yourself, but the ingredients are there: the DMA peripheral gives you half-transfer and transfer-complete interrupt triggers. Treat the circular buffer as two halves, and on every interrupt swap which half is considered owned by the DMA (being read out) and which half the CPU is allowed to refill.
There is an easy-to-use double buffering feature on stm32f4 devices. But on stm32f1 devices, there is elements of it to do it manually. You have the half-transfer and transfer-end interrupt triggers there on DMA peripheral. Design the circular buffer as two sets of registers, as with every interrupt, swap the buffer pointer that assumed being read by DMA and to be updated by CPU.