Electronic – Changing flash latency from 0 to 1 even with enabled prefetch slows execution on STM32

arm, flash, microcontroller, stm32, stm32f3

I have left all clock configuration in its default state, so the internal oscillator is running at 8 MHz. I have a delay loop written in inline assembly:

// Delay a certain number of cycles using glorious inline assembly.
void delay(uint32_t time) {
    asm volatile(
        "mov r4, #3                \n" // Divide time by three, since the loop is
        "udiv %[time], %[time], r4 \n" // 3x too slow.
        "1:                        \n" // Local numeric label, safe if this is inlined twice.
        "subs %[time], %[time], #1 \n" // 1 cycle.
        "bne 1b                    \n" // 1 cycle if not taken, 2 if taken.
        : [time] "+l" (time)           // Read-write operand in a low register (r0..r7);
                                       // declared as an output so the compiler knows
        :                              // it is modified. No separate inputs.
        : "r4", "cc"                   // Clobbers: r4 and the condition code flags.
    );
}

and a GPIO routine that uses it:

// Blink the led with a period of 1 second.
while (1) {
    // Set LED pin.
    GPIOA_BSRR = 1 << LED_PIN;
    delay(second_cycles / 2);

    // Reset LED pin
    GPIOA_BSRR = 1 << (LED_PIN + 16);
    delay(second_cycles / 2);
}

When running with 0 wait states, everything performs as expected. But when I change the flash wait states from 0 to 1, each delay takes 833 milliseconds instead of 500 milliseconds, roughly a 66% increase in execution time.
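
As a rough check on those figures, the sketch below scales the 3 cycles per iteration assumed in the delay routine's comments by the measured slowdown. The function name `implied_cycles_per_iteration` is made up for illustration, and the result is only an estimate derived from the two timings, not a measurement of the core.

```c
#include <stdint.h>

/* Estimate how many cycles one loop iteration appears to take, given the
   measured delay, the expected delay, and the cycle count per iteration
   that the expected delay was based on (3 in the delay routine's comments). */
static double implied_cycles_per_iteration(double measured_ms,
                                           double expected_ms,
                                           double base_cycles) {
    return base_cycles * measured_ms / expected_ms;
}
```

With 833 ms measured against 500 ms expected, this gives about 5 cycles per iteration, which would correspond to about two extra cycles on each pass through the loop with 1 wait state.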

When I debug with GDB, I can see that the FLASH_ACR register contains 0b0011 0000, which means the prefetch buffer enable bit and the prefetch buffer status bit are both set, with 0 wait states for flash. The prefetch buffer should also be enabled on reset, as per the datasheet. When I write to the register by OR-ing it with 0b001, I read back the expected value of 0b0011 0001. This is done as follows:

// Change the flash wait states to 1.
volatile uint32_t foo1 = FLASH_ACR;
FLASH_ACR = FLASH_ACR | (0b001 << 0);
volatile uint32_t foo2 = FLASH_ACR;
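
Note that OR-ing can only set latency bits, so it cannot lower the wait states again later. A minimal sketch of a mask-based update follows; the bit positions (LATENCY in bits 2:0, prefetch enable in bit 4, prefetch status in bit 5) are assumptions that match the register values quoted above, and the macro and function names are made up for illustration. The helpers work on plain values so they can be checked off-target.

```c
#include <stdint.h>

/* Assumed FLASH_ACR layout, consistent with the values read back in GDB. */
#define FLASH_ACR_LATENCY_MASK 0x7u        /* bits 2:0: wait states        */
#define FLASH_ACR_PRFTBE       (1u << 4)   /* prefetch buffer enable       */
#define FLASH_ACR_PRFTBS       (1u << 5)   /* prefetch buffer status       */

/* Return the register value with the latency field replaced, leaving
   every other bit untouched. */
static uint32_t flash_acr_with_latency(uint32_t acr, uint32_t wait_states) {
    return (acr & ~FLASH_ACR_LATENCY_MASK) |
           (wait_states & FLASH_ACR_LATENCY_MASK);
}

/* Non-zero when both the enable bit and the status bit are set. */
static int flash_acr_prefetch_active(uint32_t acr) {
    return (acr & FLASH_ACR_PRFTBE) != 0 && (acr & FLASH_ACR_PRFTBS) != 0;
}
```

On the target this would be used as `FLASH_ACR = flash_acr_with_latency(FLASH_ACR, 1);`, which gives the same result as the OR above when starting from 0 wait states.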

Interestingly enough, enabling or disabling the prefetch buffer with 1 wait state makes no difference to my loop timing, which doesn't seem to make sense.

Here is the relevant section of the datasheet: [image: Flash ACR]

I am using a NUCLEO-F303RE board, which uses an STM32F303RE.

Best Answer

The prefetch buffer in the STM32F303RE is only 8 bytes (64 bits), so if your loop code is longer than 8 bytes the buffer has no effect, because it is refilled again on every iteration. An I-cache would help here, but as far as I can see this MCU does not have one.

It is preferable to count delays with a timer driven by a constant clock, such as a 32.768 kHz oscillator. Your MCU has many timers and an RTC; try using one of them.

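/* get_current_timer_ticks_count() is a placeholder for reading a
   free-running timer counter; it is not defined here. */
void delay(unsigned timertickstowait)
{
  unsigned time0 = get_current_timer_ticks_count();
  /* Unsigned subtraction keeps the comparison correct across counter wrap-around. */
  while( (unsigned)(get_current_timer_ticks_count() - time0) < timertickstowait )
    { /* do nothing */ }
}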

If the timer you use is driven by a clock that is independent of the MCU clock, the behavior of the delay routine will be practically independent of the MCU frequency and the flash access parameters.
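
To see why the unsigned subtraction in the loop condition matters, here is a small sketch that can run off-target. The hardware timer is replaced by a simulated counter, `fake_ticks`, that advances by one on every read and starts just below the 32-bit limit. `fake_ticks` and `delay_ticks` are names made up for this example, and on real hardware `get_current_timer_ticks_count` would read a timer register instead.

```c
#include <stdint.h>

/* Simulated free-running 32-bit tick counter, a stand-in for a hardware
   timer register. It starts three ticks before wrap-around. */
static uint32_t fake_ticks = UINT32_MAX - 2u;

/* Each read returns the current count and then advances it by one tick. */
static uint32_t get_current_timer_ticks_count(void) {
    return fake_ticks++;
}

/* Busy-wait until at least ticks_to_wait ticks have elapsed.
   Returns the number of polls that found the delay still running,
   which is only useful for checking the simulation. */
static uint32_t delay_ticks(uint32_t ticks_to_wait) {
    uint32_t start = get_current_timer_ticks_count();
    uint32_t polls = 0;
    /* (now - start) is computed modulo 2^32, so the elapsed time stays
       correct even when the counter wraps from 0xFFFFFFFF to 0. */
    while ((uint32_t)(get_current_timer_ticks_count() - start) < ticks_to_wait) {
        polls++;
    }
    return polls;
}
```

In this simulation the counter wraps through zero in the middle of `delay_ticks(10)`, yet the loop still ends after exactly ten ticks have elapsed. A comparison such as `now < start + ticks_to_wait` would not have that property.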