Electronic – Inconsistent SPI communication between STM32F4 (Slave) and Raspberry Pi (Master)

hal-library, raspberry-pi, spi, stm32f4

I've been having a very specific issue with large SPI transactions, and after a few days of debugging and trying different approaches I haven't made any progress. To briefly explain the system:

Slave: an STM32F405 chip soldered to a custom Raspberry Pi PCB shield. It communicates with an external sensor via interrupts. The sensor is actually an array of small sensors configured from the Master; it has one pin outputting an analogue voltage and another, digital pin that signals when the voltage is valid and can be converted to a digital value. I use the digital signal as an external interrupt to trigger an ADC conversion inside the STM. The converted value is stored in an internal buffer, and this is repeated until one whole frame (one value from each sensor) is stored inside the chip. There are 12800 values in one frame, each 12 bits wide. I created two buffers in memory, and when one is filled a signal goes high to tell the Raspberry Pi that it is time to begin an SPI transaction (the Raspberry Pi can't be used as an SPI slave). Each 12-bit value is stored in a 16-bit word, so each SPI transaction is 12800 × 2 = 25600 bytes long.
Essential parts of the code, generated with STM32CubeMX and using the HAL:

volatile uint16_t buf[2][12800];
//volatile uint16_t buf2[12800];
volatile int        pix_counter=0;
volatile uint8_t    buf_f=0;
volatile uint8_t *buf_pointer;
__IO uint16_t       adc_value=0;
/* USER CODE END PV */

/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);

int main(void)
{

  /* MCU Configuration--------------------------------------------------------*/

  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* Configure the system clock */
  SystemClock_Config();

  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_ADC1_Init();
  MX_DAC_Init();
  MX_SPI2_Init();

  // Fill the buffer with dummy values at the start to test SPI
  for (int i = 0; i < 12800; i++)
  {
    buf[0][i] = (uint16_t)i;
  }
  buf_f = 0;
  pix_counter = 0;
  buf_pointer = (uint8_t *)&buf[0];

  while (1)
  {
    HAL_SPI_Transmit(&hspi2, (uint8_t*)buf_pointer, sizeof(buf[0]), 100);

  }
  /* USER CODE END 3 */
}

/* USER CODE BEGIN 4 */
//External ISR
void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
{
  // Callback for the whole bank of GPIOs; test for GPIO 10
  if (GPIO_Pin == GPIO_PIN_10)
  {
    // If the ADC has finished a conversion, read the value
    if (__HAL_ADC_GET_FLAG(&hadc1, ADC_FLAG_EOC))
    {
      adc_value = HAL_ADC_GetValue(&hadc1);

      buf[buf_f][pix_counter] = adc_value;
      pix_counter++;
      // If the buffer is full, reset the counter, start putting values into the other buffer and signal that it is time to read the values.
      if (pix_counter == 12800) {
        pix_counter = 0;
        buf_pointer = (uint8_t *)&buf[buf_f];
        buf_f ^= 1;
        HAL_GPIO_WritePin(SPI_begin_f_GPIO_Port, SPI_begin_f_Pin, GPIO_PIN_SET);
      }
      // Stop signalling that the buffer is ready; the RPi should have caught it by now.
      if (pix_counter == 255) {
        HAL_GPIO_WritePin(SPI_begin_f_GPIO_Port, SPI_begin_f_Pin, GPIO_PIN_RESET);
      }
      // Reset interrupt flag
      __HAL_ADC_CLEAR_FLAG(&hadc1, ADC_FLAG_EOC);
    }
    HAL_ADC_Start(&hadc1);
  }
  // Interrupt coming from the Master to reset the counters if they stopped in the middle.
  else if (GPIO_Pin == Reset_counters_Pin)
  {
    pix_counter = 0;
    buf_f = 0;
    HAL_GPIO_WritePin(SPI_begin_f_GPIO_Port, SPI_begin_f_Pin, GPIO_PIN_RESET);
    __HAL_ADC_CLEAR_FLAG(&hadc1, ADC_FLAG_EOC);
    for (int i = 0; i < 12800; i++) {
      buf[0][i] = (uint16_t)i;
    }
    buf_pointer = (uint8_t *)&buf[0];
  }
}

Master: the Raspberry Pi uses the pigpio library to initialize and open the SPI channel. It is able to capture the frames and receive all the required data, but here is where the problem gets interesting. When initiating the transmission without any interrupts running, the STM should give out 25600 bytes of incrementing values: {0x0000, 0x0001, 0x0002 ... 0x31FE, 0x31FF}. When I was testing the speed limits of the SPI, I found that above 10 MHz the Raspberry Pi is no longer able to handle it and repeats some of the values. The problem is that even at lower frequencies the behavior is not completely predictable. The received array always starts correctly, {0x0000, 0x0001, 0x0002 ...}, but sometimes it resets at a random point and becomes something like {0x0000, 0x0001, 0x0002 ... 0x10FA, 0x10FB, 0x0000, 0x0001, 0x0002 ...}. After the reset the values keep incrementing correctly; it just seems to start over, taking values from the start of the buffer. The SPI transfer always finishes with an OK response, confirming that 25600 bytes have been transferred.

I found that there is a relationship between the master SPI clock speed and the timeout value passed to HAL_SPI_Transmit. Setting it to HAL_MAX_DELAY usually gives one correct SPI transmission, but the next one is broken (only zeros in the whole frame). The third transaction is correct again, the fourth is zeros again, and so on. The lower the timeout value and the lower the master clock speed, the more resets happen. At low speed they even seem to happen quite regularly, implying that the timeout is directly responsible.

Example: Timeout set to 100; SPI baud rate set to 100k; reset happens after values: (0x037E, 0x03E7, 0x03E7, 0x03E8, etc.)
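
A check like the following, run on the master over a received test frame, can pinpoint where the ramp restarts. It is only a minimal sketch: the function name is hypothetical, and it assumes the frame arrives as raw bytes with each 16-bit value sent low byte first (the layout of a little-endian uint16_t array cast to uint8_t*).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan a received test frame and return the index of the first 16-bit word
 * that does not match the expected ramp 0, 1, 2, ...
 * Returns n_words if the whole frame is intact.
 * Assumption: each word arrives low byte first.
 */
static size_t first_ramp_break(const uint8_t *rx, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        uint16_t value = (uint16_t)(rx[2 * i] | (rx[2 * i + 1] << 8));
        if (value != (uint16_t)i) {
            return i; /* the ramp restarted or was corrupted here */
        }
    }
    return n_words;
}
```

Logging the returned index together with the SPI clock and timeout settings makes it easier to see whether the restart point scales with either of them.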

I've been checking the memory map inside the STM as well as the values of the variables and pointers, and everything appears to be correct, except that sometimes the received frame is correct and sometimes it isn't. Any idea what could be causing this issue? Should I move away from HAL and implement the slave mechanism at a lower level?

I guess I am missing some background on what the timeout actually stands for. I could also try DMA, but I'm not entirely sure it is necessary. The STM code is very simple (two interrupt routines and an SPI slave channel), so I thought the extra overhead of the HAL implementation would be acceptable.

Best Answer

If your transmit function has a timeout, and it is asynchronous with respect to the master read timing, then it's just a question of how often the timeout will occur during a transfer. It will occur.
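
Some rough arithmetic makes this concrete. The sketch below assumes that the HAL_SPI_Transmit timeout is in milliseconds and bounds the whole blocking call (this may differ between HAL versions), and that the master clocks the bytes back to back with no gaps. The helper names are hypothetical.

```c
#include <stdint.h>

/* Time in ms needed to clock `bytes` bytes at `sck_hz`, rounded up.
 * sck_hz must be nonzero. Assumes back-to-back bytes with no gaps. */
static uint32_t spi_frame_ms(uint32_t bytes, uint32_t sck_hz)
{
    uint64_t bits = (uint64_t)bytes * 8u;
    return (uint32_t)((bits * 1000u + sck_hz - 1u) / sck_hz);
}

/* Number of whole bytes the master can clock out within `timeout_ms`. */
static uint32_t spi_bytes_within(uint32_t timeout_ms, uint32_t sck_hz)
{
    return (uint32_t)(((uint64_t)timeout_ms * sck_hz) / 8000u);
}
```

Under these assumptions, a 25600-byte frame takes 2048 ms at 100 kHz, far longer than a 100 ms timeout, which covers only 1250 bytes; at 10 MHz the frame takes about 21 ms. If the blocking call times out mid-frame, the loop in the question calls it again from the start of the buffer, which would be consistent with the ramp restarting.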

I suggest you set up an interrupt on the CS signal and prepare the transfer then, just before the master begins. This has the added benefit of resetting the slave state before each transaction, so you don't, e.g., have a transaction starting in the middle of a previous transaction that didn't complete.
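
The control flow of that suggestion can be sketched without any hardware as a small state machine. Everything named here (spi_slave, spi_slave_ops, the hooks and the fake_* test doubles) is hypothetical. On the STM32 the hooks might wrap non-blocking HAL calls such as HAL_SPI_Abort and HAL_SPI_Transmit_DMA, but that mapping is an assumption, and the master may need a short delay after asserting CS so the slave has time to arm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks; on real hardware they would wrap the SPI driver. */
typedef struct {
    void (*abort_transfer)(void);
    void (*start_transfer)(const uint8_t *data, size_t len);
} spi_slave_ops;

typedef struct {
    const spi_slave_ops *ops;
    const uint8_t *ready_buf; /* most recently completed frame, or NULL */
    size_t ready_len;
    bool busy;                /* a transfer is currently armed */
} spi_slave;

/* Called by the acquisition code when a frame buffer has been filled. */
static void spi_slave_publish(spi_slave *s, const uint8_t *buf, size_t len)
{
    s->ready_buf = buf;
    s->ready_len = len;
}

/* Called from the CS falling-edge interrupt, just before the master clocks. */
static void spi_slave_on_cs_low(spi_slave *s)
{
    if (s->busy) {
        s->ops->abort_transfer(); /* drop a stale, half-sent frame */
        s->busy = false;
    }
    if (s->ready_buf != NULL) {
        /* Always restart from the beginning of the latest complete frame. */
        s->ops->start_transfer(s->ready_buf, s->ready_len);
        s->busy = true;
    }
}

/* Called when the transfer completes (or on the CS rising edge). */
static void spi_slave_on_done(spi_slave *s)
{
    s->busy = false;
}

/* Test doubles that only record what was requested. */
static int fake_aborts;
static int fake_starts;
static const uint8_t *fake_last_data;
static size_t fake_last_len;

static void fake_abort(void) { fake_aborts++; }

static void fake_start(const uint8_t *data, size_t len)
{
    fake_starts++;
    fake_last_data = data;
    fake_last_len = len;
}

static const spi_slave_ops fake_ops = { fake_abort, fake_start };
```

Because the transfer is armed only on the CS edge, no timeout runs while the slave waits for the master, and an interrupted transaction is aborted instead of leaking into the next one.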