Electronic – Error handling in embedded systems development

Tags: c, embedded, firmware

I am developing firmware organized in three layers: Driver, Platform and Application. To handle communication between these layers, every function returns a success/failure flag:

typedef enum {
  FW_ERROR = 0,
  FW_OK    = 1
}FWStatusTypeDef;

Every function in my FW looks like this:

FWStatusTypeDef function1(void)
{
  /* Do some stuff */
  ..
  ..

  if(something_wrong_happens)
    return FW_ERROR;
  return FW_OK;
}


FWStatusTypeDef function2(void)
{
  /* Do some stuff */
  ..
  ..
  if(something_wrong_happens)
    return FW_ERROR;

  return FW_OK;
}

This approach is good enough for debugging, but when the application is running I want to identify which function reported the error. In other words, I want to keep track of errors and display them on a screen.

Example:

FWStatusTypeDef OpenTheDoor(void)
{
  if (!KeyIsAvailable())
  {
    return FW_ERROR;
  }
  Door->Open();

  if (!DoorIsOpen())
  {
     return FW_ERROR;
  }

  return FW_OK;
}

FWStatusTypeDef KeyIsAvailable(void)
{
  return GetKeyStatus();
}

FWStatusTypeDef DoorIsOpen(void)
{
  return GetDoorStatus();
}

In this example, when I call OpenTheDoor() and it returns FW_ERROR, I can't tell whether the key is missing or something else went wrong while opening the door (for example, a cow is blocking it because it hates me).

After some thought, I came up with this solution: an error code handler.

typedef enum {
  ERROR_1 = 0,
  ERROR_2 = 1,
  ERROR_k = k, // A cow is blocking the door
  ERROR_n = n  // The key is missing
}ErrorIdTypeDef;

static ErrorIdTypeDef ErrorCode;

// Getter
ErrorIdTypeDef GetErrorCode(void)
{
  return ErrorCode;
}

// Setter
void SetErrorCode(ErrorIdTypeDef code)
{
  ErrorCode = code;
}

Each function uses the setter exposed by the error handler.

FWStatusTypeDef OpenTheDoor(void)
{
  if (!KeyIsAvailable())
  {
    SetErrorCode(ERROR_n);
    return FW_ERROR;
  }
  Door->Open();

  if (!DoorIsOpen())
  {
     SetErrorCode(ERROR_k);
     return FW_ERROR;
  }

  return FW_OK;
}

FWStatusTypeDef KeyIsAvailable(void)
{
  return GetKeyStatus();
}

FWStatusTypeDef DoorIsOpen(void)
{
  return GetDoorStatus();
}

In the application layer, if an error occurs, I read the error code through the exposed getter:

if (!OpenTheDoor())
{
  ErrorIdTypeDef theError;
  theError = GetErrorCode();
  switch(theError)
  {
    case ERROR_k:
      DisplayOnTheScreen("A cow is blocking the door.");
      break;

    case ERROR_n:
      DisplayOnTheScreen("You don't have a key.");
      break;

    ...
    ...

    default:
      break;
  }
}
else
{
  DisplayOnTheScreen("Door is open");
}

My questions are not about the syntax; they're about good practice. Does this approach have any limitations? Is there another known practice that gets the same result? In other words, am I overcomplicating things here?

Best Answer

Using enums for error codes is common practice. The whole point of returning an enum rather than, say, a bool is that the return value can carry more information than pass/fail.
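As a minimal sketch of that idea applied to the door example, OpenTheDoor() can return the specific failure directly instead of a generic FW_ERROR. The names FW_ERROR_DOOR_BLOCKED, key_present, door_open and DoorOpen(), the bool-returning helpers, and the choice of FW_OK = 0 are assumptions made for illustration, not part of the original code:

```c
#include <stdbool.h>

/* Hypothetical status codes: 0 means success, each failure has its own value. */
typedef enum {
  FW_OK = 0,
  FW_ERROR_KEY,          /* the key is missing */
  FW_ERROR_DOOR_BLOCKED  /* the door did not open (e.g. a cow is in the way) */
} FWStatusTypeDef;

/* Stand-ins for the real hardware state; a driver would query the hardware. */
static bool key_present = false;
static bool door_open   = false;

static bool KeyIsAvailable(void) { return key_present; }
static bool DoorIsOpen(void)     { return door_open; }
static void DoorOpen(void)       { /* drive the actuator here */ }

FWStatusTypeDef OpenTheDoor(void)
{
  if (!KeyIsAvailable())
  {
    return FW_ERROR_KEY;
  }

  DoorOpen();

  if (!DoorIsOpen())
  {
    return FW_ERROR_DOOR_BLOCKED;
  }

  return FW_OK;
}
```

With FW_OK equal to 0, callers should compare against FW_OK explicitly (`if (OpenTheDoor() != FW_OK)`) rather than rely on `!OpenTheDoor()`.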

The pattern you suggest, with a common "last error" handler, is not recommended though. It has been tried many times historically, never with good results. The most infamous example is the Windows API GetLastError() function.

"Last error" handlers have several problems: they are not thread- or interrupt-safe, and they only remember the most recent error. If a function sets FW_ERROR_KEY, another error can occur between that moment and the moment you print the error. You then print the wrong reason and get the wrong diagnostic.
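The overwrite can be reproduced with a small sketch. Everything here is hypothetical: ERR_UART_OVERRUN and UartRxIsr() stand in for any unrelated error source, and the "interrupt" is simulated with a plain function call at the point where a real one could fire:

```c
typedef enum {
  ERR_NONE = 0,
  ERR_KEY_MISSING,
  ERR_UART_OVERRUN
} ErrorIdTypeDef;

/* Single shared "last error" slot, as in the question. */
static volatile ErrorIdTypeDef ErrorCode = ERR_NONE;

void SetErrorCode(ErrorIdTypeDef code) { ErrorCode = code; }
ErrorIdTypeDef GetErrorCode(void)      { return ErrorCode; }

/* Hypothetical interrupt handler that reports its own, unrelated error. */
void UartRxIsr(void)
{
  SetErrorCode(ERR_UART_OVERRUN);
}

/* Always fails because the key is missing; returns 0 on failure. */
int OpenTheDoor(void)
{
  SetErrorCode(ERR_KEY_MISSING);
  return 0;
}

/* Returns the code the application would display after a failed attempt. */
ErrorIdTypeDef DiagnoseDoorFailure(void)
{
  if (!OpenTheDoor())
  {
    UartRxIsr();            /* simulate an interrupt firing right here */
    return GetErrorCode();  /* ERR_KEY_MISSING has already been overwritten */
  }
  return ERR_NONE;
}
```

In this sketch the door failed because of the missing key, yet the application reads back the UART error.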

The common way to handle errors is instead something like this:

for(;;)
{
  kick_watchdog();

  result = state_machine[state]();

  if(result != OK)
  {
    error_handler(result);
  }
}

That's what the main loop looks like in many bare-metal MCU projects. All errors are passed up from the drivers to the application layer, and error handling is centralized.

The application layer can then decide to ignore certain errors, handle them, or replace their error codes with another code.
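A slightly fuller sketch of such a loop is shown below, with one pass of the loop pulled out into run_one_cycle() so it can be exercised on its own. The state names, the result codes, the stub states and the last_reported variable are all assumptions for illustration; a real project would kick an actual watchdog and drive a display or log instead:

```c
typedef enum {
  OK = 0,
  ERR_KEY_MISSING,
  ERR_DOOR_BLOCKED,
  ERR_SENSOR_GLITCH
} result_t;

typedef enum { STATE_IDLE = 0, STATE_OPEN_DOOR, STATE_COUNT } state_t;

typedef result_t (*state_func_t)(void);

static state_t  state          = STATE_IDLE;
static result_t last_reported  = OK;  /* stand-in for "shown on the screen" */
static unsigned watchdog_kicks = 0;   /* stand-in for a hardware watchdog */

static void kick_watchdog(void) { watchdog_kicks++; }

/* Stub states: idle always succeeds, opening the door always fails. */
static result_t state_idle(void)      { state = STATE_OPEN_DOOR; return OK; }
static result_t state_open_door(void) { state = STATE_IDLE; return ERR_DOOR_BLOCKED; }

static const state_func_t state_machine[STATE_COUNT] = {
  state_idle,
  state_open_door
};

/* Centralized policy: the application decides what each error means. */
static void error_handler(result_t result)
{
  switch (result)
  {
    case ERR_SENSOR_GLITCH:
      break;                    /* deliberately ignored */

    case ERR_KEY_MISSING:
    case ERR_DOOR_BLOCKED:
      last_reported = result;   /* report it and fall back to idle */
      state = STATE_IDLE;
      break;

    default:
      last_reported = result;
      break;
  }
}

/* One pass of the main loop; main() would call this inside for(;;). */
void run_one_cycle(void)
{
  kick_watchdog();

  result_t result = state_machine[state]();

  if (result != OK)
  {
    error_handler(result);
  }
}
```

Because each result is handled in the same loop iteration that produced it, there is no window in which another error can replace it before it is reported.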