Error Handling Strategies in Multithreaded Environments – Architecture and Concurrency

Tags: architecture, concurrency, error handling, libraries, multithreading

TL;DR: What error generation and handling strategies do you use in multithreaded code intended for use by others, and why do you use them? If applicable, state which programming paradigm each is useful for. I'm most interested in imperative, concurrent environments, but anything general will be useful.

I'm writing a little concurrency library that's currently a pet/C++11 learning project but may be used internally at my work later. In terms of domain, it's more in the realm of DSP and media streaming, but since it will be used in a game engine I need fairly strong error handling.

My main stumbling block at the moment is not the parallel code and data structures but how to do error handling and reporting. My main experience with large systems is in games, but there I'm usually using libraries, not designing them. I'm looking for different strategies and how they might be used in different situations, as this is a rather big gap in my knowledge.

My biggest concern is choosing a strategy so that if the program can recover, it does. If it does recover, there should be some mechanism to notify the user about what happened. I already have some data structures that can recover but may leak memory if, for example, a destructor fails.

Some Approaches:

Handle exceptions from which you can safely recover inside the library, but let fatal exceptions propagate to users of the library to indicate that the object is now in an undefined state. This is my preferred approach in single-threaded environments, but it won't communicate the bad state to other threads.
When an exception occurs that would put a data structure in an unrecoverable state, tear down the data structure and set a flag to block any further operations on it, then raise a general exception to the end user. This is hard for lock-free algorithms.
Forward an error state through parallel computations. This works well for Kahn process networks and other high-level concurrency models. Not so helpful if a primitive supporting the high-level model has failed.
Terminate the thread/task that caused the exception. Works well for thread local data/computation but not much of a solution for shared data.
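
To make the second approach concrete, here is a minimal sketch for the lock-based case only (it does not address the lock-free difficulty mentioned above). The names `poisonable`, `with_lock` and `poisoned_error` are hypothetical, not part of any existing library. The wrapper treats any exception thrown while the shared value is being modified as leaving it in an unknown state, sets a flag, and reports a general error to the current caller and to every later one on any thread.

```cpp
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical general error reported once the shared value is unusable.
struct poisoned_error : std::runtime_error {
    poisoned_error()
        : std::runtime_error("shared state poisoned by an earlier failure") {}
};

// Hypothetical wrapper: guards a value with a mutex and a "poisoned" flag.
template <typename T>
class poisonable {
public:
    explicit poisonable(T value) : value_(std::move(value)) {}

    // Runs f(value) under the lock. If f throws, the value may be half
    // updated, so the flag is set and all later calls are refused.
    template <typename F>
    void with_lock(F f) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (poisoned_) throw poisoned_error();
        try {
            f(value_);
        } catch (...) {
            poisoned_ = true;        // block every further operation
            throw poisoned_error();  // general exception for the end user
        }
    }

    // Lets other threads ask whether the structure is still usable.
    bool poisoned() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return poisoned_;
    }

private:
    mutable std::mutex mutex_;
    T value_;
    bool poisoned_ = false;
};
```

Because the flag lives next to the data, other threads learn about the failure the next time they touch the structure. Rust's `std::sync::Mutex` uses a similar poisoning idea, which may be a useful design reference.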

Just as a note, I know that a good library will probably use a mix of strategies, including some not listed above. I just don't have the experience to know which strategy fits where in a sufficiently large system.
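
As one possible instance of the third approach (forwarding an error state through a computation), C++11 futures already carry exceptions across threads: an exception thrown inside a `std::async` task is stored in the shared state and rethrown by `std::future::get()` on the consuming thread. The function names `decode_block`, `decode_async` and `describe` below are made up for illustration.

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical upstream stage: rejects negative samples.
int decode_block(int sample) {
    if (sample < 0) throw std::invalid_argument("negative sample");
    return sample * 2;
}

// Runs the stage on another thread. If decode_block throws, the exception
// is captured in the future's shared state instead of killing the thread.
std::future<int> decode_async(int sample) {
    return std::async(std::launch::async, decode_block, sample);
}

// Hypothetical downstream stage: get() either yields the value or rethrows
// the upstream exception here, on the consumer's thread.
std::string describe(std::future<int> result) {
    try {
        return "ok: " + std::to_string(result.get());
    } catch (const std::exception& e) {
        return std::string("failed: ") + e.what();
    }
}
```

For hand-rolled pipelines the same effect can be had with `std::exception_ptr`, `std::current_exception()` and `std::rethrow_exception()`, which is roughly what the future does internally.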

Best Answer

My two cents.

First, most async models I've seen in libraries tend to make me frustrated. Everybody seems to have their own slightly different brand of async, and many of those interfaces are not good. As such, I tend to like libraries that keep everything synchronous. Note that callbacks can still be good. But keep all the logic on one thread; the application programmer often wants to think about threading apart from the task the library is trying to perform.

Second, a small library devoted to asynchronous code can be a very good thing - as long as that is its sole focus. A pattern that I've seen and liked in C# is to chain together actions on different threads, but write it in almost a single threaded manner with a fluent interface. (The new keyword await is somewhat along the same lines.) One common place where this comes up is in dispatches onto the UI thread. Then provide a way to handle exceptions at the end, almost like a catch block. So for example maybe something like this:

...
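int expensiveResult = -1;
YourThreadLibrary
  .Background(() => expensiveResult = DoLongRunningTaskToCreate()) // runs on a worker thread
  .UI(() => UpdateUI(expensiveResult))                             // then dispatches to the UI thread
  .Exception(ex => LogIt(ex));                                     // handles a failure from any earlier step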

Exceptions are the way to go in C# and you have GC, so your situation may be different. But the pattern may still make some sense.

I know this might seem too simplistic, but these are the tools that I've seen be general enough to work across many problems. The nice thing is that a fluent interface like this is open to extension if written properly, so you can add your own .ParallelFailOnAny(params Action[]) and so on.
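
Since the question is about C++11, here is a rough sketch of how the same shape could look there. Everything in it (`task_chain`, `then`, `on_error`) is a hypothetical name, and it simplifies heavily: each step runs on a fresh `std::thread` that is joined immediately, so the calling thread blocks until the chain finishes and there is no notion of a UI thread. What it does show is the error path: the first exception is captured as a `std::exception_ptr`, the remaining steps are skipped, and the handler at the end plays the role of a catch block.

```cpp
#include <exception>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fluent helper, not a real library API.
class task_chain {
public:
    // Queue a step; returning *this is what makes the interface fluent.
    task_chain& then(std::function<void()> step) {
        steps_.push_back(std::move(step));
        return *this;
    }

    // Run the queued steps in order, each on its own thread. The first
    // exception stops the chain and is handed to the handler.
    void on_error(std::function<void(std::exception_ptr)> handler) {
        for (auto& step : steps_) {
            std::exception_ptr error;
            std::thread worker([&] {
                try {
                    step();
                } catch (...) {
                    error = std::current_exception();  // carry it across threads
                }
            });
            worker.join();  // join orders the write to error before this read
            if (error) {
                handler(error);
                return;  // skip the remaining steps
            }
        }
    }

private:
    std::vector<std::function<void()>> steps_;
};
```

A call would read much like the C# version: `chain.then(compute).then(publish).on_error(log_it);`, where the handler can call `std::rethrow_exception` inside a try block to inspect the concrete exception type.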

Related Solutions

Design – The Modern Way to Perform Error Handling

First of all, I would disagree with this statement:

Favour exceptions over error codes

This is not always the case: for example, take a look at Objective-C (with the Foundation framework). There the NSError is the preferred way to handle errors, despite the existence of what a Java developer would call true exceptions: @try, @catch, @throw, NSException class, etc.

However it is true that many interfaces leak their abstractions with the exceptions thrown. It is my belief that this is not the fault of the "exception"-style of error propagating/handling. In general I believe the best advice about error handling is this:

Deal with the error/exception at the lowest possible level, period

I think if one sticks to that rule of thumb, the amount of "leakage" from abstractions can be very limited and contained.

On whether exceptions thrown by a method should be part of its declaration, I believe they should: they are part of the contract defined by this interface: This method does A, or fails with B or C.

For example, if a class is an XML Parser, a part of its design should be to indicate that the XML file provided is just plain wrong. In Java, you normally do so by declaring the exceptions you expect to encounter and adding them to the throws part of the declaration of the method. On the other hand, if one of the parsing algorithms failed, there's no reason to pass that exception above unhandled.

It all boils down to one thing: Good interface design. If you design your interface well enough, no amount of exceptions should haunt you. Otherwise, it's not just exceptions that would bother you.

Also, I think the creators of Java had very strong security reasons to include exceptions in a method's declaration/definition.

One last thing: Some languages, Eiffel for example, have other mechanisms for error handling and simply do not include throwing capabilities. There, an 'exception' of sorts is automatically raised when a postcondition for a routine is not satisfied.

C++ – Designing exception classes

I think your colleague was right: you are designing your exception cases based on how simple it is to implement within the hierarchy, not based on the exception-handling needs of the client code.

With one exception type and an enumeration for the error condition (your solution), if the client code needs to handle single error cases (for example, my_errc::error_x) they must write code like this:

try {
    your_library.exception_throwing_function();
} catch(const my_error& err) {
    switch(err.code()) { // this could also be an if
    case my_errc::error_x:
        // handle error here
        break;
    default:
        throw; // we are not interested in other errors
    }
}

With multiple exception types (having a common base for the entire hierarchy), you can write:

try {
    your_library.exception_throwing_function();
} catch(const my_error_x& err) {
    // handle error here
}

where exception classes look like this:

// base class for all exceptions in your library
class my_error: public std::runtime_error { ... };

// error x: corresponding to my_errc::error_x condition in your code
class my_error_x: public my_error { ... };

When writing a library, the focus should be on its ease of use, not (necessarily) ease of the internal implementation.

You should only compromise on ease of use (what client code will look like) when the effort of doing it right in the library is prohibitive.
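
For completeness, here is one way the elided class bodies might be filled in. It is a sketch under assumptions: the `error_y` case, the constructors, and the free functions `exception_throwing_function` and `handled_error_x` are invented for illustration. The base class keeps the error code, so client code that prefers the switch style can still use it, while client code that only cares about one case catches the derived type directly.

```cpp
#include <stdexcept>
#include <string>

// error_y is a hypothetical second condition, added for illustration.
enum class my_errc { error_x, error_y };

// Base class for all exceptions in the library; still exposes the code.
class my_error : public std::runtime_error {
public:
    my_error(my_errc code, const std::string& what)
        : std::runtime_error(what), code_(code) {}
    my_errc code() const noexcept { return code_; }

private:
    my_errc code_;
};

// One derived type per error condition the client may want to single out.
class my_error_x : public my_error {
public:
    explicit my_error_x(const std::string& what)
        : my_error(my_errc::error_x, what) {}
};

class my_error_y : public my_error {
public:
    explicit my_error_y(const std::string& what)
        : my_error(my_errc::error_y, what) {}
};

// Hypothetical library call that can fail in two different ways.
void exception_throwing_function(int input) {
    if (input < 0) throw my_error_x("negative input");
    if (input == 0) throw my_error_y("zero input");
}

// Client code interested only in error_x: no switch, no rethrow needed.
// Any other my_error simply propagates to the caller.
bool handled_error_x(int input) {
    try {
        exception_throwing_function(input);
    } catch (const my_error_x&) {
        return true;  // handle error here
    }
    return false;
}
```

The cost sits in the library (one small class per condition), which is the trade the answer argues for.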
