C++ – Using and designing asynchronous APIs with naturally synchronous parts


I've been programming for a long time, but very rarely with anything asynchronous (and not often with anything to do with multithreading, either).

Mostly for the fun of it, I'm writing a program to download song lyrics and save them to music files (via e.g. ID3v2 tags). I have most of that in place by now: I have two classes that handle downloading (one per site the program supports), plus the code necessary to write the tags to files.
The design I wanted from the beginning was fairly simple and synchronous; I had planned to create a synchronous/blocking API to download lyrics, and start such tasks in background threads to keep the UI alive.

The idea was that the class that wants to fetch lyrics would create a LyricFetcher instance, and call string LyricFetcher::fetchLyrics(string artist, string title). That, in turn, tries fetching the lyrics from each site in order, using something like

for (Site *site : sites) {
    string lyrics = site->fetchLyrics(artist, title);
    if (!lyrics.empty())  // successfully downloaded lyrics (assuming empty means failure)
        return lyrics;
}
return string();  // didn't find anything
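The planned design, a blocking fallback loop started on a background thread, can be sketched with the C++ standard library. This is a minimal sketch under stated assumptions: `BlockingFetcher` is a hypothetical stand-in for the site classes, an empty string is assumed to mean "not found", and `std::async` replaces whatever threading the real program would use. It only illustrates the original plan; it does not address the fact that the available network APIs are asynchronous.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical blocking fetcher: returns the lyrics, or an empty string on failure.
using BlockingFetcher =
    std::function<std::string(const std::string &artist, const std::string &title)>;

// The synchronous fallback loop: try each site in order, stop at the first hit.
std::string fetchLyrics(const std::vector<BlockingFetcher> &sites,
                        const std::string &artist, const std::string &title) {
    for (const BlockingFetcher &site : sites) {
        std::string lyrics = site(artist, title);
        if (!lyrics.empty())
            return lyrics;
    }
    return std::string();  // no site had the lyrics
}

// Runs the blocking loop on another thread so the calling (e.g. UI) thread
// stays responsive; the result is picked up later through the future.
std::future<std::string> fetchLyricsInBackground(std::vector<BlockingFetcher> sites,
                                                 std::string artist, std::string title) {
    return std::async(std::launch::async,
                      [sites = std::move(sites), artist = std::move(artist),
                       title = std::move(title)] {
                          return fetchLyrics(sites, artist, title);
                      });
}
```

For example, with two hypothetical fetchers where the first returns an empty string and the second returns `"lyrics of " + title`, `fetchLyricsInBackground(sites, "Artist", "Song").get()` yields `"lyrics of Song"`.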

The problems appeared when I realized that the only APIs I could use to fetch data from the internet were asynchronous; you'd set up a request, and tell it (via signals and slots) to tell you when it's done.
With that, I couldn't simply return the lyrics as a string.
As a workaround, I designed the API to use a callback, which the site-specific code calls when it has either finished the download, or failed, i.e. something like void LyricFetcher::fetchLyrics(string artist, string title, function<void(string)> callback).
The callback was then forwarded to the site-specific classes, whose fetchLyrics methods have the same type signature as the fetchLyrics above.

That worked fine for a single site, but I ran into new problems trying to implement the multi-site pseudocode above.
I can't use the pseudocode, since site->fetchLyrics would return immediately.
The only "solution" I came up with was to use nested callbacks, which is horrible.

void LyricFetcher::fetchLyrics(string artist, string title, function<void(string)> callback) {
    site1->fetchLyrics(artist, title, [=](string lyrics) {
        if (!lyrics.empty())  // site1 found the lyrics
            callback(lyrics);
        else {
            site2->fetchLyrics(artist, title, [=](string lyrics) {
                ...

Not only is this ugly, but it also means you can't iterate through the sites in a simple loop. (I'm not even sure it'd work at all.)

What's a good way to design around issues like these?
The user-facing API of LyricFetcher::fetchLyrics should preferably either be synchronous (with the lyrics as a simple return value) or use signals/slots to signal completion (and transfer results).

FWIW, this is in C++ with Qt (C++14, Qt 5.6), but I tried to stick with a C++y pseudocode in the post.

Best Answer

The problem here isn't about starting many tasks and waiting for them, but about running callback/signal-based tasks in a sequential manner.
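To make "running callback-based tasks sequentially" concrete: the nested callbacks from the question can be folded into one recursive helper that walks the list of sites. This is a minimal sketch, not part of QtConcurrent; `AsyncFetcher`, `Callback` and `fetchSequentially` are hypothetical names, and an empty string is assumed to mean "not found".

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback-style fetcher: starts a request and later calls `done`
// with the lyrics, or with an empty string if the site had nothing.
using Callback = std::function<void(std::string)>;
using AsyncFetcher =
    std::function<void(const std::string &artist, const std::string &title, Callback done)>;

// Tries (*sites)[index], then (*sites)[index + 1], ... one at a time. Each
// completion callback either reports success or starts the next site, so the
// loop becomes a chain of callbacks instead of hand-written nesting.
// The shared_ptr keeps the site list alive until the whole chain has finished.
void fetchSequentially(std::shared_ptr<const std::vector<AsyncFetcher>> sites,
                       std::size_t index, std::string artist, std::string title,
                       Callback callback) {
    if (index >= sites->size()) {
        callback(std::string());  // every site failed
        return;
    }
    const AsyncFetcher &site = (*sites)[index];
    site(artist, title, [sites, index, artist, title, callback](std::string lyrics) {
        if (!lyrics.empty())
            callback(std::move(lyrics));  // success: stop here
        else
            fetchSequentially(sites, index + 1, artist, title, callback);
    });
}
```

The chain is started with `fetchSequentially(sites, 0, artist, title, callback)`. With a genuinely asynchronous fetcher the completion callback runs later from the event loop, so each call returns before the next site is tried and the chain does not deepen the call stack; with fetchers that call back immediately (as in a test) it recurses once per site.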

Use a concurrent function that takes, as arguments, the function that creates the ordered structure and the function that iterates through it. For example, in the QtConcurrent namespace, Qt has a mappedReduced() function with the following signature:

QFuture<T> QtConcurrent::mappedReduced(const Sequence &sequence, MapFunction mapFunction, ReduceFunction reduceFunction, QtConcurrent::ReduceOptions reduceOptions = UnorderedReduce | SequentialReduce)

It can be used as follows:

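#include <QImage>
#include <QPainter>
#include <QtConcurrent>

// Reduce function: draws each thumbnail onto the collage
void addToCollage(QImage &collage, const QImage &thumbnail)
{
    QPainter p(&collage);
    static QPoint offset = QPoint(0, 0);
    p.drawImage(offset, thumbnail);
    offset += ...;
}

// Map function: scales a single image
QImage scaled(const QImage &image)
{
    return image.scaled(100, 100);
}

QList<QImage> images = ...;

// Map only: scale every image concurrently
QFuture<QImage> thumbnails = QtConcurrent::mapped(images, scaled);

// Map and reduce: scale every image, then combine the thumbnails into one collage
QFuture<QImage> collage = QtConcurrent::mappedReduced(images, scaled, addToCollage);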

The reduce function is called once for each result returned by the map function and should merge that intermediate result into the result variable. QtConcurrent::mappedReduced() guarantees that only one thread calls reduce at a time, so a mutex to lock the result variable is not necessary. The QtConcurrent::ReduceOptions enum controls the order in which the reduction is done: with QtConcurrent::UnorderedReduce (the default) the order is undefined, while QtConcurrent::OrderedReduce ensures that the reduction is done in the order of the original sequence.
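The reduce semantics just described (one reduce call per mapped value, never concurrently, optionally in sequence order) can be imitated with the standard library. This is a rough, hypothetical analogue of mappedReduced() with OrderedReduce, not Qt's implementation: it starts one `std::async` task per element instead of using a thread pool, so it only suits small inputs, and it assumes the map function is safe to call from several threads at once.

```cpp
#include <future>
#include <type_traits>
#include <vector>

// Hypothetical std-only analogue of mappedReduced() with OrderedReduce: every
// element is mapped on its own std::async task, then reduce() is called on the
// calling thread, once per mapped value and in the original order. Because
// only this thread touches `result`, no mutex is needed.
template <typename T, typename MapFn, typename ReduceFn, typename R>
R mappedReducedOrdered(const std::vector<T> &items, MapFn map, ReduceFn reduce, R result) {
    using Mapped = std::decay_t<decltype(map(items.front()))>;
    std::vector<std::future<Mapped>> pending;
    pending.reserve(items.size());
    for (const T &item : items)  // map calls may run concurrently
        pending.push_back(std::async(std::launch::async, [&map, &item] { return map(item); }));
    for (std::future<Mapped> &f : pending)
        reduce(result, f.get());  // blocks until that element's map has finished
    return result;
}
```

For instance, squaring `{1, 2, 3, 4}` and reducing with `acc += v` into an initial `0` returns `30`; reducing with `acc.push_back(v)` into an empty vector yields `{1, 4, 9, 16}`, in input order.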


Related Solutions

JavaScript Asynchronous Programming – How Callbacks Work

It doesn't. Taking or passing a callback doesn't, by itself, make code asynchronous.

For example, the .forEach function takes a callback but is synchronous.

var available = false;
[1,2,3].forEach( function(){
    available = true;
});
//code here runs after the whole .forEach has run,
//so available === true here

setTimeout also takes a callback, but it is asynchronous.

function myFunction( fn ) {
    setTimeout( function() {
        fn(1,2,3);
    }, 0 );
}

var available = false;
myFunction( function() {
    available = true;
});
//available is still false here; the callback only runs after the timeout fires

Hooking into any asynchronous event in JavaScript always requires a callback, but that doesn't mean that calling functions or passing them around is always asynchronous.
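For readers coming from the C++ question at the top of this page, the same distinction can be sketched in C++. `std::for_each` calls its callback before returning, while the hypothetical single-threaded `EventLoop` class below only queues the callback, much like `setTimeout(fn, 0)`, and runs it later. This illustrates the idea only; it is not how a JavaScript engine or Qt's event loop is implemented.

```cpp
#include <algorithm>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Synchronous: std::for_each takes a callback but invokes it before returning.
bool flagAfterForEach() {
    bool available = false;
    std::vector<int> v{1, 2, 3};
    std::for_each(v.begin(), v.end(), [&available](int) { available = true; });
    return available;  // already true here
}

// Asynchronous: a hypothetical single-threaded event loop. post() only queues
// the callback (much like setTimeout(fn, 0)); it runs later, in runPending().
class EventLoop {
public:
    void post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }

    void runPending() {
        while (!queue_.empty()) {
            std::function<void()> fn = std::move(queue_.front());
            queue_.pop_front();
            fn();  // callbacks may post further callbacks; they run in this loop too
        }
    }

private:
    std::deque<std::function<void()>> queue_;
};
```

After `loop.post([&available] { available = true; });`, `available` is still `false`; it becomes `true` only once `loop.runPending()` has run.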

Synchronization – Combining Asynchronous and Synchronous Programming

Promises were made to solve problems like this; they work really well in functional languages (I've personally used them extensively in JavaScript, where, curiously, jQuery actually has the worst implementation of them; see this comparison). The weird thing about using promises is accepting that things are easier when you make everything use them. I will frame my answer in JavaScript terms using the promise library Q, since that's what I know best. Apologies if that's not your cup o' tea, and double apologies for answering a language-agnostic question with a specific language and library to boot.

A promise is a wrapper for a callback function; in many libraries, the callback is attached with a .then() method. In simplest terms, a promise waits for a result and then calls the function passed to then(). The simplicity was confusing for me in the beginning, so let's use your example. Let me upgrade x and y to functions. If they are functions that return promises (as many async-heavy libraries do, like a NodeJS database accessor), this becomes trivial.

Q.all([x(), y()]).then(function(results) {
  // Do some stuff with results, which is an enumerable collection containing
  // the in-order results of the promises passed in.
});
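For comparison with the C++ question at the top of this page, a rough standard-library analogue of Q.all is to start several `std::future`-returning operations and then call `get()` on each. A minimal sketch, assuming hypothetical `x()` and `y()` that return futures; unlike `.then()`, `get()` blocks the calling thread until the value is ready.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical async operations standing in for x() and y(): each returns a
// future immediately and computes its value on another thread.
std::future<int> x() {
    return std::async(std::launch::async, [] { return 20; });
}
std::future<std::string> y() {
    return std::async(std::launch::async, [] { return std::string("done"); });
}

// Rough analogue of Q.all([x(), y()]).then(...): start both, then collect both
// results in order. Both operations run concurrently before either get().
std::pair<int, std::string> bothResults() {
    std::future<int> fx = x();
    std::future<std::string> fy = y();
    return {fx.get(), fy.get()};  // blocks until each one has finished
}
```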

The trouble is, what if x() and y() don't return promises? As I mentioned, your life gets easier if you make them return promises. Fortunately, any half-decent promise library provides tools for this as well. In fact, there are several ways to do this in Q.

// Upgrade x into a promise-returning function (Q.fcall calls x with the given args)
function promise_x(/*x params...*/) {
  return Q.fcall(x /*, x params... */);
}

// Upgrade x into a promise with a deferred
// Useful if you have some standard error handling specific to x-like functions
function defer_x(/*x params...*/) {
  var deferred = Q.defer();
  // If x is async, assume it takes a callback function with params of (error, result)
  x(/*x params..., */ function (error, result) {
    if (error) {
      deferred.reject(new Error(error));
    } else {
      // If you have some processing to do with the result of x (and not y), you can do
      // it here and return the modified result instead
      deferred.resolve(result);
    }
  });
  return deferred.promise;
}

If you repeat that code for the y() function, you get back to where you can use Q.all() to wait for both x() and y() to complete.

Q also provides an API for creating promises, which is like a wrapper function for Q.defer().
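In C++, the Q.defer() pattern above has a close standard-library counterpart: a `std::promise` plays the role of the deferred, and the `std::future` obtained from it plays the role of the promise handed back to the caller. A minimal sketch, assuming a hypothetical callback-style `xWithCallback()` that reports `(error, result)` the way x() does in the example.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical callback-style API, like x() above: it reports (error, result)
// through a callback. This stand-in calls back immediately.
void xWithCallback(int input, std::function<void(std::string error, int result)> done) {
    if (input < 0)
        done("negative input", 0);
    else
        done("", input * 2);
}

// Analogue of defer_x(): the std::promise is the "deferred", and the returned
// std::future is what the caller waits on.
std::future<int> deferX(int input) {
    auto deferred = std::make_shared<std::promise<int>>();
    std::future<int> result = deferred->get_future();
    xWithCallback(input, [deferred](std::string error, int value) {
        if (!error.empty())
            deferred->set_exception(std::make_exception_ptr(std::runtime_error(error)));
        else
            deferred->set_value(value);  // post-process value here if needed
    });
    return result;
}
```

The promise is held through a `shared_ptr` because `std::function` requires a copyable callable and `std::promise` is move-only. `deferX(21).get()` returns `42`, while `deferX(-1).get()` throws a `std::runtime_error` carrying `"negative input"`.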

However, the best feature a promise library provides is aggregated error handling. Expanding on our first Q.all example, what happens if x or y throws an error? As that code is written now, the error gets lost. Consider the following:

function MyOverallFunction() {
  Q.all([x(), y()]).then(function(results) {
    // Do some stuff with results, which is an enumerable collection containing
    // the in-order results of the promises passed in.
    var combinedResult = synchronousFunction(results[0], results[1]);
    // Returning a promise here makes the next .then() wait for it
    return anotherAsyncPromiseFn(combinedResult);
  }).then(function(result) {
     // Here, result is the resolved value of anotherAsyncPromiseFn
  }).catch(function(err) {
     // This is the cool part. Any error from x, y, synchronousFunction, or
     // anotherAsyncPromiseFn will propagate to here. If you don't want to
     // handle it, just re-throw it (after logging it, if you prefer).
  });
}

You'll also notice that the example above combines synchronous and asynchronous functions in several places.

Ultimately, the tools that are at your disposal depend upon your language and environment. Promises have been implemented in more languages than just JS, and another closely related idiom from async/functional programming is Futures. Admittedly, if your language of choice does not have a Promise library, this answer will be pretty useless to you, but hopefully you at least find it interesting.
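For the C++ case from the question at the top of this page, futures provide similar aggregated error handling: an exception thrown inside a `std::async` task is stored in its `std::future` and rethrown by `get()`, so one `try`/`catch` covers every step, synchronous or asynchronous. A minimal sketch, with hypothetical `asyncValue()` and `synchronousFunction()` standing in for the functions in the Q example; unlike Q's chain, each `get()` blocks the calling thread.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical async operation: fails for negative input, otherwise echoes it.
std::future<int> asyncValue(int v) {
    return std::async(std::launch::async, [v] {
        if (v < 0)
            throw std::invalid_argument("negative value");
        return v;
    });
}

// Hypothetical synchronous step that combines two results.
int synchronousFunction(int a, int b) { return a + b; }

std::string overallFunction(int a, int b) {
    try {
        std::future<int> fa = asyncValue(a);  // both operations start running
        std::future<int> fb = asyncValue(b);
        int combined = synchronousFunction(fa.get(), fb.get());
        int result = asyncValue(combined).get();  // like anotherAsyncPromiseFn
        return "result: " + std::to_string(result);
    } catch (const std::exception &e) {
        // Single error handler: errors from any step above end up here.
        return std::string("error: ") + e.what();
    }
}
```

With these stand-ins, `overallFunction(1, 2)` returns `"result: 3"` and `overallFunction(1, -2)` returns `"error: negative value"`.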

Update: Now that you mention Java, it looks like there is a decent promise library in JDeferred. I've not used it, but it looks like my examples could be applied with it in a fairly straightforward manner.
