C++ – Using and designing asynchronous APIs with naturally synchronous parts


I've been programming for a long time, but very rarely with anything asynchronous (and not often with anything to do with multithreading, either).

Mostly for the fun of it, I'm writing a program that downloads song lyrics and saves them to music files (e.g. via ID3v2 tags). I have most of that in place by now: two classes that handle downloading (one per site the program supports), plus the code needed to write the tags to files.
The design I wanted from the beginning was fairly simple and synchronous; I had planned to create a synchronous/blocking API to download lyrics, and start such tasks in background threads to keep the UI alive.

The idea was that the class that wants to fetch lyrics would create a LyricFetcher instance, and call string LyricFetcher::fetchLyrics(string artist, string title). That, in turn, tries fetching the lyrics from each site in order, using something like

for (Site *site : sites) {
    std::string lyrics = site->fetchLyrics(artist, title);
    if (!lyrics.empty())  // assumption: an empty string means "not found"
        return lyrics;
}
return std::string();  // didn't find anything on any site

The problems appeared when I realized that the only APIs I could use to fetch data from the internet were asynchronous: you set up a request and connect to its signals, so that it tells you (via signals and slots) when it's done.
With that, I couldn't simply return the lyrics as a string.
As a workaround, I designed the API to use a callback, which the site-specific code calls when it has either finished the download, or failed, i.e. something like void LyricFetcher::fetchLyrics(string artist, string title, function<void(string)> callback).
The callback is then forwarded to the site-specific classes, whose fetchLyrics methods have the same signature as the one above.
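A minimal sketch of that callback-style interface, for illustration only. `Site` and `StubSite` are hypothetical names; `StubSite` answers immediately instead of doing a real download (in Qt, a real site would typically invoke the callback from a slot connected to its QNetworkReply's `finished()` signal). Using an empty string for "not found" is an assumption of this sketch, not part of the original design.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical interface for this sketch: every supported site implements
// fetchLyrics with the same callback-based signature as LyricFetcher.
class Site {
public:
    virtual ~Site() = default;
    // The callback receives the lyrics, or an empty string if the site
    // had none (the empty-string convention is an assumption).
    virtual void fetchLyrics(std::string artist, std::string title,
                             std::function<void(std::string)> callback) = 0;
};

// Hypothetical stand-in for a real site: instead of waiting for a network
// reply, it invokes the callback at once with canned lyrics.
class StubSite : public Site {
public:
    explicit StubSite(std::string lyrics) : lyrics_(std::move(lyrics)) {}

    void fetchLyrics(std::string /*artist*/, std::string /*title*/,
                     std::function<void(std::string)> callback) override {
        callback(lyrics_);
    }

private:
    std::string lyrics_;
};
```

Calling `site.fetchLyrics(artist, title, [](std::string lyrics) { ... })` returns right away in a real implementation; the lambda runs later, which is exactly why the lyrics can no longer be a plain return value.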

That worked fine for a single site, but I ran into new problems when trying to implement the multi-site pseudocode above.
I can't use the pseudocode, since site->fetchLyrics would return immediately.
The only "solution" I came up with was to use nested callbacks, which is horrible.

void LyricFetcher::fetchLyrics(string artist, string title, function<void(string)> callback) {
    site1->fetchLyrics(artist, title, [=](string lyrics) {
        if (did match)
            callback(lyrics);
        else {
            site2->fetchLyrics(artist, title, [=](string lyrics) {
               ...

Not only is this ugly, but it also means you can't iterate through the sites in a simple loop. (I'm not even sure it'd work at all.)

What's a good way to design around issues like these?
The "user facing" API of LyricFetcher::fetchLyrics should preferably either be synchronous (with the lyrics as a simple return value), or use signals/slots to signal completion (and transfer results).

FWIW, this is in C++ with Qt (C++14, Qt 5.6), but I tried to stick with a C++y pseudocode in the post.

Best Answer

The problem here isn't about starting many tasks and waiting for them, but about running callback/signal-based tasks in a sequential manner.
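A minimal sketch of what running callback-based tasks sequentially can look like in plain C++14, without Qt. This is a swapped-in technique (a recursive continuation), not the mappedReduced approach this answer goes on to describe: instead of a `for` loop, each callback decides whether to stop or to start the next site. `Callback`, `AsyncFetch` and `fetchSequentially` are hypothetical names; each `AsyncFetch` is assumed to already have the artist and title bound (e.g. a lambda wrapping one site's `fetchLyrics`), and an empty string is assumed to mean "not found".

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names for this sketch. An AsyncFetch starts one site's
// download (artist and title already bound) and reports through a Callback;
// an empty string means "this site had no lyrics" (assumed convention).
using Callback = std::function<void(std::string)>;
using AsyncFetch = std::function<void(Callback)>;

// Starts fetch number `index`; the next fetch is started only from inside the
// previous fetch's callback, after it has reported failure. This works whether
// a callback fires immediately or later from an event loop.
void fetchSequentially(std::shared_ptr<const std::vector<AsyncFetch>> fetches,
                       std::size_t index, Callback done)
{
    if (index >= fetches->size()) {
        done(std::string());  // every site failed
        return;
    }
    (*fetches)[index]([fetches, index, done](std::string lyrics) {
        if (!lyrics.empty())
            done(std::move(lyrics));                      // first success wins
        else
            fetchSequentially(fetches, index + 1, done);  // try the next site
    });
}

// Convenience entry point: takes ownership of the site list and starts at
// the first site. The shared_ptr keeps the list alive until the last callback.
void fetchSequentially(std::vector<AsyncFetch> fetches, Callback done)
{
    std::shared_ptr<const std::vector<AsyncFetch>> shared =
        std::make_shared<std::vector<AsyncFetch>>(std::move(fetches));
    fetchSequentially(shared, 0, std::move(done));
}
```

Holding the site list in a `std::shared_ptr` matters once the callbacks really arrive later from Qt's event loop, because the function that started the chain has long since returned. In a Qt class, the final callback could just as well emit a signal instead of calling `done`.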

Use a concurrent function that takes two functions as arguments: one that produces a result for each element of an ordered sequence, and one that combines those results in turn. For example, Qt's QtConcurrent namespace provides mappedReduced(), which has the following signature:

QFuture<T> QtConcurrent::mappedReduced(const Sequence &sequence, MapFunction mapFunction, ReduceFunction reduceFunction, QtConcurrent::ReduceOptions reduceOptions = UnorderedReduce | SequentialReduce)

It can be used as follows:

// Reduce function: paints one thumbnail into the collage.
void addToCollage(QImage &collage, const QImage &thumbnail)
{
    QPainter p(&collage);
    static QPoint offset = QPoint(0, 0);
    p.drawImage(offset, thumbnail);
    offset += ...;
}

// Map function: produces a 100x100 version of one image.
QImage scaled(const QImage &image)
{
    return image.scaled(100, 100);
}

QList<QImage> images = ...;

// Map only: scale every image concurrently.
QFuture<QImage> thumbnails = QtConcurrent::mapped(images, scaled);

// Map and reduce: scale every image, then paint each thumbnail into a collage.
QFuture<QImage> collage = QtConcurrent::mappedReduced(images, scaled, addToCollage);

The reduce function is called once for each result returned by the map function and should merge that intermediate result into the result variable. QtConcurrent::mappedReduced() guarantees that only one thread calls reduce at a time, so a mutex to protect the result variable is not necessary. The QtConcurrent::ReduceOptions enum controls the order in which the reduction is done: with QtConcurrent::UnorderedReduce (the default) the order is undefined, while QtConcurrent::OrderedReduce ensures that the reduction follows the order of the original sequence.
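To make the ordering guarantee concrete without depending on Qt, here is a rough standard-library analogue of mappedReduced() with OrderedReduce. `orderedMapReduce` is a hypothetical helper, and it deliberately simplifies: it starts one `std::async` task per element instead of using Qt's global thread pool, and it blocks until the result is ready instead of returning a QFuture.

```cpp
#include <future>
#include <utility>
#include <vector>

// Rough standard-library analogue of QtConcurrent::mappedReduced() with
// QtConcurrent::OrderedReduce (hypothetical helper, not Qt's implementation).
// The map step runs concurrently, one std::async task per element; the reduce
// step runs only on the calling thread, in the order of `items`, so the
// result variable needs no mutex. Unlike Qt, this blocks until it finishes.
template <typename T, typename MapFn, typename ReduceFn, typename R>
R orderedMapReduce(const std::vector<T>& items, MapFn map, ReduceFn reduce,
                   R result)
{
    using Mapped = decltype(map(std::declval<const T&>()));
    std::vector<std::future<Mapped>> pending;
    pending.reserve(items.size());
    for (const T& item : items)
        pending.push_back(
            std::async(std::launch::async, [map, &item] { return map(item); }));
    for (std::future<Mapped>& f : pending)
        reduce(result, f.get());  // waits for this element, then reduces it
    return result;
}
```

For example, reducing `{1, 2, 3, 4}` with an identity map and `acc = acc * 10 + v`, starting from `0`, yields `1234` precisely because the reduction follows the sequence order; with an unordered reduction the digits could be combined in any order.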
