Stack Structure – Usage in Async Processes

stack

This question has an excellent answer by Eric Lippert describing what the stack is used for. For year's I've known – generally speaking – what the stack is and how it's used, but parts of his answers make me wonder if this stack structure is less-used today where async programming is the norm.

From his answer:

the stack is part of the reification of continuation in a language without coroutines.

Specifically, the without coroutines portion of this has me wondering.

He explains a bit more here:

Coroutines are functions that can remember where they were, yield control to another coroutine for a while, and then resume where they left off later, but not necessarily immediately after the just-called coroutine yields. Think of "yield return" or "await" in C#, which must remember where they were when the next item is requested or the asynchronous operation completes. Languages with coroutines or similar language features require more advanced data structures than a stack in order to implement continuation.

This is excellent in regards to the stack, but leaves me with an unanswered question about what structure is used when a stack is too simple to handle these language features that require more advanced data structures?

Is the stack going away as technology progresses? What replaces it? Is it a hybrid type of thing? (e.g., does my .NET program use a stack until it hits an async call then switches over to some other structure until completed, at which point the stack is unwound back to a state where it can be sure of the next items, etc?)

That these scenarios are too advanced for a stack makes perfect sense, but what replaces the stack? When I had learned about this years ago, the stack was there because it was lightning fast and lightweight, a piece of memory allocated at application away from the heap because it supported highly efficient management for the task at hand (pun intended?). What's changed?

Best Answer

Is it a hybrid type of thing? (e.g., does my .NET program use a stack until it hits an async call then switches over to some other structure until completed, at which point the stack is unwound back to a state where it can be sure of the next items, etc?)

Basically yes.

Suppose we have

async void MyButton_OnClick() { await Foo(); Bar(); }
async Task Foo() { await Task.Delay(123); Blah(); }

Here's an extremely simplified explanation of how the continuations are reified. The real code is considerably more complex, but this gets the idea across.

You click the button. A message is queued up. The message loop processes the message and calls the click handler, putting the return address of the message queue on the stack. That is, the thing that happens after the handler is done is that the message loop has to keep running. So the continuation of the handler is the loop.

The click handler calls Foo(), putting the return address of itself on the stack. That is, the continuation of Foo is the remainder of the click handler.

Foo calls Task.Delay, putting the return address of itself on the stack.

Task.Delay does whatever magic it needs to do to immediately return a Task. The stack is popped and we're back in Foo.

Foo checks the returned task to see if its completed. It is not. The continuation of the await is to call Blah(), so Foo creates a delegate which calls Blah(), and signs that delegate up as the continuation of the task. (I just made a slight mis-statement; did you catch it? If not, we'll reveal it in a moment.)

Foo then creates its own Task object, marks it as incomplete, and returns it up the stack to the click handler.

The click handler examines Foo's task and discovers it is incomplete. The continuation of the await in the handler is to call Bar(), so the click handler creates a delegate which calls Bar() and sets it as the continuation of the task returned by Foo(). It then returns up the stack to the message loop.

The message loop keeps processing messages. Eventually the timer magic created by the delay task does its thing and posts a message to the queue saying that the continuation of the delay task can now be executed. So the message loop calls the task continuation, putting itself on the stack as usual. That delegate calls Blah(). Blah() does what it does and returns up the stack.

Now what happens? Here's the tricky bit. The continuation of the delay task does not only call Blah(). It has to also trigger a call to Bar(), but that task doesn't know about Bar!

Foo actually created a delegate that (1) calls Blah(), and (2) calls the continuation of the task that Foo created and handed back to the event handler. That's how we call a delegate that calls Bar().

And now we've done everything that we needed to do, in the correct order. But we never stopped processing messages in the message loop for very long, so the application remained responsive.

That these scenarios are too advanced for a stack makes perfect sense, but what replaces the stack?

A graph of task objects containing references to each other via the closure classes of delegates. Those closure classes are state machines which keep track of the position of the most recently executed await and the values of the locals. Plus, in the example given, a global-state queue of actions implemented by the operating system and the message loop which executes those actions.

Exercise: how do you suppose this all works in a world without message loops? For example, console applications. await in a console app is quite different; can you deduce how it works from what you know so far?

When I had learned about this years ago, the stack was there because it was lightning fast and lightweight, a piece of memory allocated at application away from the heap because it supported highly efficient management for the task at hand (pun intended?). What's changed?

Stacks are a useful data structure when the lifetimes of method activations form a stack, but in my example the activations of the click handler, Foo, Bar and Blah do not form a stack. And therefore the data structure which represents that workflow cannot be a stack; rather it is a graph of heap-allocated tasks and delegates that represents a workflow. The awaits are the points in the workflow where progress cannot be made further in the workflow until work started earlier has completed; while we're waiting, we can execute other work that does not depend on those particular started tasks having completed.

The stack is just an array of frames, where frames contain (1) pointers to the middle of functions (where the call happened) and (2) values of local variables and temps. Continuations of tasks are the same thing: the delegate is a pointer to the function and it has a state which references a specific point in the middle of the function (where the await happened), and the closure has fields for each local variable or temporary. The frames just don't form a nice neat array anymore, but all the information is the same.

Related Topic