C# – How to diagnose async/await deadlocks

async c# debugging

I am working with a new codebase that makes heavy use of async/await. Most of the people on my team are also fairly new to async/await. We generally hold to Best Practices as Specified by Microsoft, but we usually need our context to flow through the async call, and we work with libraries that don't use ConfigureAwait(false).

Combine all of those things and we run into the async deadlocks described in the article… weekly. They don't show up during unit testing, because our mocked data sources (usually via Task.FromResult) aren't enough to trigger the deadlock. So at runtime or during integration tests, some service call just goes out to lunch and never returns. That kills the servers, and generally makes a mess of things.

The problem is that tracking down where the mistake was made (usually just not being async all the way up) generally involves manual code inspection, which is time-consuming and not automatable.

What's a better way of diagnosing what caused the deadlock?

Best Answer

Ok - I am not sure whether the following will be of any help to you, because I made some assumptions in developing a solution which may or may not be true in your case. Maybe my "solution" is too theoretical and only works for artificial examples - I have not done any testing beyond the stuff below.
In addition, I would see the following more as a workaround than a real solution, but considering the lack of responses I think it might still be better than nothing (I kept watching your question waiting for a solution, and when none was posted I started playing around with the issue).

But enough said: Let's say we have a simple data service which can be used to retrieve an integer:

public interface IDataService
{
    Task<int> LoadMagicInteger();
}

A simple implementation uses asynchronous code:

public sealed class CustomDataService
    : IDataService
{
    public async Task<int> LoadMagicInteger()
    {
        Console.WriteLine("LoadMagicInteger - 1");
        await Task.Delay(100);
        Console.WriteLine("LoadMagicInteger - 2");
        var result = 42;
        Console.WriteLine("LoadMagicInteger - 3");
        await Task.Delay(100);
        Console.WriteLine("LoadMagicInteger - 4");
        return result;
    }
}

A problem arises if we use the code "incorrectly", as illustrated by this class. Foo blocks on Task.Result instead of awaiting the result as Bar does:

public sealed class ClassToTest
{
    private readonly IDataService _dataService;

    public ClassToTest(IDataService dataService)
    {
        this._dataService = dataService;
    }

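    // Incorrect: blocks the calling thread on Task.Result. The method has no
    // await, so it runs synchronously despite the async modifier.
    public async Task<int> Foo()
    {
        var result = this._dataService.LoadMagicInteger().Result;
        return result;
    }

    // Correct: awaits the task and returns to the caller while it is pending.
    public async Task<int> Bar()
    {
        var result = await this._dataService.LoadMagicInteger();
        return result;
    }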
}
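The question notes that mocks built on Task.FromResult never trigger the hang. A minimal sketch of why, using only standard Task APIs (the Program class and the 200 ms timeout are illustrative choices, not part of the original code): a task from Task.FromResult is already completed, so reading Result returns at once, whereas a task that is still pending makes a blocking caller wait.

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        // Already completed: reading Result returns at once, so a blocking
        // caller such as Foo behaves exactly like an awaiting one.
        Task<int> completed = Task.FromResult(42);
        Console.WriteLine(completed.IsCompleted); // True
        Console.WriteLine(completed.Result);      // 42

        // Still pending: a blocking caller would now wait indefinitely.
        // Wait with a timeout is used here only so the demo terminates.
        var tcs = new TaskCompletionSource<int>();
        bool finished = tcs.Task.Wait(TimeSpan.FromMilliseconds(200));
        Console.WriteLine(finished);              // False
    }
}
```

This is why a test that wants to tell Foo and Bar apart needs a data source whose task stays pending until the test decides to complete it.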

What we (you) now need is a way to write a test which succeeds when calling Bar but fails when calling Foo (at least if I understood the question correctly ;-) ).

I'll let the code speak; here's what I came up with (using Visual Studio tests, but it should work using NUnit, too):

DataServiceMock utilizes TaskCompletionSource<T>. This allows us to set the result at a defined point in the test run, which leads to the following test. Note that a delegate passes the TaskCompletionSource back into the test. You could also set this up in the Initialize method of the test and use properties.

TaskCompletionSource<int> tcs = null;
this._dataService.LoadMagicIntegerMock = t => tcs = t;

Task<int> task = null;
TaskTestHelper.AssertDoesNotBlock(() => task = this._instance.Foo());

tcs.TrySetResult(42);

var result = task.Result;
Assert.AreEqual(42, result);

this._end = true;

What's happening here is that we first verify that we can leave the method without blocking (this would not work if someone accessed Task.Result - in this case we would run into a timeout as the result of the task is not made available until after the method has returned).
Then, we set the result (now the method can execute) and we verify the result (inside a unit test we can access Task.Result as we actually want the blocking to occur).
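The source of DataServiceMock does not appear in this post. The following is a hypothetical reconstruction inferred from how the test uses it: the property name LoadMagicIntegerMock and its Action<TaskCompletionSource<int>> type come from the test code, while the method body is an assumption. IDataService is repeated and a small Main is added only to keep the sketch self-contained.

```csharp
using System;
using System.Threading.Tasks;

public interface IDataService
{
    Task<int> LoadMagicInteger();
}

// Hypothetical reconstruction; the original implementation may differ.
public sealed class DataServiceMock
    : IDataService
{
    // Callback that receives the TaskCompletionSource created for each call,
    // so the test can complete the task whenever it chooses.
    public Action<TaskCompletionSource<int>> LoadMagicIntegerMock { get; set; }

    public Task<int> LoadMagicInteger()
    {
        var tcs = new TaskCompletionSource<int>();
        if (this.LoadMagicIntegerMock != null)
        {
            this.LoadMagicIntegerMock(tcs);
        }
        // The task stays pending until the test calls TrySetResult.
        return tcs.Task;
    }
}

public static class Program
{
    public static void Main()
    {
        TaskCompletionSource<int> tcs = null;
        var mock = new DataServiceMock { LoadMagicIntegerMock = t => tcs = t };

        Task<int> pending = mock.LoadMagicInteger();
        Console.WriteLine(pending.IsCompleted); // False

        tcs.TrySetResult(42);
        Console.WriteLine(pending.Result);      // 42
    }
}
```

Handing the TaskCompletionSource to the test through a delegate keeps the mock free of timing logic: the test alone decides when the "data" arrives.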

Complete test class - BarTest succeeds and FooTest fails as desired.

[TestClass]
public class UnitTest1
{
    private DataServiceMock _dataService;
    private ClassToTest _instance;
    private bool _end;

    [TestInitialize]
    public void Initialize()
    {
        this._dataService = new DataServiceMock();
        this._instance = new ClassToTest(this._dataService);

        this._end = false;
    }

    [TestCleanup]
    public void Cleanup()
    {
        Assert.IsTrue(this._end);
    }

    [TestMethod]
    public void FooTest()
    {
        TaskCompletionSource<int> tcs = null;
        this._dataService.LoadMagicIntegerMock = t => tcs = t;

        Task<int> task = null;
        TaskTestHelper.AssertDoesNotBlock(() => task = this._instance.Foo());

        tcs.TrySetResult(42);

        var result = task.Result;
        Assert.AreEqual(42, result);

        this._end = true;
    }

    [TestMethod]
    public void BarTest()
    {
        TaskCompletionSource<int> tcs = null;
        this._dataService.LoadMagicIntegerMock = t => tcs = t;

        Task<int> task = null;
        TaskTestHelper.AssertDoesNotBlock(() => task = this._instance.Bar());

        tcs.TrySetResult(42);

        var result = task.Result;
        Assert.AreEqual(42, result);

        this._end = true;
    }
}

And a little helper class to test for deadlocks / timeouts:

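using System;
using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class TaskTestHelper
{
    // Runs the action on a thread-pool thread and fails the test if it has
    // not returned within the timeout (in milliseconds).
    public static void AssertDoesNotBlock(Action action, int timeout = 1000)
    {
        var task = Task.Run(action);

        // Wait returns false on timeout and rethrows any exception thrown by the action.
        var finished = task.Wait(timeout);

        Assert.IsTrue(finished, "The action did not return within " + timeout + " ms.");
    }
}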
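To try the idea outside a test framework, here is a minimal console sketch. It assumes a hypothetical DoesNotBlock variant of the helper that returns a bool instead of asserting, and a hypothetical AwaitAsync method standing in for Bar. The shortened 200 ms timeout on the blocking case is only there to keep the demo quick.

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical variant of AssertDoesNotBlock: reports whether the action
    // returned within the timeout instead of asserting.
    public static bool DoesNotBlock(Action action, int timeoutMs = 1000)
    {
        Task task = Task.Run(action);
        return task.Wait(timeoutMs);
    }

    // Stand-in for Bar: awaits the task, so the caller gets control back
    // while the task is still pending.
    private static async Task<int> AwaitAsync(Task<int> source)
    {
        return await source;
    }

    public static void Main()
    {
        var tcs = new TaskCompletionSource<int>();

        // Awaiting style: the call returns a pending task right away.
        bool awaiting = DoesNotBlock(() => { Task<int> ignored = AwaitAsync(tcs.Task); });
        Console.WriteLine(awaiting); // True

        // Blocking style: Result cannot return until the task completes.
        bool blocking = DoesNotBlock(() => { int ignored = tcs.Task.Result; }, 200);
        Console.WriteLine(blocking); // False

        // Release the worker thread that is still blocked on Result.
        tcs.TrySetResult(42);
    }
}
```

Note that a blocked action keeps occupying its thread-pool thread after the timeout, so a failing test of this kind leaves one worker stuck unless the task is completed afterwards.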