C# – Accessing shared data without blocking in TPL

Tags: c#, concurrency, locks, task

I am writing a class that contains data. It exposes methods that allow querying the data while the data is also being updated from an external source (a web service, for example).

All the methods expose tasks that are started on a worker thread using the default task scheduler. So the structure of the class is as follows:

public class A
{
    private readonly object _myLock = new object();

    public Task<int> GetSomeNumber(int aParameter)
    {
        return Task.Run(() =>
        {
            lock (_myLock)
            { 
                int result = 0;
                // calculate the result...
                // ...
                return result;
            }
        });
    }

    public Task<string> GetSomeText(int aParameter)
    {
        return Task.Run(() =>
        {
            lock (_myLock)
            {
                string result = "";

                // calculate the result...
                // ...
                return result;
            }
        });
    }
}

The methods may be called multiple times while the data itself is being updated, so the calculations that require the shared data are surrounded with a lock statement. The problem is that this code blocks the thread on which the task runs. Of course, these are not UI threads, so the UI remains responsive, but it does consume threads and keeps them blocked. I would prefer the worker threads in the thread pool to remain as available as possible.

Is there a way, a best practice of some sort, to write this code so that the shared data is thread safe and, at the same time, the tasks are not blocked? If, instead of waiting for a lock, the tasks needed to wait for something else to finish (a task), I would use ContinueWith to create a continuation task and return it, so that it would be scheduled to run when the first task completed. Is there a way to do the same with lock, so that the task is only scheduled when the lock is available?

Update
I have done some reading and found several possible options for synchronizing access to shared data. These include: SemaphoreSlim, AsyncLock, and ConcurrentExclusiveSchedulerPair.
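For instance, SemaphoreSlim exposes a WaitAsync method, so a task can wait its turn asynchronously instead of blocking a pool thread. A minimal sketch of what I mean (the class name and the shared field are just placeholders):

```csharp
using System.Threading;
using System.Threading.Tasks;

public class A
{
    // SemaphoreSlim(1, 1) acts as an async-compatible mutex.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _sharedCounter; // stand-in for the shared data

    public async Task<int> GetSomeNumberAsync(int aParameter)
    {
        // WaitAsync does not block the thread; the continuation is
        // scheduled when the semaphore becomes available.
        await _gate.WaitAsync();
        try
        {
            return _sharedCounter + aParameter; // calculate under the "lock"
        }
        finally
        {
            _gate.Release();
        }
    }
}
```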

I would love to see a recommendation for a best practice for writing a service class that provides "read-only" task methods that can run concurrently under a read lock, and "write" task methods that require a write lock.
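From what I can tell, ConcurrentExclusiveSchedulerPair can express this split directly: tasks queued to its ConcurrentScheduler may run in parallel with each other, while tasks queued to its ExclusiveScheduler run alone. A sketch of the shape I have in mind (the service name and fields are invented):

```csharp
using System.Threading;
using System.Threading.Tasks;

public class GraphService
{
    private readonly ConcurrentExclusiveSchedulerPair _schedulers =
        new ConcurrentExclusiveSchedulerPair();
    private int _version; // stand-in for the shared graph data

    // "Read" tasks may run concurrently with each other.
    public Task<int> ReadVersionAsync() =>
        Task.Factory.StartNew(() => _version,
            CancellationToken.None, TaskCreationOptions.None,
            _schedulers.ConcurrentScheduler);

    // "Write" tasks run exclusively: no reads or other writes overlap.
    public Task UpdateAsync(int newVersion) =>
        Task.Factory.StartNew(() => { _version = newVersion; },
            CancellationToken.None, TaskCreationOptions.None,
            _schedulers.ExclusiveScheduler);
}
```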

Best Answer

This answer is based on the additional information about the use case mentioned in the comments.

We have a web service that provides graph nodes. We hold a local copy of part of the graph and expose an async API to query it. If the area you query is in the cache, we simply calculate and return the result; otherwise we start downloading the missing part, then perform the calculation and return the result. There is also a recurring background task that polls the server for changes to the nodes and applies them to the local cache.


One should realize that if a request cannot be completed right away because the data needed to fulfill it is simply not available (e.g. it hasn't been downloaded or cached yet), there are only a few choices:

  • Return a failure right away. This is similar to the TryGet pattern, which does not block - but the task won't succeed.
  • Return a failure right away, but also indicate that it has started an asynchronous request to fetch data from the external server. This way, the reader won't block, but will have to retry at a later time.
  • Require the reader to provide a callback function, which will be called later when the data is available. This can be combined with the asynchronous request approach.
  • Block the reader until the data has been fetched.
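The callback option maps naturally onto TaskCompletionSource: the reader gets a task immediately, and the cache completes it once the missing data arrives. A sketch with invented names (NodeCache, OnNodeArrived), not code from the original post:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class NodeCache
{
    private readonly Dictionary<int, string> _nodes = new Dictionary<int, string>();
    private readonly Dictionary<int, TaskCompletionSource<string>> _pending =
        new Dictionary<int, TaskCompletionSource<string>>();
    private readonly object _sync = new object();

    // Returns a completed task on a cache hit, otherwise a pending task
    // that is completed later, when the download finishes.
    public Task<string> GetNodeAsync(int id)
    {
        lock (_sync)
        {
            if (_nodes.TryGetValue(id, out var value))
                return Task.FromResult(value);

            if (!_pending.TryGetValue(id, out var tcs))
            {
                tcs = new TaskCompletionSource<string>();
                _pending[id] = tcs;
                // A real implementation would start the download here.
            }
            return tcs.Task;
        }
    }

    // Called when the background download delivers a node.
    public void OnNodeArrived(int id, string value)
    {
        TaskCompletionSource<string> tcs = null;
        lock (_sync)
        {
            _nodes[id] = value;
            if (_pending.TryGetValue(id, out tcs))
                _pending.Remove(id);
        }
        tcs?.TrySetResult(value); // complete waiters outside the lock
    }
}
```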

Each of the options above has its own set of "best practices" and integration patterns with TPL. It would be too much to discuss all of them here.

Help needed: if anyone knows of a good online resource that covers the best practices for these options, please post it in the comments.


If you decide that blocking the reader is the only choice given the requirements of your application, keep in mind that:

  • The .NET default thread pool is elastic, meaning it will launch more worker threads if it senses that some are blocked. This is good: it means blocking a number of readers will not bring the whole system down.
  • If the pool were not elastic, your application could deadlock due to worker-thread starvation (exhaustion).
  • You must therefore never change the settings of the default thread pool in a way that causes it to lose its elasticity - unless you are willing to reboot your service whenever that happens.
    • For example, do not set a maximum number of worker threads.

If you want the concurrent readers to never get stuck (i.e. to be obstruction-free), I think you have several choices:


Multiple buffering

Link to Wikipedia article

  • The system maintains multiple copies of the graph, and rotate (cycle through) them whenever a modification is needed.
  • At any point in time, at most one copy will be modified. All other copies will be read-only.
  • When a modification is taking place, each reader must choose between:
    • Reading from one of the other read-only, slightly out-of-date copies of the graph. Multiple readers can access these concurrently.
    • Waiting until the modification is complete, to make sure it has access to the most up-to-date version of the graph.
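A common way to sketch this in C# is to keep the current read-only copy behind a single reference and publish a new copy atomically on each update; readers pick up whichever snapshot is current and are never blocked. A simplified two-buffer variant, with invented types (a dictionary stands in for the graph):

```csharp
using System.Collections.Generic;
using System.Threading;

public class SnapshotGraph
{
    // The current read-only snapshot; replaced wholesale on update.
    private Dictionary<int, string> _current = new Dictionary<int, string>();
    private readonly object _writeLock = new object(); // serializes writers only

    // Readers never block: they grab the current snapshot and read it.
    public string GetNode(int id)
    {
        var snapshot = Volatile.Read(ref _current);
        return snapshot.TryGetValue(id, out var value) ? value : null;
    }

    // Writers copy, modify, then publish the new snapshot atomically.
    public void SetNode(int id, string value)
    {
        lock (_writeLock)
        {
            var next = new Dictionary<int, string>(_current) { [id] = value };
            Volatile.Write(ref _current, next);
        }
    }
}
```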

Concurrent graph update algorithms

The exact details will depend on what operations and algorithms are needed for the application.

Some of them may be lock-free, using hardware-supported atomic instructions. Others may use various kinds of locks, but try to minimize the probability, or the duration, of readers being blocked.
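To give a flavor of the atomic-instruction approach, a compare-and-swap loop with Interlocked.CompareExchange can publish an updated immutable snapshot without any lock. This is a generic sketch, not a full graph algorithm:

```csharp
using System.Threading;

public static class AtomicPublish
{
    // Retries until the transformed value is installed atomically;
    // readers that load the reference always see a consistent snapshot.
    public static void Update<T>(ref T location, System.Func<T, T> transform)
        where T : class
    {
        T seen, next;
        do
        {
            seen = Volatile.Read(ref location);
            next = transform(seen);
        }
        while (Interlocked.CompareExchange(ref location, next, seen) != seen);
    }
}
```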

One may either use an existing library or implement one's own.

I'm not familiar with concurrent graph update algorithms, so I can't offer many suggestions.
