I’ve got an MVC site that’s using Entity Framework 6 to handle the database, and I’ve been experimenting with changing it so that everything runs as async controllers and calls to the database are ran as their async counterparts (eg. ToListAsync() instead of ToList())
The problem I’m having is that simply changing my queries to async has caused them to be incredibly slow.
The following code gets a collection of "Album" objects from my data context and is translated to a fairly simple database join:
// Get the albums
var albums = await this.context.Albums
.Where(x => x.Artist.ID == artist.ID)
.ToListAsync();
Here’s the SQL that’s created:
exec sp_executesql N'SELECT
[Extent1].[ID] AS [ID],
[Extent1].[URL] AS [URL],
[Extent1].[ASIN] AS [ASIN],
[Extent1].[Title] AS [Title],
[Extent1].[ReleaseDate] AS [ReleaseDate],
[Extent1].[AccurateDay] AS [AccurateDay],
[Extent1].[AccurateMonth] AS [AccurateMonth],
[Extent1].[Type] AS [Type],
[Extent1].[Tracks] AS [Tracks],
[Extent1].[MainCredits] AS [MainCredits],
[Extent1].[SupportingCredits] AS [SupportingCredits],
[Extent1].[Description] AS [Description],
[Extent1].[Image] AS [Image],
[Extent1].[HasImage] AS [HasImage],
[Extent1].[Created] AS [Created],
[Extent1].[Artist_ID] AS [Artist_ID]
FROM [dbo].[Albums] AS [Extent1]
WHERE [Extent1].[Artist_ID] = @p__linq__0',N'@p__linq__0 int',@p__linq__0=134
As things go, it’s not a massively complicated query, but it’s taking almost 6 seconds for SQL server to run it. SQL Server Profiler reports it as taking 5742ms to complete.
If I change my code to:
// Get the albums
var albums = this.context.Albums
.Where(x => x.Artist.ID == artist.ID)
.ToList();
Then the exact same SQL is generated, yet this runs in just 474ms according to SQL Server Profiler.
The database has around 3500 rows in the "Albums" table, which isn’t really very many, and has an index on the "Artist_ID" column, so it should be pretty fast.
I know that async has overheads, but making things go ten times slower seems a bit steep to me! Where am I going wrong here?
Best Answer
I found this question very interesting, especially since I'm using
async
everywhere with Ado.Net and EF 6. I was hoping someone to give an explanation for this question, but it doesn't happened. So I tried to reproduce this problem on my side. I hope some of you will find this interesting.First good news : I reproduced it :) And the difference is enormous. With a factor 8 ...
First I was suspecting something dealing with
CommandBehavior
, since I read an interesting article aboutasync
with Ado, saying this :"Since non-sequential access mode has to store the data for the entire row, it can cause issues if you are reading a large column from the server (such as varbinary(MAX), varchar(MAX), nvarchar(MAX) or XML)."
I was suspecting
ToList()
calls to beCommandBehavior.SequentialAccess
and async ones to beCommandBehavior.Default
(non-sequential, which can cause issues). So I downloaded EF6's sources, and put breakpoints everywhere (whereCommandBehavior
where used, of course).Result : nothing. All the calls are made with
CommandBehavior.Default
.... So I tried to step into EF code to understand what happens... and.. ooouch... I never see such a delegating code, everything seems lazy executed...So I tried to do some profiling to understand what happens...
And I think I have something...
Here's the model to create the table I benchmarked, with 3500 lines inside of it, and 256 Kb random data in each
varbinary(MAX)
. (EF 6.1 - CodeFirst - CodePlex) :And here's the code I used to create the test data, and benchmark EF.
For the regular EF call (
.ToList()
), the profiling seems "normal" and is easy to read :Here we find the 8.4 seconds we have with the Stopwatch (profiling slow downs the perfs). We also find HitCount = 3500 along the call path, which is consistent with the 3500 lines in the test. On the TDS parser side, things start to became worse since we read 118 353 calls on
TryReadByteArray()
method, which is were the buffering loop occurs. (an average 33.8 calls for eachbyte[]
of 256kb)For the
async
case, it's really really different.... First, the.ToListAsync()
call is scheduled on the ThreadPool, and then awaited. Nothing amazing here. But, now, here's theasync
hell on the ThreadPool :First, in the first case we were having just 3500 hit counts along the full call path, here we have 118 371. Moreover, you have to imagine all the synchronization calls I didn't put on the screenshoot...
Second, in the first case, we were having "just 118 353" calls to the
TryReadByteArray()
method, here we have 2 050 210 calls ! It's 17 times more... (on a test with large 1Mb array, it's 160 times more)Moreover there are :
Task
instances createdInterlocked
callsMonitor
callsExecutionContext
instances, with 264 481 CapturesSpinLock
callsMy guess is the buffering is made in an async way (and not a good one), with parallel Tasks trying to read data from the TDS. Too many Task are created just to parse the binary data.
As a preliminary conclusion, we can say Async is great, EF6 is great, but EF6's usages of async in it's current implementation adds a major overhead, on the performance side, the Threading side, and the CPU side (12% CPU usage in the
ToList()
case and 20% in theToListAsync
case for a 8 to 10 times longer work... I run it on an old i7 920).While doings some tests, I was thinking about this article again and I notice something I miss :
"For the new asynchronous methods in .Net 4.5, their behavior is exactly the same as with the synchronous methods, except for one notable exception: ReadAsync in non-sequential mode."
What ?!!!
So I extend my benchmarks to include Ado.Net in regular / async call, and with
CommandBehavior.SequentialAccess
/CommandBehavior.Default
, and here's a big surprise ! :We have the exact same behavior with Ado.Net !!! Facepalm...
My definitive conclusion is : there's a bug in EF 6 implementation. It should toggle the
CommandBehavior
toSequentialAccess
when an async call is made over a table containing abinary(max)
column. The problem of creating too many Task, slowing down the process, is on the Ado.Net side. The EF problem is that it doesn't use Ado.Net as it should.Now you know instead of using the EF6 async methods, you would better have to call EF in a regular non-async way, and then use a
TaskCompletionSource<T>
to return the result in an async way.Note 1 : I edited my post because of a shameful error.... I've done my first test over the network, not locally, and the limited bandwidth have distorted the results. Here are the updated results.
Note 2 : I didn't extends my test to other uses cases (ex :
nvarchar(max)
with a lot of data), but there are chances the same behavior happens.Note 3 : Something usual for the
ToList()
case, is the 12% CPU (1/8 of my CPU = 1 logical core). Something unusual is the maximum 20% for theToListAsync()
case, as if the Scheduler could not use all the Treads. It's probably due to the too many Task created, or maybe a bottleneck in TDS parser, I don't know...