.NET Performance – Is Entity Framework Suitable for Bulk Inserts?

entity-framework | .net | performance | sql-server

My department develops software to migrate databases for our customers from their old CRM software to ours. In this process we could be inserting up to millions of rows, processed one at a time, since we need to do mapping and formatting to go from their old system (which could be anything from a relational database to a set of flat binary files) to our MS-SQL database.

Currently we use a producer/consumer model that reads from their old system into our mapper, then another producer/consumer model to go from that into thread-local DataTables, which get batched out every 15,000 rows in a SqlBulkCopy operation to keep the number of objects held in RAM down (larger tables can easily exhaust the 2 GB limit for a 32-bit application if we try to hold the entire table in memory before inserting it).
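The batching step described above might look roughly like this minimal sketch. The table name, column names, and `BatchedWriter` type are illustrative, not our actual schema; only the 15,000-row flush threshold and the use of `SqlBulkCopy` come from the pipeline as described:

```csharp
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper: buffers rows in a DataTable and flushes them
// to SQL Server via SqlBulkCopy once the batch threshold is reached,
// so memory use stays bounded regardless of total row count.
class BatchedWriter
{
    const int BatchSize = 15000; // flush threshold from the pipeline above
    private readonly DataTable _buffer;
    private readonly string _connectionString;

    public BatchedWriter(string connectionString)
    {
        _connectionString = connectionString;
        _buffer = new DataTable("Contacts");          // illustrative table
        _buffer.Columns.Add("Name", typeof(string));  // illustrative columns
        _buffer.Columns.Add("Phone", typeof(string));
    }

    public void Add(string name, string phone)
    {
        _buffer.Rows.Add(name, phone);
        if (_buffer.Rows.Count >= BatchSize)
            Flush();
    }

    public void Flush()
    {
        if (_buffer.Rows.Count == 0) return;
        using (var bulk = new SqlBulkCopy(_connectionString))
        {
            bulk.DestinationTableName = _buffer.TableName;
            bulk.WriteToServer(_buffer);
        }
        _buffer.Clear(); // release the batch so RAM stays bounded
    }
}
```

In the real pipeline each consumer thread would own one of these, with a final `Flush()` call when its input is exhausted.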

The issue is that this is a kludge of a setup just to generate the new rows and insert them into the database. Recently we have had the opportunity to rewrite the database to fit our needs, and I have been lobbying my manager on the idea of using some form of ORM.

I have started looking around at various ORM solutions, and I know we want to stay with a pure Microsoft stack. (It was hard enough to get my manager to consider using an ORM at all, and I don't have a snowball's chance of convincing him to use a 3rd-party library: the main CRM app uses a 3rd-party image processing library that is not very good, and he has hated 3rd-party libraries ever since.) Comparing LINQ to SQL and the ADO.NET Entity Framework, I think EF fits our needs better. However, I am concerned about performance.

Most CRUD apps are read-then-update at a slow rate (only displaying one record, or a set of records sharing a common value, at a time). However, we will just be doing Create, Create, Create. Is an ORM like EF designed to do that?

My manager is obsessed with performance. When this process was done in VB6 (which is how it was done up until two years ago) it could take several hours to run a conversion. With our current kludge the longest conversions take at most half an hour to process several gigabytes of records (the slowest part is dealing with that 3rd-party image library I mentioned earlier; conversions that are pure text take at most 10 minutes). Could I potentially see a slowdown with EF versus using SqlBulkCopy, and what can I do to mitigate the running cost if so?

Best Answer

I think I found the best of both worlds. In Microsoft's MSDN archive there is a LINQ EntityDataReader sample, which will allow me to turn my EF objects into a DbDataReader that can be passed to SqlBulkCopy.
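Combining the two might look roughly like the sketch below. The `AsDataReader()` extension method comes from that MSDN sample (it wraps an `IEnumerable<T>` as a `DbDataReader`); `MyContext`, its `Contacts` set, and the destination table name are hypothetical placeholders:

```csharp
using System.Data.SqlClient;
using System.Linq;
// Assumes the MSDN EntityDataReader sample is referenced, which
// provides an AsDataReader() extension on IEnumerable<T>.

class Exporter
{
    public void BulkInsert(string connectionString, MyContext context)
    {
        // Enumerate the mapped EF entities lazily so the whole set
        // is never materialized in memory at once.
        var contacts = context.Contacts.AsEnumerable();

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "Contacts"; // hypothetical table
            bulk.BatchSize = 15000;                 // matches the existing pipeline
            bulk.WriteToServer(contacts.AsDataReader());
        }
    }
}
```

This way EF is used only for mapping and change-free object construction, while the actual insert path stays on SqlBulkCopy, so the bulk-insert throughput should be comparable to the current DataTable approach.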