C# – SQL query or C# .net code for csv files import

ccsvdbmsnetsql

My aim here is to find out the best possible and feasible solution for my dilemma. I want to import some csv file (may contain around 50~60K records) into database after some manipulation into the files, manipulation includes comparison such as one record already exists into database or not. I already have a .net c# application which

  • gets the file
  • fetches its data into datatable
  • loop through each record, do comparison for each of the record from database
  • and finally updates/creates/deletes record.

this whole process is apparently too time consuming. My client needs this process working as fast as possible. I suggested him to do this import process with SQL only(though I'm not sure whether it would work more efficiently using SQL features such as staging table, temp table and cursors).

Now by SQL this is what I thought could be done in order to get it working better:

  • create a staging table
  • using BULK IMPORT or SQLBULKCOPY function import all csv file's data into this staging table.
  • create a SP which would loop through(using CURSOR ofcourse) all records from staging table and will compare each records from already existing db table and will create/update/delete record accordingly.
  • empty staging table.

I'm not sure the SQL version would work more efficiently in my case or not. In total I'm not able to see a clear vision that whether I should go with the SQL version of solution or stick with the current .net C# code and optimize this current code only. It would be great if someone can advise me on the same.


let me explain this with the help of an example:

  1. these are the headers of csv file:
    StudentId, StudentFname, StudentLname, ClassName, teacherId, teacherFname, teacherLname
  2. suppose I've copied all its data as it is into staging table.
  3. now I've to update those students which are already in STUDENT table and create those which do not exist, check if the classname already exists in CLASS table (if not then create class), check if the teacher id already exists in TEACHER table (if not then create teacher, otherwise update it).

Best Answer

SQL servers will offer specific products such as DTS or SSIS for importing data. These can be pretty optimised and run very quickly.

However!! If you have programming skills I highly recomend you write your own importing software.

The reason for this is that although the Sql server may be able to run an import process faster in a best case senario, it is inherently unscalable.

If you write your own importer then you can move all that logic and file reading off to a seperate server and only load your sql box with simple writes. This enables you to scale horizontaly.

SQL databases are one of the major pain points when scaling and the way to minimise this is to keep all business logic off the box. I have seen systems with complicated scheduling logic literaly running out of hours in the day for importing files, running a top spec sql box at 100% 24/7

Additonaly although tools lile DTS/SSIS are fine for DBAs who want to run the odd 'programming style' task. They fall well behind the features of a modern devops stack, such as versioning, debugging, no down time deployments etc etc

Related Topic