Database Design – Preventing Duplicate Records


A web service that I call returns a list of data. The data from the web service is updated periodically, so a call made in one hour could return the same data as a call made an hour later. The data is also returned based on a start and end date.

We have multiple users who can run the web-service search, so duplicate data is very likely to be returned (especially for historical data). However, I don't want to insert this duplicate data into the database.

I've created a database table in which the data is stored. The most important columns are:

Id int autoincrement PK
Date date not null        -- The date to which the data set belongs.
LastUpdate date not null  -- The date the data set was last updated.
UserName varchar(50)      -- The name of the user doing the search.
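
For reference, here is a minimal T-SQL sketch of that table. The table name SearchResult and the NOT NULL on UserName are assumptions; only the columns listed above are shown.

DECLARE @sketch int; -- placeholder so this block is self-contained when pasted
GO

-- Sketch only: names other than the listed columns are assumed.
CREATE TABLE dbo.SearchResult
(
    Id         int IDENTITY(1,1) NOT NULL
                   CONSTRAINT PK_SearchResult PRIMARY KEY, -- surrogate key
    [Date]     date NOT NULL,         -- the date the data set belongs to
    LastUpdate date NOT NULL,         -- the date the data set was last updated
    UserName   varchar(50) NOT NULL   -- the user who ran the search (nullability assumed)
);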

I'm using SQL Server 2008 Express with C# 4.0 and Visual Studio 2010, and Entity Framework as the ORM. If stored procedures can be avoided in the proposed solution, that would be a plus.

Another way of interpreting what I'm asking for is this:
I have a million unique records in my table. A user runs a new search, and the results contain around 300k records that are already in the database. I need an efficient way to find and insert only the records that aren't already there.

A combination of the Date, LastUpdate and UserName makes a record unique.

Best Answer

Well, the obvious solution is to have a unique key on the columns that make the row unique.

A combination of the Date, LastUpdate and UserName makes a record unique.

Alternatively, you might just get rid of the surrogate key and use the above as the primary key (depends upon where else you are using it).
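
A sketch of both options, assuming the SearchResult table and constraint names from the question's schema (the constraint names themselves are arbitrary):

-- Option 1: keep the surrogate key and add a unique constraint
-- on the natural key (Date, LastUpdate, UserName).
ALTER TABLE dbo.SearchResult
    ADD CONSTRAINT UQ_SearchResult_Date_LastUpdate_UserName
    UNIQUE ([Date], LastUpdate, UserName);

-- Option 2: drop the surrogate key and promote the natural key to primary key.
-- Only do this if nothing else references Id.
-- ALTER TABLE dbo.SearchResult DROP CONSTRAINT PK_SearchResult;
-- ALTER TABLE dbo.SearchResult DROP COLUMN Id;
-- ALTER TABLE dbo.SearchResult
--     ADD CONSTRAINT PK_SearchResult PRIMARY KEY ([Date], LastUpdate, UserName);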

Inserts should be done using MERGE, which will allow you to insert a record only when it doesn't already exist.
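
A rough sketch of such a MERGE, again assuming the SearchResult table above; @NewRows is an assumed table variable standing in for the freshly fetched web-service results:

-- @NewRows would be populated from the web-service response
-- (e.g. via a table-valued parameter or a staging table).
DECLARE @NewRows TABLE
(
    [Date]     date NOT NULL,
    LastUpdate date NOT NULL,
    UserName   varchar(50) NOT NULL
);

-- Insert only the rows that are not already present,
-- matching on the natural key (Date, LastUpdate, UserName).
MERGE dbo.SearchResult AS target
USING @NewRows AS source
    ON  target.[Date]     = source.[Date]
    AND target.LastUpdate = source.LastUpdate
    AND target.UserName   = source.UserName
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Date], LastUpdate, UserName)
    VALUES (source.[Date], source.LastUpdate, source.UserName);

With the unique constraint also in place, a race between two users inserting the same rows will at worst fail on the constraint rather than silently create duplicates.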
