What is the best way to remove duplicate rows from a fairly large SQL Server
table (i.e. 300,000+ rows)?
The rows, of course, will not be perfect duplicates because of the existence of the RowID
identity field.
MyTable
    RowID int not null identity(1,1) primary key,
    Col1 varchar(20) not null,
    Col2 varchar(2048) not null,
    Col3 tinyint not null
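For experimenting, the table above can be written out as a T-SQL script. The temporary-table name #MyTable and the sample rows are made up for illustration, and the multi-row VALUES list assumes SQL Server 2008 or later. The final SELECT lists each duplicated (Col1, Col2, Col3) combination with its row count and lowest RowID, which gives a quick preview of how many rows a cleanup would remove.

```sql
-- Hypothetical demo copy of the question's MyTable, created as a temp table
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;

CREATE TABLE #MyTable (
    RowID int           NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Col1  varchar(20)   NOT NULL,
    Col2  varchar(2048) NOT NULL,
    Col3  tinyint       NOT NULL
);

-- Made-up sample data: RowIDs 1, 2 and 4 differ only by RowID
INSERT INTO #MyTable (Col1, Col2, Col3) VALUES
    ('a', 'x', 1),
    ('a', 'x', 1),
    ('b', 'y', 2),
    ('a', 'x', 1);

-- One output row per duplicated value combination:
-- how many copies exist and which RowID is the lowest
SELECT Col1, Col2, Col3, COUNT(*) AS Copies, MIN(RowID) AS FirstRowID
FROM #MyTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) > 1;
```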
Best Answer
Assuming no nulls, you GROUP BY the unique columns and SELECT the MIN (or MAX) RowID of each group as the row to keep. Then, just delete every row whose RowID is not one of those kept values.
In case you have a GUID instead of an integer, you can replace the MIN(RowID) aggregate with an equivalent that works on GUIDs.
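A minimal T-SQL sketch of the two steps above, run against a temporary copy named #MyTable (a hypothetical name, as are the sample rows) so it can be tried without touching real data. It keeps MIN(RowID) per (Col1, Col2, Col3) group and deletes every row that finds no match among those keepers via a LEFT OUTER JOIN. The multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical demo copy of the question's MyTable
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;

CREATE TABLE #MyTable (
    RowID int           NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Col1  varchar(20)   NOT NULL,
    Col2  varchar(2048) NOT NULL,
    Col3  tinyint       NOT NULL
);

-- Made-up sample data: RowIDs 1, 2 and 4 are duplicates of each other
INSERT INTO #MyTable (Col1, Col2, Col3) VALUES
    ('a', 'x', 1), ('a', 'x', 1), ('b', 'y', 2), ('a', 'x', 1);

-- Step 1 (derived table KeepRows): the lowest RowID of each group.
-- Step 2: delete every row whose RowID has no match in KeepRows.
DELETE t
FROM #MyTable AS t
LEFT OUTER JOIN (
    SELECT MIN(RowID) AS RowID
    FROM #MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows
    ON t.RowID = KeepRows.RowID
WHERE KeepRows.RowID IS NULL;

-- Only RowIDs 1 and 3 should remain
SELECT RowID, Col1, Col2, Col3 FROM #MyTable ORDER BY RowID;
```

On the real table, swap #MyTable for MyTable. Because RowID is NOT NULL, `WHERE RowID NOT IN (SELECT MIN(RowID) FROM MyTable GROUP BY Col1, Col2, Col3)` is an equivalent way to express the same delete. For a uniqueidentifier key, if MIN rejects that type on your SQL Server version, one commonly used workaround is `CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))`, where MyGuidColumn is a hypothetical column name; it picks the smallest GUID by its string form, which is still a deterministic choice of row to keep.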