Sql-server – How to remove duplicate rows

duplicates, sql server, tsql

What is the best way to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows)?

The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field.

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

Best Answer

Assuming no nulls, GROUP BY the columns that define a duplicate and SELECT the MIN (or MAX) RowId of each group as the row to keep. Then delete every row whose RowId is not one of the kept ones:

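DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
   -- One surviving RowId per distinct (Col1, Col2, Col3) combination
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3
   FROM MyTable
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   -- Rows with no match in KeepRows are the duplicates
   KeepRows.RowId IS NULL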
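To see the approach end to end, here is a minimal sketch that can be run in a scratch session. The temp table #MyTable and its sample values are assumptions made up for illustration; only the column layout comes from the question. The multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical scratch copy of the question's table.
CREATE TABLE #MyTable (
    RowID int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Col1 varchar(20) NOT NULL,
    Col2 varchar(2048) NOT NULL,
    Col3 tinyint NOT NULL
);

-- Made-up sample data: ('a', 'x', 1) appears three times.
INSERT INTO #MyTable (Col1, Col2, Col3)
VALUES ('a', 'x', 1),
       ('a', 'x', 1),
       ('b', 'y', 2),
       ('a', 'x', 1),
       ('b', 'y', 3);   -- differs in Col3, so it is not a duplicate

BEGIN TRANSACTION;

-- Delete every row that is not the lowest RowID of its group.
DELETE T
FROM #MyTable AS T
LEFT OUTER JOIN (
    SELECT MIN(RowID) AS RowID
    FROM #MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON T.RowID = KeepRows.RowID
WHERE KeepRows.RowID IS NULL;

-- Inspect the survivors before making the change permanent.
SELECT RowID, Col1, Col2, Col3
FROM #MyTable
ORDER BY RowID;

COMMIT TRANSACTION;   -- use ROLLBACK TRANSACTION instead to undo

DROP TABLE #MyTable;
```

With this sample data, two rows are deleted and three remain, one per distinct (Col1, Col2, Col3) combination. Wrapping the DELETE in a transaction is optional, but on a table of 300,000+ rows it gives a chance to check the result before committing.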

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
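As a rough sketch of where that substitution goes, the script below applies it to a GUID-keyed table. The table #MyGuidTable, the alias KeepGuid, and the sample rows are hypothetical; MyGuidColumn is the column name used above.

```sql
-- Hypothetical table: same data columns as MyTable, keyed by a GUID.
CREATE TABLE #MyGuidTable (
    MyGuidColumn uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Col1 varchar(20) NOT NULL,
    Col2 varchar(2048) NOT NULL,
    Col3 tinyint NOT NULL
);

-- Made-up sample data: the first two rows are duplicates of each other.
INSERT INTO #MyGuidTable (Col1, Col2, Col3) VALUES ('a', 'x', 1);
INSERT INTO #MyGuidTable (Col1, Col2, Col3) VALUES ('a', 'x', 1);
INSERT INTO #MyGuidTable (Col1, Col2, Col3) VALUES ('b', 'y', 2);

-- Keep one GUID per group; its text form is used only to pick a winner.
DELETE T
FROM #MyGuidTable AS T
LEFT OUTER JOIN (
    SELECT CONVERT(uniqueidentifier,
                   MIN(CONVERT(char(36), MyGuidColumn))) AS KeepGuid
    FROM #MyGuidTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON T.MyGuidColumn = KeepRows.KeepGuid
WHERE KeepRows.KeepGuid IS NULL;

-- Two rows should remain, one per distinct combination.
SELECT MyGuidColumn, Col1, Col2, Col3 FROM #MyGuidTable;

DROP TABLE #MyGuidTable;
```

Which GUID survives in each group is arbitrary here: comparing the char(36) text simply gives a deterministic way to choose exactly one row per group.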

Related Solutions

Sql – How to return only the Date from a SQL Server DateTime datatype

SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, @your_date))

for example

SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()))

gives me

2008-09-22 00:00:00.000

Pros:

No varchar<->datetime conversions required
No need to think about locale
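For a reproducible check, the sketch below feeds a fixed value through the same expression. The variable name @your_date follows the snippet above; the timestamp itself and the column aliases are made up for illustration.

```sql
DECLARE @your_date datetime;
-- ISO 8601 literal, which datetime parses independently of language settings.
SET @your_date = '2008-09-22T15:24:13.790';

SELECT
    -- Whole days between the base date (0, i.e. 1900-01-01) and the value
    DATEDIFF(dd, 0, @your_date) AS WholeDays,
    -- The same day count turned back into a datetime at midnight
    DATEADD(dd, 0, DATEDIFF(dd, 0, @your_date)) AS DateOnly;
-- DateOnly: 2008-09-22 00:00:00.000
```

The result is still a datetime with the time portion zeroed. On SQL Server 2008 and later, CAST(@your_date AS date) is another option, though it returns the date type instead.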

Sql-server – How to check if a column exists in a SQL Server table

SQL Server 2005 onwards:

IF EXISTS(SELECT 1 FROM sys.columns 
          WHERE Name = N'columnName'
          AND Object_ID = Object_ID(N'schemaName.tableName'))
BEGIN
    -- Column Exists
END

Martin Smith's version is shorter:

IF COL_LENGTH('schemaName.tableName', 'columnName') IS NOT NULL
BEGIN
    -- Column Exists
END
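A typical use of this check is a re-runnable schema change. The following is only a sketch: dbo.MyTable and the column Col4 are hypothetical names chosen for illustration.

```sql
-- Add the column only if it is not already there, so the script
-- can be run more than once without raising an error.
IF COL_LENGTH('dbo.MyTable', 'Col4') IS NULL
BEGIN
    ALTER TABLE dbo.MyTable ADD Col4 int NULL;
END
```

Note that COL_LENGTH also returns NULL when the caller lacks permission to view the object, so the sys.columns query may be the safer choice where permissions are restricted.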
