Python – Deleting duplicate rows in sqlite

duplicate-removalduplicatespythonsqlite

How to entries that have same entries?
My table player have two aspects(name, wowp_id) that duplicate. How to merge them?

I have been looking for related questions. The code below is build on the answers that I found. Process runs fine but duplicates remain. I would like to have no duplicate names. If there are different wowp_id for multiple name I would prefer to remove wowp_id and keep only one entry of name.

def sql_removeduplicates():
    con = sqlite3.connect('WOWT.sql')
    with con:    
        cur = con.cursor()    
        cur.execute("SELECT name, COUNT(*) FROM players GROUP BY name, team, wowp_id HAVING COUNT(*) > 1")
        rows = cur.fetchall()
        con.commit()
        for row in rows:
            print row

my rows in players:

  (id, wowp_id, name, team)
(108, 501078041, u'prazluges', None)
(109, 507894244, u'Aidis', None)
(110, 500742127, u'Aidis', None)
(111, u'Aidis', u'Aidis', None)
(112, u'Aidis', u'Aidis', None)
(113, 500864543, u'prazluges', None)
(114, u'Aidis', u'Aidis', None)
(115, u'Aidis', u'Aidis', None)
(116, u'Aidis', u'Aidis', None)
(117, 501078041, u'satih', None)
(118, u'Aidis', u'Aidis', None)

Best Answer

You can do

DELETE FROM players
 WHERE id NOT IN
(
  SELECT MIN(id) id
    FROM players
   GROUP BY wowp_id, name
);

Note: before proceeding with DELETE make sure that you have a solid backup of your data.

After deleting duplicates from your table make sure to create a UNIQUE constraint

CREATE UNIQUE INDEX idx_wowp_id_name ON players(wowp_id, name);

Outcome after deduping:

|  id |   wowp_id |      name | team |
|-----|-----------|-----------|------|
| 108 | 501078041 | prazluges | None |
| 109 | 507894244 |     Aidis | None |
| 110 | 500742127 |     Aidis | None |
| 111 |     Aidis |     Aidis | None |
| 113 | 500864543 | prazluges | None |
| 117 | 501078041 |     satih | None |

Here is SQLFiddle demo