Mysql – thesql error 1062 ‘duplicate entry’ when restoring backup

debian-lennyMySQLmysql5

Sorry, I seen similar threads but I still couldn't find it addressing my issue plus, I needed some more info on this.

Requirement: To create an exact replica 'db4' of an existing DB 'db3'.

Procedure followed:

  • mysqldump -uuser -ppass db3 > db3.sql (size is 6G)
  • mysql -uuser -ppass db4 < db3.sql (db4 was a newly created blank database)

The 2nd step throws in the error:

ERROR 1062 (23000) at line 5524: Duplicate entry '600806' for key 1"

I ran the 2nd step again with –force. The restore completed but with 2 additional similar errors:

ERROR 1062 (23000) at line 6309: Duplicate entry '187694' for key 1    
ERROR 1062 (23000) at line 6572: Duplicate entry '1567400' for key 1

On completion when I queried certain tables of db4 database, I was able to see missing records.

Question:

  1. Does this indicate a corrupted/problematic db3 database?

  2. How to proceed to create a 'consistent/working' replica (db4) of db3?

Thanks,

Best Answer

There are various reasons where such "duplicates" may occur:

  • your data is toast.
  • an application INSERTed using the auto-increment, but later ran an UPDATE to manually modify the identity to a different, probably existing value. This sometimes occurs with application developers who are sorting "by ID" and want to "fix the sorting" (by actually breaking consistency).
  • a variation of the second issue happens by folks who later add a "unique" constraint, after non-unique records have been inserted.

The error messages do refer to the first key, which in most database schemas is the first value. Take a look at the raw dump output, particularly the INSERT statements and check for

INSERT INTO ... values (0,...

Scan around the line numbers of the mysql dump given in the error messages.

Example of what kind of mysqldump I'm expecting:

INSERT INTO foo (id,bar,baz) values (1,2,3);
INSERT INTO foo (id,bar,baz) values (0,4,5);
INSERT INTO foo (id,bar,baz) values (2,6,7);

In a "normal" INSERT-statement on an auto-increment field, the value "0" specifies to auto-increment the field and so shouldn't show up in a database SQL dump where an auto-increment field is being used. By re-loading the database via a SQL dump, your dump asks the SQL server to increment the current field value by one and insert that ID. If someone manually UPDATEd the identity to zero after INSERTing the record, your MySQL Dump will include this strange id as well.

If you're replaying this dump into an empty table, this would try to create the following records:

1,2,3
2,4,5
2,6,7

As the "id" field has been set to unique autoincrement, the second INSERT will create a "wrong" record (expected: 0,4,5; actual: 2,4,5) which conflicts with the following record (id=2) and as a result gives the error message.

In a variation of this, someone "manually" updated the identity to an already existing value and later change the record to be "unique". Changing a record type to unique doesn't make MySQL re-validate if the current data matches the requirement, hence the delayed error. This variation may create a dump like this:

INSERT INTO foo (id,bar,baz) values (1,2,3);
INSERT INTO foo (id,bar,baz) values (1,4,5);
INSERT INTO foo (id,bar,baz) values (2,6,7);

Trying to insert the second line will fail due to the unique constraint.

In both cases, using "--force" only ignores the "conflicting" line and continues importing. The "conflicting" lines will be missing, but probably the lines leading to this conflict will be there (but with a wrong id record).

Please do check your database dump if my idea matches your problem. If that's the case, here are two workarounds to "make it work":

  • import data in two steps, first only the schema, later the data. Remove the unique constraints from your schema before importing the data, later again do add the unique constraints ("ALTER TABLE ... add unique...").

  • force-import schema and all data, resulting in "different" constraint issues. Manually check which records are correct and re-assign the wrong ones to their original value.

Example for the later issue:

mysql -uuser -ppass --execute "SET UNIQUE_CHECKS=0; source db3.sql" db4

This does force to import all conflicting records, even violating any real unique constraints. After import, you'll have multiple entries for those three records (600806, 187694 and 1567400), and you'll have to manually sort out which are the correct ones by checking your dump, which of those "duplicates" did result in the conflict and manually update the wrong records "back" to zero (or whatever the conflicting line in the dump said).

In both scenarios, your data still violates the given schema: your schema says data to be unique, but it isn't. In the long run, the data needs to be fixed on application level.