Mysql – Broken Slave (MySQL replication)

MySQLmysql-replicationmysql5

Last night one of the developer ran a wrong mySQL statement (Query inserting a row with duplicate primary key) on master machine. Suddenly the replication to its slave stopped. Slave started writing it to mysql_relay_logs and before we could monitor and fix it, the disk ran out of space.

Master is configured to save bin_logs for last 3 days so we still have not completely lost the replication. Can someone help me fix this mess now.

My Master Status:

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mysql-bin.000017
Position: 30844254
Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

Slave Status:
mysql

> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
             Slave_IO_State: 
                Master_Host: 192.168.140.110
                Master_User: replication_user
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000014
        Read_Master_Log_Pos: 61132382
             Relay_Log_File: mysqld-relay-bin.000037
              Relay_Log_Pos: 1405976
      Relay_Master_Log_File: mysql-bin.000014
           Slave_IO_Running: No
          Slave_SQL_Running: No
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 1062
                 Last_Error: Error 'Duplicate entry '13676-1-2' for key 1' on query. Default database: 'XXX'. Query: 'YYY'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 1405839
            Relay_Log_Space: 8974037551
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

Can someone give me exact commands to make everything back to normal? as I said I am sure we have lost data on slave as it ran out of disk but we must be having data on master.

My next question is: How to ensure this doesnt happen again.

Thanks a ton
Sparsh Gupta

Best Answer

The straight-forward course of action is to wipe out the data on the slave and start fresh with a new copy of the master. The master can be copied by using mysqldump or through something like rsync. Depending on how large your database is, you may experience downtime. This will always work.

There's a chance that if you clear space off the slave drive you can start replication from where the error occurred by using (from inside mysql):

start slave

If the master's bin.log file is in place, there's a chance the replication will start right up. I haven't had very many experiences where this has happened.

The last time I did this was when a developer added a new table to a database, and didn't create it on the slave first. I created the table and had to use sql_slave_skip_counter to skip over the bits that were lost from the bin.log. I then manually inserted the 20 rows of data I had skipped over.

Related Topic