Mysql – Sporadic crash of master-slave MySQL replication process

MySQLmysql-replication

I was wondering if someone has experienced this and can perhaps provide some insight into this issue.

We have a plan-vanilla MySQL master-slave replication set up. The tables are MyISAM and the master can get quite read/write active. We use the slave instance to perform full daily backups in order to avoid bringing down the master server. The backup process does the following:

STOP SLAVE SQL_THREAD 
mysqlhotcopy all tables 
START SLAVE SQL_THREAD

Every once in a while (once a month or so) the replication breaks with varying error messages indicating a corrupt query or log file. Here's one that happened last night:

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: server8
                Master_User: nexus8
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: bin.000045
        Read_Master_Log_Pos: 581644327
             Relay_Log_File: relay.000086
              Relay_Log_Pos: 94131
      Relay_Master_Log_File: bin.000045
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1064
                 Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '138070603'£' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE fill SET clearing_fee='0.0E id='138070603'£'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 4164743
            Relay_Log_Space: 577574251
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL

I follow the following procedure to recover from above error and resume replication:

stop slave;
change master to MASTER_LOG_POS = 4164743, MASTER_LOG_FILE = 'bin.000045';
start slave;

We have multiple servers set up this way and they all sporadically stop replicating with a similar error. Any advice on how to resolve this would be greatly appreciated.

Update:

Reading the excellent article suggested by pQd (specifically section 3.5), it seems quite possible that I am experiencing relay log corruption which can "make statements un-parsable." Apparently this is something that can occur but would then indicate a bug in MySQL replication code. I can't really reproduce this problem but it happens at least once a month for every slave in our production environment.

Here are two more errors for reference:

*************************** 1. row *************************** 
             Slave_IO_State: Waiting for master to send event 
                Master_Host: server10
                Master_User: nexus10 
                Master_Port: 3306 
              Connect_Retry: 60 
            Master_Log_File: bin.000008 
        Read_Master_Log_Pos: 1034654844 
             Relay_Log_File: relay.001392 
              Relay_Log_Pos: 109929 
      Relay_Master_Log_File: bin.000008 
           Slave_IO_Running: Yes 
          Slave_SQL_Running: No 
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 1064 
                 Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server versio n for the right syntax to use near '' at line 1' on query. Default database: 'avidb'. Query: 'DELETE FROM reconciliat' 
               Skip_Counter: 0 
        Exec_Master_Log_Pos: 1026067551 
            Relay_Log_Space: 8697758 
            Until_Condition: None 
             Until_Log_File: 
              Until_Log_Pos: 0 
         Master_SSL_Allowed: No 
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL 



*************************** 1. row *************************** 
             Slave_IO_State: Waiting for master to send event 
                Master_Host: server8
                Master_User: nexus8 
                Master_Port: 3306 
              Connect_Retry: 60 
            Master_Log_File: bin.000019 
        Read_Master_Log_Pos: 476189428 
             Relay_Log_File: relay.000163 
              Relay_Log_Pos: 98576 
      Relay_Master_Log_File: bin.000019 
           Slave_IO_Running: Yes 
          Slave_SQL_Running: No 
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 1064 
                 Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE requestlog SET id='2173589',start_time='2009-08-18 04:59:10',end_time='2009-08-18 04:59:10',ip='X.X.X.X',session='hkacn7d54',lock_wait_time='0',unmarshal_time='0.00057435035705566',unmarshaled_objects='30',data_sync_time='0',page_process_time='0',memory_used='1572864',memory_used_peak='1572864',method='GET',request='/report.php',request_parameters='
groupId = -4 
startDate = 2009-08-17 
reportType = trades 
accountId = 2409 
endDate = 2009-08-17',progress_percent='0',progress_text='' WHERE id='2173588'' 
               Skip_Counter: 0 
        Exec_Master_Log_Pos: 186172755 
            Relay_Log_Space: 290116589 
            Until_Condition: None 
             Until_Log_File: 
              Until_Log_Pos: 0 
         Master_SSL_Allowed: No 
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL 

Best Answer

Double check that your character sets on the databases and connections are what you expect them to be - your replication choking on a statement with a non-ASCII character is immediately suspicious.

Look at the master's binary log to see how the statement it logged is different from the statement that the replica is trying to execute - that will offer clues to the problem.

Sadly, you should also make sure that the versions of the server for both the master and slave are the same. I recently encountered a weird instance where we had bad SQL trying to update a non-NULL numeric column to NULL. In the version that the master was running, MySQL silently converted it to 0, but in the version that the replica was running, it was an error.

Also, instead of doing CHANGE MASTER to fix the replica, you should be doing

STOP SLAVE;
[correct and execute whatever statement is the problem]
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

http://dev.mysql.com/doc/refman/5.1/en/set-global-sql-slave-skip-counter.html

Related Topic