I was wondering if someone has experienced this and can perhaps provide some insight into this issue.
We have a plan-vanilla MySQL master-slave replication set up. The tables are MyISAM and the master can get quite read/write active. We use the slave instance to perform full daily backups in order to avoid bringing down the master server. The backup process does the following:
STOP SLAVE SQL_THREAD
mysqlhotcopy all tables
START SLAVE SQL_THREAD
Every once in a while (once a month or so) the replication breaks with varying error messages indicating a corrupt query or log file. Here's one that happened last night:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server8
Master_User: nexus8
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin.000045
Read_Master_Log_Pos: 581644327
Relay_Log_File: relay.000086
Relay_Log_Pos: 94131
Relay_Master_Log_File: bin.000045
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1064
Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '138070603'£' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE fill SET clearing_fee='0.0E id='138070603'£'
Skip_Counter: 0
Exec_Master_Log_Pos: 4164743
Relay_Log_Space: 577574251
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
I follow the following procedure to recover from above error and resume replication:
stop slave;
change master to MASTER_LOG_POS = 4164743, MASTER_LOG_FILE = 'bin.000045';
start slave;
We have multiple servers set up this way and they all sporadically stop replicating with a similar error. Any advice on how to resolve this would be greatly appreciated.
Update:
Reading the excellent article suggested by pQd (specifically section 3.5), it seems quite possible that I am experiencing relay log corruption which can "make statements un-parsable." Apparently this is something that can occur but would then indicate a bug in MySQL replication code. I can't really reproduce this problem but it happens at least once a month for every slave in our production environment.
Here are two more errors for reference:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server10
Master_User: nexus10
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin.000008
Read_Master_Log_Pos: 1034654844
Relay_Log_File: relay.001392
Relay_Log_Pos: 109929
Relay_Master_Log_File: bin.000008
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1064
Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server versio n for the right syntax to use near '' at line 1' on query. Default database: 'avidb'. Query: 'DELETE FROM reconciliat'
Skip_Counter: 0
Exec_Master_Log_Pos: 1026067551
Relay_Log_Space: 8697758
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server8
Master_User: nexus8
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin.000019
Read_Master_Log_Pos: 476189428
Relay_Log_File: relay.000163
Relay_Log_Pos: 98576
Relay_Master_Log_File: bin.000019
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1064
Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE requestlog SET id='2173589',start_time='2009-08-18 04:59:10',end_time='2009-08-18 04:59:10',ip='X.X.X.X',session='hkacn7d54',lock_wait_time='0',unmarshal_time='0.00057435035705566',unmarshaled_objects='30',data_sync_time='0',page_process_time='0',memory_used='1572864',memory_used_peak='1572864',method='GET',request='/report.php',request_parameters='
groupId = -4
startDate = 2009-08-17
reportType = trades
accountId = 2409
endDate = 2009-08-17',progress_percent='0',progress_text='' WHERE id='2173588''
Skip_Counter: 0
Exec_Master_Log_Pos: 186172755
Relay_Log_Space: 290116589
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Best Answer
Double check that your character sets on the databases and connections are what you expect them to be - your replication choking on a statement with a non-ASCII character is immediately suspicious.
Look at the master's binary log to see how the statement it logged is different from the statement that the replica is trying to execute - that will offer clues to the problem.
Sadly, you should also make sure that the versions of the server for both the master and slave are the same. I recently encountered a weird instance where we had bad SQL trying to update a non-NULL numeric column to NULL. In the version that the master was running, MySQL silently converted it to 0, but in the version that the replica was running, it was an error.
Also, instead of doing
CHANGE MASTER
to fix the replica, you should be doinghttp://dev.mysql.com/doc/refman/5.1/en/set-global-sql-slave-skip-counter.html