Mysql – really do with percona toolkit’s pt-table-sync

MySQLperconareplicationsynchronization

I have been looking for a tool to synchronize tables from 2 different databases and found pt-table-sync. I read the documentation and got confused: they mostly use examples referring to a replicated environment, but I thought the whole point of replication was to take care of the synchronization of data for you, hence my questions:

QUESTIONS

  1. What is the point of using pt-table-sync if the replication process is supposed to take care of the synchronization of data for you?

  2. Can I use pt-table-sync in a non-replicated environment (between 2+ hosts that have nothing to do with each other, is it the role of the pt-table-sync --execute host1 host2 host3 example given)?

  3. If I must use pt-table-sync in a replicated environment, can I do without the bin-logs on the master (there is an example talking about resolving differences discovered by pt-table-checksum so wondering if bin-logs are absolutely necessary)?

Best Answer

Answer to Question 1

MySQL Replication suffers from two major problems

  • MySQL Replication is Asynchronous. This may introduce replication delay. This manifests itself with communication problems between a Master and the Slave via the Slave I/O Thread. This may logically and numerically be seen in Seconds_Behind_Master.

  • Data Drift. This is a intermittent condition where a Master and Slave are simply out-of-sync because of factors outside the realm of MySQL Replication. For example, please note one way to better synchronize replication: use the option sync-binlog. When you set sync-binlog to 1, mysqld will perform a flush of the current binary log for every entry you record in the binary log. That can ridiculously slow down a Master. By default, sync-binlog is 0.

    • Here is a question: With sync-binlog=0, who is responsible for flushing the binary log to disk?
    • Answer (please sit down for this one): THE OPERATING SYSTEM !!!
    • With that answer, it puts the Slave as a terrible disadvantage because its I/O Thread is at the mercy of the Master's Operating System. When the Master's OS gets around to flushing the binary log changes to disk and the Slave's I/O Thread can detect the next incoming SQL statement, then the statement is shipped over the I/O Thread to the Slave.
    • Percona has a nice PDF on dealing with Data Drift

Answer to Question 2

The direct answer here is no because pt-table-sync was designed to detect the I/O thread of a Slave by means of the --sync-to-master option.

Answer to Question 3

The direct answer here is no because MySQL Replication demands to know

  • what is the current binary log on the Master? (this is Master_Log_File from SHOW SLAVE STATUS\G)
  • what is the latest position the Slave has read from Master's current binary log? (this is Read_Master_Log_Pos from SHOW SLAVE STATUS\G)

If you simply want your binary logs to get out of the way, you can do one of two things

  • OPTION 1 : On the Master, set expire-logs-days to 3 to keep the last 3 days worth of binary logs
    • Add expire-logs-days=3 in /etc/my.cnf
    • No restart required: Just run SET GLOBAL expire_logs_days = 3;
  • OPTION 2 : Run SHOW SLAVE STATUS\G on the Slave. Take the value of Relay_Master_Log_File. and use it to clear binary logs on the Master to up that log file.
    • Suppose you run SHOW SLAVE STATUS\G on the Slave
    • You get this Relay_Master_Log_File: mysql-bin.000035
    • Run this on the Master : PURGE BINARY LOGS TO 'mysql-bin.000035';

SUGGESTION

If you want to have more faith in pt-table-sync, try using the --print option and redirecting to a text file instead of the --execute option. This will generate the SQL that would normally execute on the Master. You could just run the SQL directly on that Slave thereafter. Think of it as a dress rehearsal for --execute.

Related Topic