Mysql – really do with percona toolkit’s pt-table-sync

MySQLperconareplicationsynchronization

I have been looking for a tool to synchronize tables from 2 different databases and found pt-table-sync. I read the documentation and got confused: they mostly use examples referring to a replicated environment, but I thought the whole point of replication was to take care of the synchronization of data for you, hence my questions:

QUESTIONS

What is the point of using pt-table-sync if the replication process is supposed to take care of the synchronization of data for you?
Can I use pt-table-sync in a non-replicated environment (between 2+ hosts that have nothing to do with each other, is it the role of the pt-table-sync --execute host1 host2 host3 example given)?
If I must use pt-table-sync in a replicated environment, can I do without the bin-logs on the master (there is an example talking about resolving differences discovered by pt-table-checksum so wondering if bin-logs are absolutely necessary)?

Best Answer

Answer to Question 1

MySQL Replication suffers from two major problems

MySQL Replication is Asynchronous. This may introduce replication delay. This manifests itself with communication problems between a Master and the Slave via the Slave I/O Thread. This may logically and numerically be seen in Seconds_Behind_Master.
Data Drift. This is a intermittent condition where a Master and Slave are simply out-of-sync because of factors outside the realm of MySQL Replication. For example, please note one way to better synchronize replication: use the option sync-binlog. When you set sync-binlog to 1, mysqld will perform a flush of the current binary log for every entry you record in the binary log. That can ridiculously slow down a Master. By default, sync-binlog is 0.
- Here is a question: With sync-binlog=0, who is responsible for flushing the binary log to disk?
- Answer (please sit down for this one): THE OPERATING SYSTEM !!!
- With that answer, it puts the Slave as a terrible disadvantage because its I/O Thread is at the mercy of the Master's Operating System. When the Master's OS gets around to flushing the binary log changes to disk and the Slave's I/O Thread can detect the next incoming SQL statement, then the statement is shipped over the I/O Thread to the Slave.
- Percona has a nice PDF on dealing with Data Drift

Answer to Question 2

The direct answer here is no because pt-table-sync was designed to detect the I/O thread of a Slave by means of the --sync-to-master option.

Answer to Question 3

The direct answer here is no because MySQL Replication demands to know

what is the current binary log on the Master? (this is Master_Log_File from SHOW SLAVE STATUS\G)
what is the latest position the Slave has read from Master's current binary log? (this is Read_Master_Log_Pos from SHOW SLAVE STATUS\G)

If you simply want your binary logs to get out of the way, you can do one of two things

OPTION 1 : On the Master, set expire-logs-days to 3 to keep the last 3 days worth of binary logs
- Add expire-logs-days=3 in /etc/my.cnf
- No restart required: Just run SET GLOBAL expire_logs_days = 3;
OPTION 2 : Run SHOW SLAVE STATUS\G on the Slave. Take the value of Relay_Master_Log_File. and use it to clear binary logs on the Master to up that log file.
- Suppose you run SHOW SLAVE STATUS\G on the Slave
- You get this Relay_Master_Log_File: mysql-bin.000035
- Run this on the Master : PURGE BINARY LOGS TO 'mysql-bin.000035';

SUGGESTION

If you want to have more faith in pt-table-sync, try using the --print option and redirecting to a text file instead of the --execute option. This will generate the SQL that would normally execute on the Master. You could just run the SQL directly on that Slave thereafter. Think of it as a dress rehearsal for --execute.

Related Solutions

Sql-server – Replicating/Synchronizng multiple tables across different Databases on the same instance

Firstly, SQL Server's replication features can be set up between different databases on the same instance.

Setting up and administering SQL Server replication might be more effort than you want to take on. There are lots of decisions to be made (what kind of replication? All of the columns or just some of them? All of the rows or just some of them? Do I want to index the target tables? Some kinds of replication require changes to the underlying data model. If you don't control the source code of the applications, is changing the data model even possible? etc, etc.), replication can break and it might not be noticed for a while, log files can grow unexpectedly.

With triggers, you have to maintain the trigger code in the case that the underlying tables change. What happens if the trigger stops working? How do you re-sync the tables? How long does that take? Etc, etc.

As mentioned in the comments, one alternative to replication is using views. Potentially, that means maintaining code in the case that the base tables (T1, T2, T3) change for whatever reason. Because of that, views would be my second suggestion.

My first suggestion would be to use the "synonyms" feature to simply refer to the original tables. If you use views or synonyms, the data will only be stored in one place (DB1), so there is no worry about synchronizing changes between copies of the data.

The possible negative here (for views or synonyms) is that DB2 will not actually contain the T1 data, so a backup and restore of DB2 (to a testing or dev server) would also need a backup and restore of DB1.

MySQL Database synchronizing with local and remote with c#

You might get MySQL master/slave replication to work by adjusting the timeouts and re-issuing the "STOP SLAVE" and "START SLAVE" commands. In the slave config file, you can set slave-net-timeout to the number of seconds after which it considers the connection severed, and set master-connect-retry to the number of seconds it should pause between trying to reconnect. Both are set to 86400 seconds, or 1 day. For example, you might try 30 seconds like this:

slave-net-timeout = 30
master-connect-retry = 30

An alternative is to try a solution like SymmetricDS database synchronization or Tungsten. I've used SymmetricDS in situations where the network is slow or even unavailable at times, and it recovers automatically. It can be run as a standalone replication server, or it can be embedded and controlled from an application.