Mysql – Improving MySql replication speed

MySQLmysql-replication

i'm looking for pointers on how to improve the replication speed of a mysql slave. It is fast enough for normal production but needs a long time to catch up if it falls behind for some reason. (If the server was turned off for or replication stopped for some hours etc.)

Some data:

The database is around 2TB and the binlogs written daily are around 500GB.
For normal production use the slave lag stays at 0 or 1 seconds
While trying to catch up it come closer to master at around 1 second per second. (So catching up 10 hours takes 10 hours)
The Slave is ONLY replaying the data as a failover machine, there are no selects running against it and it should be able to replace the master if he dies (everything we be slow, but thats ok)
The servers are in the same physical network

What can be done to improve that speed ?

(Yes, improving the hardware is an option but we want to make sure we checked other options before spending lots of money)

Edit:

Some data in response to @3molo question:

Bandwith usage while catching up: ~2MB/s to 10MB/s, no problem there i guess.
CPU usage is low, io wait is high (slave only has 2 cpus, arround 70 to 85% percent io wait while catching up. So this means the bottleneck is in the disks ?
Ram usage is 1,5GB used of 4GB (the rest is 'cached'). So i don't see that much potential there ? (Since it's only write load anyways (?))

If the bottleneck is the disks (6 disk raid 10 setup) are there any options besides increasing the raid or do we have to go with that ?

Best Answer

This would be interesting to write down while catching up:

how much bandwidth is used
how much cpu utilization (especially i/o wait)
ram usage

In response to the new information about i/o wait being high; for innodb a lot can be done, take a look at. I've learned many things from mysqlperformanceblog. Here's some hints:

innodb_flush_method=O_DIRECT

"Avoid double buffering and reduce swap pressure, in most cases this setting improves performance. Though be careful if you do not have battery backed up RAID cache as when write IO may suffer."

innodb_flush_log_at_trx_commit=2

"If you’re not concern about ACID and can loose transactions for last second or two in case of full OS crash than set this value. It can dramatic effect especially on a lot of short write transactions."

Those did WONDERS for us, but the drawback is that you might loose a second of written data. This is because instead of write (flush to disk) every record, you flush every second.

Related Solutions

MySQL replication across geographically separate servers

We use replication across datacenters in several European countries (so they aren't across the world from each other, but they are certainly not local) and it works without any problem.

Replication will automatically restart if possible. If there is a problem with a query (e.g. a database is present on the master and not the slave, and a query uses it), then it will require manual correction by default (but you can set it to ignore such errors). If the databases are exact mirrors, then you should never need to manually restart replication.

If you have two servers and the master disappears, then to turn the slave into the 'master', just stop replication and alter your code (to write to the new 'master'). If you have three or more servers and the master disappears, then stop replication on the slaves, change them to use the new master, and start again. If they aren't exactly in sync (depends how much data is being transferred, how busy the servers are, how good the network connection is, etc), then you might have to do more work than that. The replication section of the MySQL documentation covers this in more detail.

I would suggest that you ensure that you are replicating over SSL (i.e. set the replication user to require a SSL connection).

MySQL Slave Replication Reset with No Master Downtime (MyISAM)

There are two alternatives

ALTERNATIVE #1 : Use XtraBackup

It is capable of copying MyISAM as well as InnoDB on a running master.

ALTERNATIVE #2 : Run rsync multiple times

You could run rsync against /var/lib/mysql on a master and copy it to /var/lib/mysql on a slave. Of course, I would run rsync several times. Until the final rsync you should run the FLUSH TABLES WITH READ LOCK. Before copying make sure you hose all binary logs and start from scratch.

Before running anything, please make sure binary logs are written in /var/lib/mysql on both master and slave by having something like the following in /etc/my.cnf:

[mysqld]
log-bin=mysql-bin

Please try running this script in the event you do not want to shutdown MySQL on the master:

mysql -u... -p... -e"SET GLOBAL innodb_max_dirty_pages_pct = 0; RESET MASTER;"
RSYNCSTOTRY=7
cd /var/lib/mysql
X=0
while [ ${X} -lt ${RSYNCSTOTRY} ]
do
    X=`echo ${X}+1|bc`
    rsync -r * slaveserver:/var/lib/mysql/.
    sleep 60
done
mysql -u... -p... -e"FLUSH TABLES WITH READ LOCK; SELECT SLEEP(86400);"
sleep 60
SLEEPID=`mysql -u... -p... -e"SHOW PROCESSLIST;" | grep "SELECT SLEEP(86400)" | awk '{print $1}'`
rsync -r * slaveserver:/var/lib/mysql/.
mysql -u... -p... -e"KILL ${SLEEPID};"

I am a little more conservative in terms of data and index pages being cached while doing this. Personally, I prefer to shutdown mysql after several rsyncs instead of the FLUSH TABLES WITH READ LOCK. Another alternative to this script would be the following script which shuts down mysql for the final rsync:

mysql -u... -p... -e"RESET MASTER;"
RSYNCSTOTRY=7
cd /var/lib/mysql
X=0
while [ ${X} -lt ${RSYNCSTOTRY} ]
do
    X=`echo ${X}+1|bc`
    rsync -r * slaveserver:/var/lib/mysql/.
    sleep 60
done
service mysql stop
rsync -r * slaveserver:/var/lib/mysql/.
service mysql start

That's all for the rsync portion from the master. What about the Slave ???

Before you start mysql on the slave, you need to have the log file and log position from the master. The binary logs you copied have you need, particularly the last binary log. Here is how you get it on the slave:

cd /var/lib/mysql
for X in `ls -l mysql-bin.0* | awk '{print $9}'`
do
    LOGFIL=${X}
done
LOGPOS=`ls -l ${LOGFIL} | awk '{print $5}'`
echo "Master Log File ${LOGFIL} Position ${LOGPOS}"

You can trust these numbers because you copied them personally from the master. Now that you have the master log and position, you can start up mysql on the slave and setup replication using the log file and log position that was just reported.

Give it a try !!!

CAVEAT

If you have any InnoDB data, you should set this about 1 hour before attempt to rsync:

SET GLOBAL innodb_max_dirty_pages_pct = 0;

This will cause InnoDB to page out uncommitted data from the InnoDB Buffer Pool faster.

Best Answer

Related Solutions

MySQL replication across geographically separate servers

MySQL Slave Replication Reset with No Master Downtime (MyISAM)

Related Topic