Linux – MySQL – Why would SHOW SLAVE HOSTS cause a binlog dump

Tags: linux, MySQL, mysql-replication

We're accumulating a large number of binlog files on our MySQL 5.0.x server. We run a standard master/slave replication setup with one master and one slave. Looking at /var/log/mysql.log, nearly 90% of the time that the replication client connects and issues SHOW SLAVE HOSTS, a binlog dump follows.

For example:

           7020 Query       SHOW SLAVE HOSTS
           7020 Binlog Dump Log: 'mysql-bin.029634'  Pos: 13273
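
To put a number on the "nearly 90%" impression, the general query log can be scanned for this pattern. The following is a minimal sketch, assuming the log lines look like the two shown above (an optional timestamp, a thread id, a command, then its argument); the function name count_dump_after_show is made up for illustration, and continuation lines of multi-line queries are not handled.

```python
import re

# Assumed line layout: [timestamp] <thread id> <command> <argument>
LINE_RE = re.compile(r"^\s*(?:\d{6}\s+[\d:]+\s+)?(\d+)\s+(Binlog Dump|\w+)\s*(.*)$")


def count_dump_after_show(lines):
    """Return (show_count, followed_count).

    show_count: number of SHOW SLAVE HOSTS queries seen.
    followed_count: how many of those were immediately followed, on the
    same thread id, by a Binlog Dump command.
    """
    last_was_show = {}  # thread id -> was the previous command SHOW SLAVE HOSTS?
    shows = 0
    followed = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        thread, command, argument = match.groups()
        if command == "Binlog Dump":
            if last_was_show.get(thread):
                followed += 1
            last_was_show[thread] = False
        else:
            is_show = (command == "Query"
                       and argument.strip().upper() == "SHOW SLAVE HOSTS")
            if is_show:
                shows += 1
            last_was_show[thread] = is_show
    return shows, followed
```

It could be run as count_dump_after_show(open("/var/log/mysql.log")) on the master to compare the two counts.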

However, when I run SHOW SLAVE HOSTS on the MySQL server myself, I get no results.

Occasionally, when the replication client issues SHOW SLAVE HOSTS, MySQL hangs for hours. I see nothing in /var/log/syslog at the same time…

What's going on here? How can I debug this more?

For the record, the MySQL master and slave servers both run Ubuntu Dapper.

Best Answer

Some things to check:

  • Disk space for binlogs.

  • Check ifconfig for network errors. I've seen bad packets force replication off track.

  • Check SHOW SLAVE STATUS and /etc/my.cnf to make sure the MySQL server IDs of the master and slave have not ended up identical. I've seen this happen, and it can be terrible for replication.

  • Check the MySQL log for trouble writing logs or tables. If those files become owned by root, that could spell trouble.

  • Monitor your server temperatures using lm-sensors if you can. A failing fan will raise system temperature and prematurely degrade memory, disk, and RAID controller performance. I've seen RAID controllers fail more frequently than disks on hot systems.
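
The server ID check in the list above can be scripted. This is a rough sketch, assuming each my.cnf is simple enough for Python's configparser to read (unindented key = value lines under a [mysqld] section); read_server_id and ids_conflict are illustrative names, not part of any MySQL tooling.

```python
import configparser


def read_server_id(cnf_text):
    """Return the server ID from the [mysqld] section of my.cnf text, or None."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as skip-external-locking
        strict=False,          # tolerate repeated sections and options
        interpolation=None,    # treat '%' literally
    )
    parser.read_string(cnf_text)
    # MySQL accepts both the dash and the underscore spelling of the option.
    for key in ("server-id", "server_id"):
        if parser.has_option("mysqld", key):
            value = parser.get("mysqld", key)
            if value is not None:
                return int(value)
    return None


def ids_conflict(master_cnf_text, slave_cnf_text):
    """True when both files set a server ID and the two values are equal."""
    master_id = read_server_id(master_cnf_text)
    slave_id = read_server_id(slave_cnf_text)
    return master_id is not None and master_id == slave_id
```

Feed it the contents of /etc/my.cnf copied from each machine. A True result means the two servers share an ID and replication should be expected to misbehave.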

I would also schedule downtime for the servers to run CHECK TABLE, REPAIR TABLE, and OPTIMIZE TABLE as appropriate. I've seen MySQL logs stay silent when a table was damaged and the application was ignoring the error messages.
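
For a maintenance window, the statements can be generated ahead of time and reviewed before anything is run. A small sketch, assuming you already have a list of (database, table) pairs; the helper name maintenance_statements is hypothetical, and it only builds SQL text without connecting to the server.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def maintenance_statements(tables, repair=False, optimize=False):
    """Build CHECK TABLE (and optionally REPAIR/OPTIMIZE TABLE) statements.

    tables: iterable of (database, table) name pairs.
    REPAIR and OPTIMIZE are opt-in because they should only be run
    where CHECK TABLE output suggests they are appropriate.
    """
    statements = []
    for database, table in tables:
        full_name = quote_identifier(database) + "." + quote_identifier(table)
        statements.append("CHECK TABLE " + full_name + ";")
        if repair:
            statements.append("REPAIR TABLE " + full_name + ";")
        if optimize:
            statements.append("OPTIMIZE TABLE " + full_name + ";")
    return statements
```

Run the CHECK TABLE pass first, then regenerate with repair=True only for the tables that reported problems.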

I've seen bad memory corrupt tables. During scheduled downtime, run a physical memory test. Memory slots on the motherboard can go bad, too.

I suspect you might have a network change that started this, and it could be something as irritating as a change to DNS lookups. Your /etc/resolv.conf might now be blank, for instance, and it could be failing to resolve your slave address if it's not an IP address.
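
A quick way to test the DNS theory is sketched below, assuming the standard resolv.conf layout of "nameserver <address>" lines; both function names are illustrative.

```python
import socket


def nameservers_in(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def resolve_host(host):
    """Return the sorted addresses the host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})
```

On each machine, nameservers_in(open("/etc/resolv.conf").read()) returning an empty list, or resolve_host returning [] for the hostname used in the replication settings, would point at name resolution rather than at MySQL itself.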