Seconds_behind_master value keeps on increasing in MySQL replication slave

mysql-replication

We have a MySQL 5.0.77 Master-Slave replication. The replication was not properly running for the past few weeks and it was giving Duplicate entry error 1062. The Set Global Skip-counter option didn't help, so I had to skip the error no.1062 by adding it to the /etc/my.cnf file and then it reported table doesn't exist error in one particular database.

I have then taken a mysqldump of that database and restored in Slave last weekend. Then the Slave IO_Thread and Slave_SQL both started running fine, and it looked like the replication was back on track. The Seconds_behind_master value was very high and then it started reducing for the past 4 days.

When I checked the slave replication status today, I found that the seconds_behind_master value is keep on increasing since morning. I stopped the slave IO_Thread once and then the seconds_behind_master became Null. Then after I started the IO_thread the value became the same and kept on increasing.

I see one process is running from morning
system user gss-app Connect 9535736 copy to tmp table ALTER TABLEqueue_clicksADD INDEX(puid)

Please help me to fix this issue. Thanks.

#mysql> show slave status\G;
`*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 203.x.x.x
                Master_User: replication
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000990
        Read_Master_Log_Pos: 185674180
             Relay_Log_File: mysqld-relay-bin.000224
              Relay_Log_Pos: 9286354
      Relay_Master_Log_File: mysql-bin.000774
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 472142385
            Relay_Log_Space: 112995681998
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 9533355
1 row in set (0.00 sec)`

Best Answer

I won't worry about it if the IO and SQL are running, as well as the Relay_Master_Log_File is catching up with the Master_Log_File. I believe the delay is in the fact that your total relay log file is huge, approx. 105G - Relay_Log_Space: 112995681998 and considering that the Slave is at 000774 position and master is at 000990 position, there are a total of 214 binary logs of approximately 468M each (105G/214) waiting to be replayed on the slave.

My advice is to keep an eye on the Relay_Master_Log_File and make sure that is is going up and catching up with the Master_Log_File. I also see that the master host is in a public IP address, is this replication taking place over a public network or slow WAN? That could be introducing a delay, that faster the link the better.

Related Topic