Mysql Slave stuck in “System lock”

master-slaveMySQL

My MySQL slave is spending a lot of time in Slave_SQL_Running_State: System lock. I can see that the system is currently I/O write bound, and that it is processing the log, although slowly. Show processlist doesn't show anything other than "Waiting for master to send event" and "System lock" when it is in this state.

All my tables (other than the system tables) are InnoDB, and external locking is disabled. What is the slave doing in this state?

Here's some info that has been requested:

First, this is MySQL 5.6 community on an Amazon EC2 instance, with all storage on EBS.

mysql> show processlist;
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
| Id | User        | Host      | db            | Command | Time   | State                            | Info             |
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
|  1 | system user |           | NULL          | Connect |  26115 | Waiting for master to send event | NULL             |
|  2 | system user |           | NULL          | Connect | 402264 | System lock                      | NULL             |
| 14 | readonly    | localhost | theshadestore | Query   |      0 | init                             | show processlist |
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 184.106.16.14
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin-log.000764
          Read_Master_Log_Pos: 505452667
               Relay_Log_File: relay-log.000197
                Relay_Log_Pos: 345413863
        Relay_Master_Log_File: bin-log.000746
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 345413702
              Relay_Log_Space: 19834085375
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 402263
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 307009
                  Master_UUID: b1bf9a19-dac0-11e2-8ffa-b8ca3a5bce90
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Best Answer

Databases running on distributed storage facepalm. I would benchmark the filesystem running on top of the EC2 EBS storage system. Probably the simplest method is to use something like s=$(date +%s); dd if=/dev/zero of=<database-dir> bs=1M count=512; e=$(date +%s); echo "scale=4; 512 / ( $e - $s )" | bc. That assumes you have 512 MB to spare. Now, the problem with this benchmarking is that (1) it doesn't take into account caching effects, and (2) the resolution is not very fine. But if this test is slow, then the problem is definitely with EC2 EBS. If the test is fast or nominal, we have to dig further and use more sophisticated techniques.

The bonnie++ program is somewhat adequate, but it doesn't (AFAIK) flush the OS buffers between the write and read. Still, you should get an idea with something like bonnie++ -u mysql -r 8 -s 16:512 -n 1 -b -d <mysql-data-directory>. When I do this on a VM running on local storage, I get:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
test        16M:512  1173  99 +++++ +++ +++++ +++  3187  99 +++++ +++ +++++ +++
Latency              1553us      23us     330us     750us     173us    6372us
Version  1.96       ------Sequential Create------ --------Random Create--------
test                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1  1850  20 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             27428us      24us    1188us   30258us      36us    1107us

Here is what I get when running on a VM over NFS:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
test        16M:512  1273  98 +++++ +++ +++++ +++  3053  99 +++++ +++ +++++ +++
Latency              1372us      28us     380us     832us      19us    9473us
Version  1.96       ------Sequential Create------ --------Random Create--------
test                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1   754  11 +++++ +++   728   8   751  12 +++++ +++   791   8
Latency             12695us      47us    5306us    3710us      30us    3278us