MySQL innodb has lots of processes hanging in “update” state once every minute

database-performanceinnodbMySQLperformance

Description of problem

I have a fairly large MySQL installation. There are at least 3 separate servers which run the innoDB storage engine. Once every minute, at the same time each minute, for about 3-4 seconds, each of my innodb machines suddenly stop performing well.

When I do a SHOW PROCESSLIST on each server at normal times, I see about 10-15 connections, doing their thing as normal:

+--------+------------------+--------------------------+------+---------+------+-------+------------------+
| Id     | User             | Host                     | db   | Command | Time | State | Info             |
+--------+------------------+--------------------------+------+---------+------+-------+------------------+
|  23457 | root             | localhost                | NULL | Query   |    0 | NULL  | show processlist | 
| 180042 | **********       | web2.***.com:49867       | ***  | Sleep   |    1 |       | NULL             | 
| 180129 | **********       | web1.***.com:54302       | ***  | Sleep   |    0 |       | NULL             | 
| 180155 | **********       | web2.***.com:50225       | ***  | Sleep   |    0 |       | NULL             | 
| 180163 | **********       | web1.***.com:54425       | ***  | Sleep   |    0 |       | NULL             | 
| 180172 | **********       | web1.***.com:54507       | ***  | Sleep   |    0 |       | NULL             | 
| 180181 | **********       | web4.***.com:34893       | ***  | Sleep   |    0 |       | NULL             | 
+--------+------------+--------------------------+------+---------+------+-------+------------------------+

Then all of a sudden, almost exactly in sync on each machine, at the same time each minute (meaning on :47 seconds after the minute each minute on each machine), processes will pile up hanging in the "update" state:

| 192938 |  **********       | web3.***.com:44248              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (101670151,"{\"inbox\":{\"new\":12,\"spam_check\":1289 | 
| 192939 |  **********       | web4.***.com:50264              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (17103785,"{\"inbox\":{\"new\":1,\"spam_check\":0,\"di | 
| 192940 |  **********       | web3.***.com:44258              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (2245293,"{\"inbox\":{\"new\":14,\"spam_check\":128933 | 
| 192941 |  **********       | web3.***.com:44268              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (105330063,"{\"inbox\":{\"new\":4,\"spam_check\":0,\"d | 
... 100-200 more just like this...
| 192941 |  **********       | web3.***.com:44268              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (105330063,"{\"inbox\":{\"new\":4,\"spam_check\":0,\"d | 

Upon closer inspection, it seems like it's high CPU usage at that moment (although I suppose high CPU could be caused by high disk I/O), because when it's in the midst of this, and I run something simple like SELECT NOW(), even that will take like 4 seconds to complete.

Here's what I know:

  1. It isn't a rogue unoptimized query. Happens on different machines that not only run different queries, but also different tables.
  2. It only happen on machines that do writing to innoDB tables. This does not happen on machines that only read innoDB, or machines that only write or read from MyISAM.

Questions

Is there a process that runs every minute on innoDB that takes up lots of CPU or disk I/O? Is this normal? I know that it could be a million different things, but I'm looking for known problems or solutions. Is there any more info I can provide to help solve this issue?

additional info

OS:

uname -a
Linux db04.****.com 2.6.18-194.17.4.el5 #1 SMP Wed Oct 20 13:03:08 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

File system:

/dev/sda4     ext3   785711096  80539996 665259216  11% /data

Raid config:

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-1, Secondary-3, RAID Level Qualifier-0
Size:856704MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:3
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

MySQL Version

mysql> select version();
+---------------------------+
| version()                 |
+---------------------------+
| 5.0.80-enterprise-gpl-log | 
+---------------------------+
1 row in set (0.01 sec)

Best Answer

Are you sure you don't have a cron job running every minute?

What is your value for innodb_flush_method?

Since you have a RAID Controller with BBU-backed write cache (and data/logs not stored on a SAN) the recommended setting is: O_DIRECT

You might also want to use a tool like innotop to analyze your load better. Especially Pending I/O.

HTH

Edit: What's your value for innodb_buffer_pool_size as well?