MySQL – No changes made but MySQL InnoDB overloading CPU

Tags: debian, innodb, mysql

I haven't changed any scripts or settings whatsoever, yet the CPU load is now at least 4 times higher. mysqld is taking around 360 %CPU according to top. The server runs Debian; I have some MyISAM tables, but most are InnoDB. I have verified that the load gets high when the site using InnoDB goes live. Even now at night, when the overall load is usually under 1, it is 12. I don't know what went wrong; I haven't changed anything, and I have already tried completely restarting the machine.
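
A quick way to see what mysqld is actually busy with (just a sketch, assuming the MySQL root credentials are at hand):

# List the statements currently running and how long they have been running
mysql -u root -p -e "SHOW FULL PROCESSLIST;"

# InnoDB's internal view: pending I/O, row operations, buffer pool and log state
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"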

I changed innodb_log_file_size from 512M to 2047M with no visible improvement in load (see the note right after the config block about recreating the redo logs). Here is the rest of the fine-tuning section of my.cnf:

key_buffer      = 4000M
max_allowed_packet  = 32M
thread_stack        = 256K
thread_cache_size   = 192

myisam-recover      = BACKUP
max_connections        = 10000

query_cache_limit       = 1M
query_cache_size        = 256M
innodb_buffer_pool_size=8G
innodb_additional_mem_pool_size=20M
sort_buffer=2M
thread_concurrency=16
record_buffer=2M
tmp_table_size=128M
max_heap_table_size=128M
table_cache=1024

innodb_log_file_size = 2047M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table
innodb_flush_method = O_DIRECT
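
One caveat on the innodb_log_file_size change mentioned above: on MySQL 5.0/5.5 the new size only takes effect once the old redo log files are recreated after a clean shutdown, otherwise mysqld complains that ib_logfile0 is of a different size. Roughly, as a sketch assuming the default Debian paths:

/etc/init.d/mysql stop                   # clean shutdown (innodb_fast_shutdown must not be 2)
# edit /etc/mysql/my.cnf and set the new innodb_log_file_size
mv /var/lib/mysql/ib_logfile0 /root/     # move the old redo logs aside rather than deleting them
mv /var/lib/mysql/ib_logfile1 /root/
/etc/init.d/mysql start                  # InnoDB recreates the logs at the new size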

It's a 24GB RAM machine running both Apache and MySQL, with around 20k visits per day. At this very moment there are still 20GB of RAM free (it's night and I've just rebooted). The system HDD is only 8% full, and all HDDs seem to be reading/writing at proper speeds.
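
A simple way to keep an eye on that, assuming the sysstat package (which also provides the mpstat output below) is installed:

iostat -xk 5 3     # extended per-device stats (utilization, await, throughput), 3 samples 5 s apart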

Is there something wrong with my configuration? Is it possible the load jumped this high even though I haven't changed any server settings or PHP scripts? What else could possibly cause this?

EDIT: output from vmstat

vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
13  1    868 280368 179164 18089764    0    0   511   579   16   89 87  2 10  1
24  2    868 348948 183916 18136144    0    0  5791   610 1916 7552 89  2  7  1
23  0    868 501744 185972 18169456    0    0  3995  5877 2401 6277 91  3  5  1
 6  0    868 694992 186136 18202684    0    0  3385  4783 1006 5958 91  2  7  0
22  0    868 825240 186372 18243540    0    0  4133  4087 1098 5364 91  2  7  0
19  0    868 284452 186540 18281960    0    0  3907  4380  537 6468 91  3  7  0
44  0    868 123408 177236 17022560    0    0  3896  5173  483 6914 88  5  7  0
17  0    868 159388 173236 16729360    0    0  4625  8856 1433 7072 89  3  8  0
14  0    868 248836 173380 16763992    0    0  5284   698  819 7357 88  2  9  0
15  0    868 406092 173592 16809708    0    0  4730  5794 1148 7224 90  2  8  0

ps -eo command,pid,ppid,%cpu,%mem,rss,pri,psr,size --sort=-%cpu | head -n 20

COMMAND                       PID  PPID %CPU %MEM   RSS PRI PSR    SZ
/usr/sbin/mysqld --basedir=   379   342  364 11.1 2747020 19  1 14249120
gzip                         5537  5536 30.1  0.0   756   0   3   452
/usr/sbin/apache2 -k start   2435 27735  6.6  0.3 75408  19   1 47184
/usr/sbin/apache2 -k start   2388 27735  6.7  0.3 77928  19   3 48144
/usr/sbin/apache2 -k start   2323 27735  5.9  0.3 79660  19   3 48464
/usr/sbin/apache2 -k start   2363 27735  5.8  0.3 77156  19   4 47256
/usr/sbin/apache2 -k start   2418 27735  5.7  0.3 77248  19   7 46684
/usr/sbin/apache2 -k start   2350 27735  5.8  0.3 78504  19   2 48092
/usr/sbin/apache2 -k start   2437 27735  5.5  0.3 75928  19   3 47436
/usr/sbin/apache2 -k start   2439 27735  5.5  0.3 75716  19   3 47596
/usr/sbin/apache2 -k start   2356 27735  5.7  0.3 78560  19   3 48708
/usr/sbin/apache2 -k start   2284 27735  5.6  0.3 79532  19   3 47896
/usr/sbin/apache2 -k start   2349 27735  5.6  0.3 78248  19   7 48548
/usr/sbin/apache2 -k start   2368 27735  5.6  0.3 77100  19   3 45852
/usr/sbin/apache2 -k start   2387 27735  5.5  0.3 79964  19   7 48952
/usr/sbin/apache2 -k start   2383 27735  5.4  0.3 79212  19   1 48448
/usr/sbin/apache2 -k start   2169 27735  5.4  0.3 81740  19   3 48636
/usr/sbin/apache2 -k start   2411 27735  5.3  0.3 77292  19   3 47628
/usr/sbin/apache2 -k start   1779 27735  5.4  0.3 88876  19   0 48384

mpstat 5 10

Linux 2.6.26-2-amd64      12/12/2014      _x86_64_

03:38:23 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
03:38:28 PM  all   87.44    3.56    1.44    0.17    0.15    0.40    0.00    6.83   2363.80
03:38:33 PM  all   86.82    3.31    1.82    0.12    0.05    0.25    0.00    7.62   1703.00
03:38:38 PM  all   88.52    2.39    1.76    0.30    0.15    0.32    0.00    6.56   2552.68
03:38:43 PM  all   85.12    3.92    2.17    0.27    0.10    0.42    0.00    7.99   2810.60
03:38:48 PM  all   87.72    3.34    1.82    0.10    0.10    0.30    0.00    6.61   2368.00
03:38:53 PM  all   85.36    3.83    1.59    0.40    0.05    0.17    0.00    8.59   1589.60
03:38:58 PM  all   85.74    4.01    1.50    0.20    0.07    0.17    0.00    8.30   1648.00
03:39:03 PM  all   85.26    4.16    1.75    0.20    0.12    0.60    0.00    7.91   1764.20
03:39:08 PM  all   87.20    3.62    1.70    0.17    0.10    0.45    0.00    6.76   2221.80
03:39:13 PM  all   85.96    3.12    2.67    0.27    0.05    0.52    0.00    7.41   2829.60
Average:     all   86.52    3.53    1.82    0.22    0.09    0.36    0.00    7.46   2185.35

EDIT, MySQL 5.5

Upgrading to MySQL 5.5 actually made a big difference. Load dropped from 20 to 10 during the day and from 10 to 6 at night. Traffic close to none = load 6, high traffic = load 10.

=> Now it gets interesting: the current loads are exactly 5 points above the values from before the incident, no matter the traffic. This doesn't make much sense to me. There's no reason for a load of 6 on an 8-core CPU with almost no traffic, and it's even weirder that the load only doubles under heavy traffic. Day load used to be 5 times higher than night load.

==> Either MySQL 5.5 is miraculously efficient (compared to 5.0), OR there is something that adds those 5 points no matter what. I can't see any suspicious process, though. It has to be Apache/PHP/MySQL related.

EDIT, SOLVED (by itself)

4 days ago the load went through the roof for no apparent reason. Early this morning it dropped back to normal values, just like that.

(image: server load graph over the affected period)

I'm not a fan of blaming DDoS for everything; I actually ruled that out by looking at network statistics. But then again, I've never experienced an attack, so I might not recognize one.
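
For anyone who wants to run the same kind of sanity check, a rough sketch of counting current TCP connections per remote IP (IPv4, busiest first):

netstat -ntu | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20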

FACT: After 3 days I managed to upgrade to MySQL 5.5, which didn't solve the problem entirely but lowered the load enough to make the site feel smooth again on the visitors' end. One day later, the problem went away. It's almost as if somebody lost interest once the site stopped suffering from long load times.

OS corruption wouldn't go away by itself, and there were no background tasks. An attack seems like the only remaining explanation, though I haven't noticed any strange network activity. One thing I can say: our audience is exactly the type of crowd that often does this kind of thing.

EDIT 29/12/2014

I'm not sure if anybody is still watching this thread, but I'd like to post an update. The high loads came back after several days, then went away for a couple of days, came back again, and so on. Sometimes the load is constantly high, sometimes it spikes. One day it can drop by 50%, another day it rises by 100%. The timing differs, and there is no connection to actual traffic, background tasks, nothing. Although it really looks like a DDoS, there is no strange network activity.

The big difference, though, is MySQL 5.5: the machine can now handle whatever is eating its resources much better, keeping the site up and running. The bottleneck again seems to be the CPU, which, again, doesn't make much sense.

The most "funny" is the unpredictability of this behaviour. That's just doesn't really fit any SW/HW fault, or does it ??

Best Answer

Beyond a shadow of a doubt, I strongly believe it is the InnoDB Storage Engine.

It is almost like a living, breathing organism.

Here is a Pictorial Representation from Percona's CTO Vadim Tkachenko

InnoDB Architecture

Please note the InnoDB Buffer Pool. If it has lots of dirty pages (changes waiting to be written back to the physical tables) and corresponding index changes (the Insert Buffer section of the Buffer Pool), note the following writes (you can watch the dirty-page counters yourself; see the quick check after this list):

  • Write I/O Threads from Buffer Pool to .ibd files (Physical table files). This may require read/write I/O, perhaps opening and closing tables via LRU algorithms
  • Changes to the Insert Buffer are written inside the System Tablespace file (ibdata1)
  • Since you have innodb_flush_log_at_trx_commit = 2, the Log Buffer is flushed once per second (not an issue on a dormant system since innodb_log_buffer_size is 16M; it can be an issue during heavy writes)
  • Dirty pages are written to the Doublewrite Buffer, which helps InnoDB survive a mysqld crash (or at least provides a decent amount of damage control)
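
You can watch how dirty the Buffer Pool actually is from the status counters (a quick check, not a full diagnosis; pages are 16K by default):

# Innodb_buffer_pool_pages_dirty / Innodb_buffer_pool_pages_total = current dirty-page percentage
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';"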

What is the point of discussing the InnoDB Storage Engine?

If a lot of dirty pages are written throughout the business day, there may be flushing activity moving changes around the engine like a heart circulating blood. Even mild writes can hide for a while. Since you have a 2G buffer pool, InnoDB could simply be scouring the Buffer Pool (via its purge thread).

From the looks of your my.cnf, I would say you probably have MySQL 5.5 or higher. InnoDB, when not tuned, tends to be more aggressive with the purge thread.

Now, for a surprise. Did you know InnoDB can stay quiet with up to 75% of the Buffer Pool still dirty?

  • MySQL 5.5 and 5.6, innodb_max_dirty_pages_pct = 75
  • MySQL 5.0 and 5.1, innodb_max_dirty_pages_pct = 90

What this means is that InnoDB will not tolerate more than 75% (or 90%) dirty pages. It starts aggressive purge/flush activity when dirty pages exceed the innodb_max_dirty_pages_pct threshold. Once it falls below that 75% or 90% threshold, InnoDB will quietly flush dirty pages at a steady pace WHEN IT IS GOOD AND READY. That manifests itself as write I/O, server load, and increased CPU even when there are no INSERTs, UPDATEs, or DELETEs going on. Notwithstanding, InnoDB will decide when really old dirty pages need to be purged.

What you may need is just some tuning. Here is a sample of the settings you need:

[mysqld]
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_thread_concurrency = 0
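
Note that innodb_read_io_threads and innodb_write_io_threads require a restart (they exist as of the MySQL 5.1 InnoDB Plugin and natively in 5.5). A quick way to confirm the new values took effect, just as a sketch:

mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'innodb_%io_threads'; SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';"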

For a more in-depth look at having InnoDB engage more hyperthreading and CPU, please see my DBA StackExchange posts on the subject.

BTW, you have 24GB of RAM and only a 2GB Buffer Pool? Please see my old post "How large should be mysql innodb_buffer_pool_size?". Why? An undersized Buffer Pool, even with proper tuning for hyperthreading and CPU, can still produce intermittent bursts of writes out of nowhere.
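
The rule of thumb from that post boils down to a query along these lines (a sketch: sum the InnoDB data and index sizes, add roughly 60% headroom, and cap the result at what the box can actually spare):

# Recommended InnoDB Buffer Pool size, in GB
mysql -u root -p -e "
SELECT CEILING(Total_InnoDB_Bytes*1.6/POWER(1024,3)) AS RIBPS_GB
FROM (SELECT SUM(data_length+index_length) AS Total_InnoDB_Bytes
      FROM information_schema.tables WHERE engine='InnoDB') A;"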

UPDATE 2014-12-12 10:30 EST

Since you are at MySQL 5.0 right now, you could probably make all flushing as thorough as possible with the following settings:

[mysqld]
innodb_thread_concurrency = 0
innodb_max_dirty_pages_pct = 0
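
Both of these are dynamic variables, so (given the SUPER privilege) you can try them immediately on the running server while also putting them in my.cnf so they survive the next restart. A sketch:

mysql -u root -p -e "SET GLOBAL innodb_thread_concurrency = 0; SET GLOBAL innodb_max_dirty_pages_pct = 0;"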

That should minimize, or at least reduce, occurrences of sudden rises in load and CPU.

These are optional:

[mysqld]
innodb_fast_shutdown = 0
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

Why These Optional Settings?

  • Setting innodb_fast_shutdown = 0
    • makes sure all transactions are fully flushed to disk before shutdown
    • which makes for a faster, cleaner startup afterwards
  • Setting innodb_flush_log_at_trx_commit to 1 makes sure every commit is flushed to the redo logs on disk. CAVEAT: some hardware may still not honor this setting (e.g. a controller write cache acknowledging flushes early).
  • If you have binary logging enabled, sync_binlog = 1 will flush the binary logs properly (a quick check for that is shown below)
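
Whether that last point applies at all is easy to check; if log_bin reports OFF, binary logging is disabled and sync_binlog has no effect:

mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'log_bin';"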

Long term, you really need to go with MySQL 5.6.