I haven't changed any scripts or settings whatsoever, yet my CPU load is now at least 4 times higher. mysqld is taking around 360 %CPU according to top. It's Debian; I'm running some MyISAM tables but most are InnoDB. I have verified that the load gets high when the site using InnoDB goes live. Even now at night, when the overall load is usually under 1, it is 12. I don't know what went wrong; I haven't changed anything. I already tried completely restarting the machine.
I changed my innodb_log_file_size from 512M to 2047M, with no visible improvement in load. Here is the rest of the Fine Tuning section of my.cnf:
key_buffer = 4000M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 192
myisam-recover = BACKUP
max_connections = 10000
query_cache_limit = 1M
query_cache_size = 256M
innodb_buffer_pool_size=8G
innodb_additional_mem_pool_size=20M
sort_buffer=2M
thread_concurrency=16
record_buffer=2M
tmp_table_size=128M
max_heap_table_size=128M
table_cache=1024
innodb_log_file_size = 2047M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table
innodb_flush_method = O_DIRECT
It's a machine with 24GB RAM running both Apache and MySQL, serving around 20k visits per day. At this very moment there are still 20GB of RAM free (it's night and I've just rebooted). The system HDD is only 8% full, and all HDDs seem to be reading/writing at proper speeds.
Is there something wrong with my configuration? Is it possible the load flipped high even though I haven't changed any server settings or PHP scripts? What else could possibly cause this?
EDIT: output from vmstat
vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
13 1 868 280368 179164 18089764 0 0 511 579 16 89 87 2 10 1
24 2 868 348948 183916 18136144 0 0 5791 610 1916 7552 89 2 7 1
23 0 868 501744 185972 18169456 0 0 3995 5877 2401 6277 91 3 5 1
6 0 868 694992 186136 18202684 0 0 3385 4783 1006 5958 91 2 7 0
22 0 868 825240 186372 18243540 0 0 4133 4087 1098 5364 91 2 7 0
19 0 868 284452 186540 18281960 0 0 3907 4380 537 6468 91 3 7 0
44 0 868 123408 177236 17022560 0 0 3896 5173 483 6914 88 5 7 0
17 0 868 159388 173236 16729360 0 0 4625 8856 1433 7072 89 3 8 0
14 0 868 248836 173380 16763992 0 0 5284 698 819 7357 88 2 9 0
15 0 868 406092 173592 16809708 0 0 4730 5794 1148 7224 90 2 8 0
ps -eo command,pid,ppid,%cpu,%mem,rss,pri,psr,size --sort=-%cpu | head -n 20
COMMAND PID PPID %CPU %MEM RSS PRI PSR SZ
/usr/sbin/mysqld --basedir= 379 342 364 11.1 2747020 19 1 14249120
gzip 5537 5536 30.1 0.0 756 0 3 452
/usr/sbin/apache2 -k start 2435 27735 6.6 0.3 75408 19 1 47184
/usr/sbin/apache2 -k start 2388 27735 6.7 0.3 77928 19 3 48144
/usr/sbin/apache2 -k start 2323 27735 5.9 0.3 79660 19 3 48464
/usr/sbin/apache2 -k start 2363 27735 5.8 0.3 77156 19 4 47256
/usr/sbin/apache2 -k start 2418 27735 5.7 0.3 77248 19 7 46684
/usr/sbin/apache2 -k start 2350 27735 5.8 0.3 78504 19 2 48092
/usr/sbin/apache2 -k start 2437 27735 5.5 0.3 75928 19 3 47436
/usr/sbin/apache2 -k start 2439 27735 5.5 0.3 75716 19 3 47596
/usr/sbin/apache2 -k start 2356 27735 5.7 0.3 78560 19 3 48708
/usr/sbin/apache2 -k start 2284 27735 5.6 0.3 79532 19 3 47896
/usr/sbin/apache2 -k start 2349 27735 5.6 0.3 78248 19 7 48548
/usr/sbin/apache2 -k start 2368 27735 5.6 0.3 77100 19 3 45852
/usr/sbin/apache2 -k start 2387 27735 5.5 0.3 79964 19 7 48952
/usr/sbin/apache2 -k start 2383 27735 5.4 0.3 79212 19 1 48448
/usr/sbin/apache2 -k start 2169 27735 5.4 0.3 81740 19 3 48636
/usr/sbin/apache2 -k start 2411 27735 5.3 0.3 77292 19 3 47628
/usr/sbin/apache2 -k start 1779 27735 5.4 0.3 88876 19 0 48384
mpstat 5 10
Linux 2.6.26-2-amd64 12/12/2014 _x86_64_
03:38:23 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:38:28 PM all 87.44 3.56 1.44 0.17 0.15 0.40 0.00 6.83 2363.80
03:38:33 PM all 86.82 3.31 1.82 0.12 0.05 0.25 0.00 7.62 1703.00
03:38:38 PM all 88.52 2.39 1.76 0.30 0.15 0.32 0.00 6.56 2552.68
03:38:43 PM all 85.12 3.92 2.17 0.27 0.10 0.42 0.00 7.99 2810.60
03:38:48 PM all 87.72 3.34 1.82 0.10 0.10 0.30 0.00 6.61 2368.00
03:38:53 PM all 85.36 3.83 1.59 0.40 0.05 0.17 0.00 8.59 1589.60
03:38:58 PM all 85.74 4.01 1.50 0.20 0.07 0.17 0.00 8.30 1648.00
03:39:03 PM all 85.26 4.16 1.75 0.20 0.12 0.60 0.00 7.91 1764.20
03:39:08 PM all 87.20 3.62 1.70 0.17 0.10 0.45 0.00 6.76 2221.80
03:39:13 PM all 85.96 3.12 2.67 0.27 0.05 0.52 0.00 7.41 2829.60
Average: all 86.52 3.53 1.82 0.22 0.09 0.36 0.00 7.46 2185.35
EDIT, MySQL 5.5
Upgrading to MySQL 5.5 actually made a big difference. Load dropped from 20 to 10 by day and from 10 to 6 by night. Traffic close to none = load 6, high traffic = load 10.
=> Now it's interesting: current loads are exactly 5 points above the values before the incident, no matter the traffic. This doesn't really make sense to me. There's no reason for a load of 6 on an 8-core CPU with almost no traffic, and it's even weirder that the load only doubles under heavy traffic. Day load used to be 5 times higher than night load.
==> Either MySQL 5.5 is miraculously efficient (compared to 5.0) OR there is something that adds those 5 points no matter what. I can't see any special process, though. It has to be Apache/PHP/MySQL related.
EDIT, SOLVED (by itself)
4 days ago the load went sky-high for no apparent reason. Early this morning, the load went back down to normal values just like that.
I'm not a fan of blaming DDoS for everything, and I actually ruled that out by looking at network statistics. But then again, I have never experienced an attack, so I might not recognize one.
FACT: After 3 days I managed to upgrade to MySQL 5.5, which didn't solve the problem entirely but lowered the load enough to make the site smooth again on the visitors' end. One day later, the problem went away. It's almost as if somebody lost interest once the site stopped suffering from long load times.
OS corruption wouldn't go away by itself, and there were no background tasks. An attack seems like the only explanation left, though I haven't noticed any strange network activity. One thing I can say: our audience is exactly the type of people who often do these kinds of things.
EDIT 29/12/2014
I'm not sure if anybody is still watching this thread, but I'd like to post an update. The high loads came back after several days, then were gone for a couple, then back again, etc. Sometimes the load is constantly high, sometimes it spikes. One day it can drop by 50%, another rise by 100%. The timing differs; there is no connection to actual traffic, background tasks, nothing. Although it really looks like DDoS, there is no strange network activity.
The big difference, though, is MySQL 5.5: the machine can now handle whatever is eating its resources well enough to keep the site up and running. The bottleneck again seems to be the CPU, which, again, doesn't make much sense.
The "funniest" part is the unpredictability of this behaviour. That just doesn't fit any SW/HW fault, or does it?
Best Answer
Beyond the shadow of any doubt, I strongly believe it is the InnoDB Storage Engine.
It is almost like a living, breathing organism.
Here is a Pictorial Representation from Percona's CTO Vadim Tkachenko
Please note the InnoDB Buffer Pool. If it has lots of dirty pages (changes waiting to be written back to the physical tables) and corresponding index changes (the Insert Buffer section of the Buffer Pool), note the writes that flow out of it to the data files and logs.
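To see how close the Buffer Pool is to that situation on a live server, you can compare the dirty page count against the total. A quick diagnostic (these are the stock InnoDB status variables, available on MySQL 5.0 and later):

```sql
-- Innodb_buffer_pool_pages_dirty / Innodb_buffer_pool_pages_total
-- gives the fraction of the Buffer Pool holding unflushed changes.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
```

Sampling these two counters every few seconds shows whether a load spike coincides with a flushing burst.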
What is the point of discussing the InnoDB Storage Engine?
If you have a lot of dirty pages written throughout the business day, there may be some activity flushing changes around the engine, like a heart circulating blood. Even mild writes can hide for a while. Since you have a 2G buffer pool, InnoDB could be just scouring the Buffer Pool (via its purge thread).
From the looks of the my.cnf, I would say you probably have MySQL 5.5 or higher. When not tuned, InnoDB tends to be more aggressive with the purge thread.
Now, for a surprise: did you know InnoDB can stay quiet with at least 75% of the Buffer Pool still dirty?
What this means is that InnoDB will not tolerate more than 75% (or 90%, depending on version) dirty pages. It starts aggressive purge-thread activity when dirty pages exceed the threshold set by innodb_max_dirty_pages_pct. Once it falls below that 75% or 90% threshold, InnoDB will quietly flush dirty pages steadily WHEN IT IS GOOD AND READY. That manifests itself as write I/O, server load, and increased CPU even when there are no INSERTs, UPDATEs, or DELETEs going on. Notwithstanding, InnoDB will decide when really old dirty pages need to be purged. What you may need is just some tuning. Here is a sample of some settings you need:
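The actual settings block did not survive in this copy of the answer. Based on the purge/flush behaviour described above, a plausible sketch for MySQL 5.5 would be along these lines (the values are illustrative assumptions, not the answerer's exact numbers):

```ini
[mysqld]
# Let InnoDB manage its own thread concurrency (0 = unlimited)
innodb_thread_concurrency = 0
# Spread read/write work across more background I/O threads
innodb_read_io_threads  = 16
innodb_write_io_threads = 16
# Raise the per-second flush budget so purge work finishes sooner
innodb_io_capacity = 2000
# Start flushing earlier so dirty pages never pile up to the default 75%
innodb_max_dirty_pages_pct = 50
```

Each of these options exists in MySQL 5.5; the right values depend on the disk subsystem, so treat them as starting points.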
For a more in-depth look at having InnoDB engage more hyperthreading and CPU, please see my DBA StackExchange posts:
BTW, you have 24GB of RAM and only a 2GB Buffer Pool? Please see my old post, How large should be mysql innodb_buffer_pool_size?. Why? An undersized Buffer Pool, even with proper tuning for hyperthreading and CPU, can still produce intermittent writes seemingly out of nowhere.
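The sizing method in that post boils down to measuring the total InnoDB data and index footprint and adding headroom, roughly like this (the 1.6 growth factor is that post's rule of thumb, and it assumes you want all InnoDB data cacheable in RAM):

```sql
-- Recommended InnoDB Buffer Pool Size, in GB:
-- sum of all InnoDB data + index bytes, plus ~60% headroom.
SELECT CEILING(Total_InnoDB_Bytes * 1.6 / POWER(1024, 3)) AS RIBPS_GB
FROM (
    SELECT SUM(data_length + index_length) AS Total_InnoDB_Bytes
    FROM information_schema.tables
    WHERE engine = 'InnoDB'
) A;
```

If the result exceeds available RAM, cap the buffer pool at what the machine can spare after Apache and the OS take their share.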
UPDATE 2014-12-12 10:30 EST
Since you are on MySQL 5.0 right now, you could probably make all flushing as thorough as possible with the following settings:
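The settings themselves are missing from this copy of the answer; on MySQL 5.0, the standard way to make flushing as thorough as possible is to drive the dirty-page target down, e.g. (a sketch, value assumed):

```ini
[mysqld]
# Flush dirty pages almost as soon as they appear instead of
# letting them accumulate toward the 90% default
innodb_max_dirty_pages_pct = 0
```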
That should minimize, or at least reduce, occurrences of sudden rises in Load and CPU.
These are optional:
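The optional settings were likewise lost in this copy. Given the flushing discussion, likely candidates would be along these lines (assumptions, not the original list):

```ini
[mysqld]
# Flush the redo log to disk at every COMMIT (safest, costs more I/O;
# the question's my.cnf currently uses 2)
innodb_flush_log_at_trx_commit = 1
# Bypass the OS page cache for data files (already set in the question)
innodb_flush_method = O_DIRECT
```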
Why These Optional Settings?
Long term, you really need to go with MySQL 5.6.