MySQL – How to figure out what causes random jumps in load average

monitoring, MySQL, performance

I'm having an issue with load averages on my two Debian GNU/Linux dedicated boxes. Both run MySQL plus custom game server software – a smallish "MMORPG" (not massive at all). CPU and memory usage look OK: CPU usage is usually under 5%, and while RAM usage goes up to something like 80–90%, a good chunk of that is always free, cached, or buffered. Swap usage is 0.

While monitoring the load with uptime, top, or any other command that shows it, it randomly jumps up to something like 4 or even more. This is obviously a problem, especially since both boxes "only" have 2 cores. After the sudden jump, the load average smoothly decays again, suggesting a very short-lived burst of resource usage. CPU usage stays at 0–5%, 10% at most, even when I've stared at top with a 1-second update interval for about 15 minutes.

I've tried tools like htop, vmstat, dstat, etc., to no avail. Here is a log for anyone who is interested:

http://www.k-zodron.com/log.txt

Apart from the occurrence in the first few lines, CPU usage barely moves while the load jumps to astronomical values. I'm no expert with this stuff, but writing 4 KB to a disk doesn't sound like an I/O bottleneck either.

I have also run the MySQL Tuning Primer tool, and it reports that everything is fine.

Any ideas on how to trace and solve the problem? Thanks!

Edit

http://www.k-zodron.com/munin/

Munin stats, updating about every 5–10 minutes.

Best Answer

Could it be that MySQL is creating temporary tables on disk? Can you add some Munin charts with I/O stats? The I/O figures in the provided log seem unbelievably low. You can watch the temp-table counters directly, as sketched below.
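
A quick way to check the temp-table theory (a sketch, assuming the mysql command-line client can log in with enough privileges for SHOW STATUS): Created_tmp_disk_tables counts in-memory temporary tables that spilled to disk, so watch whether it climbs around the load spikes.

    # Snapshot of the temp-table counters
    mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"

    # Sample the disk-spill counter next to the 1-minute load every 10 seconds
    while true; do
        printf '%s load: %s  ' "$(date '+%H:%M:%S')" "$(cut -d' ' -f1 /proc/loadavg)"
        mysql -N -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';"
        sleep 10
    done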

What is your working set – does the data fit comfortably in memory [it seems so]? Do you occasionally do plenty of writes to SQL [from the log it seems there are none at all]?

Is it possible that you suddenly get a spike in the number of concurrent requests [SQL or to your custom server]? What does cat /proc/net/ip_conntrack | wc -l say? What does it show during load spikes? (See the logging sketch below.)
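
A minimal way to capture that number over time alongside the load (a sketch; /proc/net/ip_conntrack only exists when connection tracking is in use, and newer kernels expose /proc/net/nf_conntrack instead):

    # Log connection count and 1-minute load every 5 seconds
    while true; do
        conns=$(wc -l < /proc/net/ip_conntrack)
        load=$(cut -d' ' -f1 /proc/loadavg)
        echo "$(date '+%F %T') conns=$conns load=$load" >> /var/log/conntrack-load.log
        sleep 5
    done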

Can you turn on MySQL slow query logging – e.g., everything taking more than 1 or 2 seconds? A possible my.cnf snippet is below.
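
A sketch of the relevant my.cnf settings (the variable names depend on the MySQL version; slow_query_log is the 5.1+ spelling, while older 5.0 servers use log_slow_queries instead):

    [mysqld]
    # MySQL 5.1+; on 5.0 use "log_slow_queries = /var/log/mysql/slow.log" instead
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time     = 2    # log anything taking more than 2 seconds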

Are your disks directly attached to the server, or is it iSCSI/NFS? Can you check the health status of the disks [SMART] and the RAID status? Maybe one of the drives is failing... Or you can run a simple disk I/O benchmark off-peak to make sure you get decent read/write speeds; a sketch follows.
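
For example (a sketch, assuming smartmontools is installed and the disk is /dev/sda – adjust the device name, be careful where you point the dd output file, and run it off-peak):

    # SMART health and attribute summary for the first disk
    smartctl -a /dev/sda

    # Crude sequential write benchmark, ~1 GB, bypassing the page cache;
    # delete the test file afterwards
    dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
    rm /tmp/ddtest

    # Crude sequential read benchmark on the raw device
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct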

Or maybe something ugly shows up in dmesg?

Edit: check if netstat | wc -l is correlated with the load.

Check if ps axms | wc -l is correlated with the load.

Check if lsof | wc -l is correlated with the load.

[Preferably hack up small Munin plugins for those to get them on the charts; a sketch follows.]
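
A minimal Munin plugin along those lines (a sketch: drop it into /etc/munin/plugins/, make it executable, and restart munin-node; lsof can be slow on a busy box, so you may want to leave that field out of a 5-minute poll):

    #!/bin/sh
    # Munin plugin sketch: graph connection, process, and open-file counts
    # so they can be eyeballed against the load graph.

    if [ "$1" = "config" ]; then
        echo "graph_title Connection / process / open-file counts"
        echo "graph_vlabel count"
        echo "graph_category system"
        echo "netstat.label netstat lines"
        echo "procs.label ps axms lines"
        echo "lsof.label lsof lines"
        exit 0
    fi

    # -an avoids slow reverse-DNS lookups that plain netstat would do
    echo "netstat.value $(netstat -an | wc -l)"
    echo "procs.value $(ps axms | wc -l)"
    echo "lsof.value $(lsof | wc -l)"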