Linux – thesql stops responding periodically

linuxMySQLUbuntuubuntu-14.04vmware-server

I run a site (Magento) on a Ubuntu 14.04.3 server running on a 32xCPU VMWare VPS.

When it is under heavy load, it typically receives 20-25 requests/second. In magento there is a specific UPDATE query to a mysql table which normally takes ~1 ms (±0.2 ms) and runs ~200-300 times per minute (3-5 queries/second). However, during these heavy loads at intervals of 1-2 hours, this specific query suddenly takes 5-35 seconds to finish, which also stalls the entire website (even requests without this query).

I have monitored ram and cpu utilization and the load typically hovers around 22-28, both before the freeze and during. The freeze seems to be nearly permanent. It can last for at least 40 minutes and restarting mysql and php-fpm does not make it go away. RAM usage never goes beyond 10 % av available RAM and swap is never used.

The only way I have to solve it is by rebooting the VPS, which makes me believe there is a underlying system misconfiguration responsible for the freeze.

An interesting note, though: A few times the issue has solved itself without a reboot. What these cases have in common is that this query "only" takes 2-7 seconds to finish. At these times, the problem goes away in 10-15 minutes times.

So, any suggestions on what causes this and how I can track down the real underlying issue?

Update 1: The system load (1 minute load for 32 CPU cores) typically peaks at 27-28, but can rise to as high as 40 under extreme load. When this freeze occurs, the load is typically 22-27 both before and during the freeze. Most if not all available CPU cores (32) have some idle time during the freeze.

Update 2:
I have made these changes to my.cnf:

innodb_buffer_pool_size = 10G (Innodb data is 5.5G)
key_buffer          = 16M
max_allowed_packet  = 16M
thread_stack        = 192K
thread_cache_size   = 8
max_connections     = 1024

Best Answer

Have you monitored Disk I/O? Is there an increase in I/O wait times or queued transactions? It's possible that requests are queueing up at the storage level due to an I/O limitation put on by your host. Also, have you checked if you're hitting your max allowable mysql clients? If these queries are suddenly taking a lot longer to complete, it's also possible that it's not leaving enough available connections for normal site traffic because the other connections aren't closing fast enough.