Magento 1.9 – MySQL Update of sales_flat_quote Suddenly Slow

magento-1.9MySQLperformance

I have a problem with a magento 1.9 installation behind a varnish cache. The load average is rougly 500 request/minute (including ajax requests and esi callbacks).

Normally, I get ~200-250 UPDATE queries per minute for sales_flat_quote. Under normal circumstances, these only takes about 1 ms to finish. However, I have a problem where all of these updates at times takes 5-30 seconds to finish.

During these spikes, all page loads grinds to a halt, but the overall system load is lowered (from ~20 to ~12), as is cpu usage. Mysql never uses more than 70-80 % of a single CPU during these spikes (with 32 CPU cores available), and there are plenty of available RAM, with innodb_buffer_pool = 10G but only 6G used. The number of UPDATE queries are pretty much unaffected. In fact, they tend to be slightly lower at ~200 queries/second.

Restarting mysql or php-fpm doesn't solve this problem. The only way I am able to solve it is by rebooting the entire server, which is not an acceptable solution.

Furthermore, the sales_flat_quote table is routinely truncated and only contains ~ 5000 rows.

The source of these updates all comes from the cart and the checkout.

Best Answer

The only way I am able to solve it is by rebooting the entire server

Fire your hosting provider, immediately. The only time a server should be powered off is to physically upgrade it, or install a new kernel - never to solve a problem. If they can't provide a solution to such a simple issue, they aren't a company worth using.


Your issue is fairly clear, you are lacking instrumentation.

You should be graphing every single application and event on your server in initimate detail, so that you can correlate the sequence of events leading up to a slow query incident to identify a cause.

If everything performs normally, then suddenly slows down, I'd be inclined to review,

  • Varnish hit rate - does it plummet at the time of the slowdown?
  • Inbound traffic rate - do you see an increase in traffic, do you see an increase from a single IP?
  • Cache store utilisation - has someone flushed a cache, or have several entries expired simultaneously?
  • Hardware activity - what else is happening on the server at the time, is I/O high, SYS CPU etc. Is it cloud/VPS - if so, what is the hypervisor doing?

There could be dozens of possible causes, all of which should take seconds to identify and fix, but without instrumentation - you can't simply guess your way through a fix. You'll just cause further problems blindly making adjustments.

Related Topic