My guess is that you have some long-running queries in your application. When they execute, they hold connections checked out of the pool for a long time (relative to the usual usage pattern). This causes the pool to become exhausted and grow up to its maximum, at which point any remaining workers block waiting for connections to be released.
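If your application happened to be in Python using SQLAlchemy, for example, the behavior above maps onto the pool settings below (the DSN and numbers are hypothetical; most connection pools have equivalent knobs):

    from sqlalchemy import create_engine

    # Hypothetical DSN; the numbers illustrate the failure mode above.
    engine = create_engine(
        "mysql+pymysql://app:secret@db.example.com/orders",
        pool_size=5,      # connections kept open under normal load
        max_overflow=10,  # the pool may "grow" to 5 + 10 = 15 in total
        pool_timeout=30,  # seconds a worker blocks waiting for a release
    )

A handful of slow queries pinning connections for tens of seconds each is enough to walk the pool up to pool_size + max_overflow, after which every new request waits out the full pool_timeout.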
The first thing to do is track down when this happens; that is, is it a cyclical event or random? If it's the former you're in luck, as you can be ready when it happens. If you can't determine a pattern, you'll have to be vigilant.
You may be able to figure this out by looking at your website monitoring logs, or at sar output from your database server, to see if there are any correlated spikes.
If you can catch your database while it's under load, execute the following commands on the MySQL server:
    SHOW ENGINE INNODB STATUS;
    SHOW PROCESSLIST;
The former prints diagnostic information about the InnoDB engine (you are using InnoDB, right?); the latter prints the first 100 characters of each currently executing query (use SHOW FULL PROCESSLIST for the full text). Look for queries that have been running for a long time, queries generating temporary tables on disk, and queries that are blocked on a resource.
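If you would rather watch for offenders programmatically, the same data is exposed through INFORMATION_SCHEMA.PROCESSLIST. A rough sketch using PyMySQL (credentials are hypothetical, and the 10-second threshold is arbitrary):

    import pymysql

    conn = pymysql.connect(host="db.example.com", user="app",
                           password="secret", database="orders")
    with conn.cursor() as cur:
        # Flag anything that has been running for more than 10 seconds.
        cur.execute(
            "SELECT id, user, time, state, info "
            "FROM information_schema.processlist "
            "WHERE command != 'Sleep' AND time > 10 "
            "ORDER BY time DESC")
        for pid, user, secs, state, query in cur.fetchall():
            print(f"{pid} {user} {secs}s [{state}] {query}")
    conn.close()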
After that, the hard work begins. Use EXPLAIN to estimate the cost of each suspect query and the resources it uses. Avoid queries that require sorting on disk via a temporary table. Look for long-running reporting jobs, or other scheduled maintenance tasks that periodically lock or saturate your database. It could be something as simple as the backup task, or a job that rolls up old purchase order data.
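As a rough example of putting EXPLAIN to work (the query and credentials are hypothetical; substitute a statement copied from the processlist or the slow log):

    import pymysql

    conn = pymysql.connect(host="db.example.com", user="app",
                           password="secret", database="orders")
    with conn.cursor() as cur:
        cur.execute("EXPLAIN SELECT * FROM purchase_orders "
                    "WHERE created_at < NOW() - INTERVAL 90 DAY")
        for row in cur.fetchall():
            print(row)
    conn.close()

In the output, type = ALL means a full table scan, and "Using temporary; Using filesort" in the Extra column marks exactly the disk-sorting queries warned about above.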
I recommend having these three settings in your /etc/my.cnf:

    slow_query_log = 1
    log_queries_not_using_indexes = 1
    long_query_time = 1
For a web application doing 20-30 requests per second, you can't afford to have anything show up in these logs.
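On MySQL 5.1 and later you can also turn these on at runtime (requires the SUPER privilege), which is handy when you are trying to catch an incident in progress. A quick sketch, with hypothetical connection details:

    import pymysql

    conn = pymysql.connect(host="db.example.com", user="admin",
                           password="secret")
    with conn.cursor() as cur:
        # slow_query_log takes effect immediately; long_query_time
        # applies to connections opened after the change.
        cur.execute("SET GLOBAL slow_query_log = 'ON'")
        cur.execute("SET GLOBAL long_query_time = 1")
        cur.execute("SET GLOBAL log_queries_not_using_indexes = 'ON'")
    conn.close()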
By the way, IMHO it's pointless to increase your connection pool's size beyond its original size, as this will delay the onset of pool exhaustion by at best a few seconds, and only put more pressure on your database right when it doesn't need it.
The AWS RDS max_connections limit is based on instance type, so you can upgrade your RDS instance or add more read replicas.
RDS instance types and their max_connections limits:
- t2.micro 66
- t2.small 150
- m3.medium 296
- t2.medium 312
- m3.large 609
- t2.large 648
- m4.large 648
- m3.xlarge 1237
- r3.large 1258
- m4.xlarge 1320
- m2.xlarge 1412
- m3.2xlarge 2492
- r3.xlarge 2540
These figures come from max_connections at AWS RDS MySQL Instance Sizes (2015).
Update 2017-07
The current RDS MySQL max_connections setting defaults to {DBInstanceClassMemory/12582880}. For an instance class with 512MB of memory, that works out to (512 × 1024 × 1024) / 12582880 ≈ 42, and so on.
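A quick back-of-the-envelope helper for that formula. Note that DBInstanceClassMemory is the memory actually available to the instance, somewhat less than its nominal RAM, so the real defaults come out a bit lower than this naive calculation:

    # Naive estimate of the RDS MySQL default:
    #   max_connections = DBInstanceClassMemory / 12582880
    def estimated_max_connections(ram_gib):
        return int(ram_gib * 1024**3 / 12582880)

    for ram_gib in (0.5, 1, 2, 8):
        print(ram_gib, "GiB ->", estimated_max_connections(ram_gib))
    # 0.5 GiB -> 42, 1 GiB -> 85, 2 GiB -> 170, 8 GiB -> 682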
Each web server can hold many connections to RDS; how many depends on the SQL load coming from that web server.
Best Answer
http://www.mysqlperformanceblog.com/2010/09/10/cache-miss-storm/
As it turns out, our issue was a cache miss storm, a.k.a. a cache stampede.
We solved it by implementing a 50% cache expiry. Basically, for every item in memcache we create a second cache item with a similar key plus an appended 'regenerate' string. This item expires at exactly 50% of the typical cache expiry time; its absence signals that the cache is getting close to expiring and that the next request should try to regenerate it.
This prevents a storm of users trying to regenerate the cache at the same time and gives our cache the best chance of always staying fresh!
A tricky one to track down!
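For reference, a minimal sketch of that scheme in Python with the python-memcached client. The key names and TTL are illustrative, and the add() call is one way to guarantee only a single request wins the regeneration; the original answer does not specify how ties are broken:

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])
    TTL = 600  # full cache lifetime in seconds (illustrative)

    def cache_set(key, value, ttl=TTL):
        # Store the real item plus a marker that lapses at 50% of the TTL.
        mc.set(key, value, time=ttl)
        mc.set(key + ":regenerate", 1, time=ttl // 2)

    def cache_get(key, regenerate_fn, ttl=TTL):
        value = mc.get(key)
        if value is None:
            # Hard miss: the item fully expired, so rebuild synchronously.
            value = regenerate_fn()
            cache_set(key, value, ttl)
            return value
        # Soft expiry: once the marker lapses, add() succeeds for exactly
        # one request, which rebuilds the item while every other request
        # keeps serving the still-valid cached value.
        if mc.add(key + ":regenerate", 1, time=ttl // 2):
            value = regenerate_fn()
            mc.set(key, value, time=ttl)
        return value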