Please look carefully at the processlist and the output of SHOW ENGINE INNODB STATUS. What do you see ???
Process IDs 1,2,4,5,6,13 are all trying to run COMMIT.
Who is holding up everything ??? Process ID 40 is running a query against large_table.
Process ID 40 has been running for 33 seconds. Process IDs 1,2,4,5,6,13 have been running for less than 33 seconds. Process ID 40 is processing something. What's the hold up ???
First of all, the query is pounding on large_table's clustered index via MVCC.
Process IDs 1,2,4,5,6,13 are touching rows that carry MVCC data protecting their transaction isolation. Process ID 40 has a query that is marching through rows of data. If there is an index on the field hotspot_id, each index entry (that key plus the clustered-index key pointing back to the actual row) must be internally locked as the scan passes over it. (Note: by design, every non-unique index in InnoDB carries both your key (the column you meant to index) and the clustered-index key.) This scenario is essentially Unstoppable Force meets Immovable Object.
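To make the mechanics concrete, here is a toy model in Python. This is not InnoDB itself, just an illustration of the structure described above: the secondary index is a sorted list of (hotspot_id, primary key) pairs, and every hit requires a second lookup back into the clustered index to fetch the full row. Table contents are made up.

```python
# Toy model of a clustered index plus a non-unique secondary index.
from bisect import bisect_left, bisect_right

# Clustered index: primary key -> full row, kept in PK order.
clustered = {
    1: {"id": 1, "hotspot_id": 3000064, "discovered_at": "2011-08-01"},
    2: {"id": 2, "hotspot_id": 3000064, "discovered_at": "2011-08-02"},
    3: {"id": 3, "hotspot_id": 9999999, "discovered_at": "2011-08-03"},
}

# Secondary index on hotspot_id: sorted (hotspot_id, pk) pairs --
# the indexed column PLUS the clustered-index key, as described above.
secondary = sorted((row["hotspot_id"], pk) for pk, row in clustered.items())

def rows_for_hotspot(hotspot_id):
    """Range-scan the secondary index, then chase each PK into the clustered index."""
    lo = bisect_left(secondary, (hotspot_id,))
    hi = bisect_right(secondary, (hotspot_id, float("inf")))
    return [clustered[pk] for _, pk in secondary[lo:hi]]

print([r["id"] for r in rows_for_hotspot(3000064)])  # [1, 2]
```

Every row Process ID 40 touches this way is a row the pending COMMITs may also need to reconcile against, hence the traffic jam.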
In essence, the COMMITs must wait until it is safe to apply changes against large_table. Your situation is not unique, not a one-off, not a rare phenomenon.
I actually answered three questions like this on DBA StackExchange. The questions were submitted by the same person and all related to the same problem. My answers were not the solution themselves, but they helped the submitter come to his own conclusion on how to handle his situation.
In addition to those answers, I answered another person's question about deadlocks in InnoDB with regard to SELECTs.
I hope my past posts on this subject help clarify what was happening to you.
UPDATE 2011-08-25 08:10 EDT
Here is the query from Process ID 40
SELECT * FROM `large_table`
WHERE (`large_table`.`hotspot_id` = 3000064)
ORDER BY discovered_at LIMIT 799000, 1000;
Two observations:
You are doing 'SELECT *'. Do you need to fetch every column ? If you only need specific columns, you should list them explicitly, because the temp table of 1000 rows could be larger than you really need.
The WHERE and ORDER BY clauses usually give away performance issues, or make a table design shine. You need a mechanism that speeds up gathering the keys before gathering the data.
In light of these two observations, there are two major changes you must make:
MAJOR CHANGE #1 : Refactor the query
Redesign the query so that
- keys are gathered from the index
- only 1000 of them are collected
- joined back to the main table
Here is the new query which does these three things
SELECT large_table.* FROM
large_table INNER JOIN
(
SELECT hotspot_id,discovered_at
FROM large_table
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 799000,1000
) large_table_keys
USING (hotspot_id,discovered_at);
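If you want to convince yourself the two query shapes return the same rows, here is a quick sketch using SQLite from Python. SQLite is only a stand-in engine here (the real query targets MySQL/InnoDB), and the table layout and values are made up:

```python
# Demonstrate the "gather keys first, join back" refactoring on a tiny table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE large_table (
    hotspot_id INTEGER, discovered_at TEXT, payload TEXT)""")
con.executemany(
    "INSERT INTO large_table VALUES (?, ?, ?)",
    [(3000064, f"2011-08-{d:02d}", f"row{d}") for d in range(1, 31)],
)

# Original shape: fetch whole rows, then apply LIMIT/OFFSET.
original = con.execute("""
    SELECT * FROM large_table
    WHERE hotspot_id = 3000064
    ORDER BY discovered_at LIMIT 5 OFFSET 10""").fetchall()

# Refactored shape: gather only the keys in the subquery, join back for rows.
refactored = con.execute("""
    SELECT large_table.* FROM large_table
    INNER JOIN (
        SELECT hotspot_id, discovered_at
        FROM large_table
        WHERE hotspot_id = 3000064
        ORDER BY discovered_at
        LIMIT 5 OFFSET 10
    ) large_table_keys USING (hotspot_id, discovered_at)""").fetchall()

print(sorted(original) == sorted(refactored))  # True
```

The join back is compared as sets because the outer SELECT carries no ORDER BY of its own; in production you would add one if row order matters to the application.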
The subquery large_table_keys gathers the 1000 keys you need. The result of the subquery is then INNER JOINed back to large_table. So far, only keys are retrieved instead of whole rows -- but that's still 799,000 index entries to read through. There is a better way to get those keys, which leads us to...
MAJOR CHANGE #2 : Create Indexes that Support the Refactored Query
Since the refactored query only features one subquery, you only need to make one index. Here is that index:
ALTER TABLE large_table ADD INDEX hotspot_discovered_ndx (hotspot_id,discovered_at);
Why this particular index ? Look at the WHERE clause. The hotspot_id is compared against a static value, so all entries for a given hotspot_id form one contiguous, sequential run in the index. Now look at the ORDER BY clause: the discovered_at column is probably a DATETIME or TIMESTAMP field.
The natural order this presents in the index is as follows:
- Index features a list of hotspot_ids
- Each hotspot_id has an ordered list of discovered_at fields
Making this index also eliminates the internal sorting of temp tables, because the index already delivers the rows in ORDER BY order.
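You can watch the sort step disappear in miniature with SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output looks different, but the principle is the same): without the composite index the plan contains a temp B-tree sort for the ORDER BY; with it, the index alone delivers the order. This is an illustrative sketch, not the MySQL plan itself:

```python
# Show that a composite (hotspot_id, discovered_at) index removes the sort step.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE large_table (hotspot_id INTEGER, discovered_at TEXT, payload TEXT)"
)

def plan():
    """Return the query plan as one string."""
    rows = con.execute("""EXPLAIN QUERY PLAN
        SELECT * FROM large_table
        WHERE hotspot_id = 3000064
        ORDER BY discovered_at""").fetchall()
    return " | ".join(r[-1] for r in rows)

before = plan()  # full table scan plus a temp B-tree for the ORDER BY
con.execute(
    "CREATE INDEX hotspot_discovered_ndx ON large_table (hotspot_id, discovered_at)"
)
after = plan()   # index search; the sort step is gone

print(before)
print(after)
```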
Please put these two major changes in place and you will see a difference in running time.
Give it a Try !!!
UPDATE 2011-08-25 08:15 EDT
I looked at your indexes. You still need to create the index I suggested.
Best Answer
Well, do note that, if I recall correctly (it's been a while since I did DB work), COUNT(*) queries without a WHERE clause are notoriously slower on InnoDB tables than on MyISAM and MEMORY tables.
Also, is this by any chance a Xen DomU?
What is the frontend language? If PHP, is it using MySQL or MySQLi? Are they using persistent connections?
You have not mentioned the underlying operating system, but in the case of Linux I would start by staring at the output of
free -m
paying special attention to the last two lines to see if memory is tight overall. Here we have a system that's healthy (it is my workstation). The second column of the -/+ buffers/cache line excludes buffers and cache, so I am in fact using 2177 MB of memory and have 1784 MB readily available.
The last line shows that I don't use swap at all so far.
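The sample output referred to above was lost in formatting; below is a reconstructed sample (the individual numbers are made up, chosen only to match the figures quoted: 2177 MB really used, 1784 MB readily available, no swap used) plus a small parser showing how the -/+ buffers/cache line is derived:

```python
# Reconstructed `free -m` output (old two-line format) and how to read it.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:          3961       3100        861          0        400        523
-/+ buffers/cache:       2177       1784
Swap:         8191          0       8191
"""

def parse_free(text):
    """Derive the -/+ buffers/cache figures from the Mem: line."""
    lines = text.splitlines()
    total, used, free, shared, buffers, cached = (
        int(x) for x in lines[1].split()[1:]
    )
    really_used = used - buffers - cached   # what applications actually hold
    really_free = free + buffers + cached   # reclaimable on demand
    swap_used = int(lines[3].split()[2])
    return really_used, really_free, swap_used

print(parse_free(SAMPLE))  # (2177, 1784, 0)
```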
Then a look at
vmstat(8)
to see if your system is thrashing like mad would be useful, too. (My desktop really isn't doing all that much here, sorry. What a waste of 8 perfectly good cores.)
If you see a lot of processes spending time in the 'b' column, that means they are blocked, waiting for something. Often that is IO. The important columns here are
si
and
so
. Check if they're populated with high values. If so, this may be your problem -- something is consuming a lot of memory, more than you can actually afford. Using
top(1)
and ordering the processes by memory % (shift+m while in top) might show the culprit(s). It's not impossible that your system is thrashing to and from swap, and saturating the disks, causing blocked threads and processes. The tool
iostat(1)
(part of the package
sysstat
, usually) should be given a whirl to see if you have processes that are blocked, stuck on IO_WAIT. A saturated disk can spell bad news for the whole system under high load, especially if the system is swapping a lot. You might run iostat with extended stats, every five seconds, for instance:
iostat -x 5
This should allow you to easily see whether your volumes are being saturated. For instance, here you can see that my disks are terribly underutilized and that the system spends most of its CPU cycles idling. If that percentage mostly sits in the %iowait column instead, well, you have an IO bottleneck. You probably already know all this, but I'm just covering all the bases to make sure.
The idea is that your config file changed and you have no history of it (putting your config files under version control is a great idea for that very reason) -- and it is not impossible that the size of a buffer suddenly changed, thus making expensive queries like COUNT(*) without a WHERE clause suddenly start to gobble up resources.
Based on what you have learned from the previous use of the tools above, you should probably inspect the configuration file (being the only thing that changed, it is very likely the culprit) to see if the buffer values are sane for your average load.
How large are the buffers, like the
query_cache_size
value, and especially the
sort_buffer_size
? (If a sort doesn't fit in that buffer, it will be performed on disk, at a massive cost, as I'm sure you can imagine.)
How large is the
innodb_buffer_pool_size
?
How large is the
table_cache
and, most importantly, does that value fit within the system limits for file handles? (Check both open-files-limit in [mysqld] and at the OS level.)
Also, I don't remember off the top of my head whether this is still true, but I'm fairly certain that InnoDB actually locks the entire table whenever it has to commit an auto-increment field. I googled and could not find whether that is still the case.
You could also use
innotop(1)
to see what's going on in more detail, too. I hope this helps somehow, or at least gives you a starting point :)