MySQL thread get stuck in end-state

MySQL

One of our replication master production servers is showing some really odd behavior for which I can't seem to find a solution.

Some threads on this server get stuck in state 'end'. This happens purely random, but when this occurs the thread is always updating or inserting rows in a table. The tables on which the query is running differs but is always on a MyISAM tables and in a range of three different tables.

When a thread goes in end-state all other threads get stucked with status locked. And when I say all threads I mean all, even threads that are not querying the same database or table.

The web-servers keep queuing queries to the database server without getting a response. This ultimately causes the web-servers to run out of sockets. At that moment all request to the domains get denied. The database servers shows no I/O or processor activity during the time the thread is in 'end' state. When this problem occurs I have to kill the thread manually. Even that does not do anything other that it's command-status changes to 'killed' . Most of the threads disappear after about 100 seconds.

The tables on which the threads are running queries when they go in end-state vary in size but are around 20 to 100 MB. At the moment that this problems occurs these tables are frequently updated but not in extreme ways. I think the updates range to 3 to 10 per second.

Some specifications about the server. The OS is CentOS 5.4 with MySQL 5.0.77-log. The processor is a AMD Opteron 2378, the harddrives are a RAID 1+0 array of Corsair X32 32GB SSD's.

I am thinking that the SSD's might be a part of the cause of the problem but I can't find any data to confirm that. The drives have performed quite stable for a while.

I have read the documentation on the MySQL reference guide about General Thread States which says that during the end-state the binary log and query cache gets updated. Maybe this has something to do with the cause of the problem? I would not no which configuration directives could give a working solution.

I have not tried to disable query cache, and I am not able to disable replication as this is a running production server. The fact that this is a running production server causes me to be careful when changing parameters such as the query cache settings unless I know for sure that this is going to solve the problem.

I have not been able to reproduce the problem with some of my test-scripts. When reading, writing and updating the tables that cause the problems heavy the problem does not occur. The occurrence of this problem is purely random.

Best Answer

After further investigation it seemed that it was the query cache that caused the problem. Due to a large query cache MySQL seems to get confused when cleaning the cache. Reducing the query cache from a few GB's to about 512 MB solved the problem. For detailed information see: http://bugs.mysql.com/bug.php?id=39091