Currently performing software archeology on an old-ish installation of MySQL 5.6. I have a few months of MySQL general query log, and would like to gather statistics on which queries are being run, by whom, and how often, since various "undocumented" automata from all over have been set up to access said database. Do I have to go all Perl on this or is there a ready-made analysis tool?
MySQL general query log analysis
Related Solutions
Set innodb_buffer_pool_size > 1G, because your dataset size is 1.8 GB.
To decrease the number of reads, you need to increase innodb_buffer_pool_size. To decrease the number of writes, you need to edit your Zabbix templates (disable some unnecessary items such as free inodes, and increase the intervals between checks).
You have a reads/writes ratio of 57% / 43%, so enabling the Query Cache will not help (it may make things worse, because every write to a table invalidates that table's cache entries).
Think about increasing tmp_table_size and max_heap_table_size to avoid creating temporary tables on disk (13% of temporary tables). Is that temporary-tables figure a size in MB, or a count? If it is a count, it is too high.
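For example (the 64M value below is only an illustrative assumption; size it to your workload and available memory), the two settings would be raised together in my.cnf, because the effective in-memory limit for an implicit temporary table is the smaller of the two:

```ini
# Raise both together: MySQL uses min(tmp_table_size, max_heap_table_size)
# as the threshold before an in-memory temporary table spills to disk.
tmp_table_size      = 64M
max_heap_table_size = 64M
```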
Decrease the number of connections to 50 (your highest observed number was 33).
innodb_support_xa = false
innodb_buffer_pool_size = 256M # depends on how much memory is available to MySQL; more is better
innodb_flush_log_at_trx_commit = 0 # do not flush the log to disk on every commit (faster, but up to one second of transactions may be lost on a crash)
innodb_max_dirty_pages_pct = 90 # delay flushing dirty pages to disk
innodb_flush_method = O_DIRECT # direct access to disk, bypassing the OS cache
thread_cache_size = 4
query_cache_size = 0
table_cache = 80 # a little more than the number of tables in the zabbix database
Useful link about InnoDB optimization.
Please look carefully at the processlist and the output of SHOW ENGINE INNODB STATUS. What do you see ???
Process IDs 1,2,4,5,6,13 are all trying to run COMMIT.
Who is holding up everything ??? Process ID 40 is running a query against large_table.
Process ID 40 has been running for 33 seconds. Process IDs 1,2,4,5,6,13 have been running for less than 33 seconds. Process ID 40 is processing something. What's the hold up ???
First of all, the query is pounding on large_table's clustered index via MVCC.
Within Process IDs 1,2,4,5,6,13 are rows that carry MVCC data protecting their transaction isolation. Process ID 40 has a query that is marching through rows of data. If there is an index on the column hotspot_id, then each entry of that key, plus the clustered-index key it carries to reach the actual row, must be locked internally. (Note: by design, all non-unique indexes in InnoDB carry both your key (the column you meant to index) and a clustered index key.) This scenario is essentially Unstoppable Force meets Immovable Object.
In essence, the COMMITs must wait until it is safe to apply changes against large_table. Your situation is not unique, not a one-off, not a rare phenomenon.
I actually answered three questions like this on the DBA StackExchange. The questions were submitted by the same person and related to the same underlying problem. My answers were not the solution, but they helped the question submitter come to his own conclusion on how to handle his situation.
- Will these two queries result in a deadlock if executed in sequence?
- Trouble deciphering a deadlock in an innodb status log
- Reasons for occasionally slow queries?
In addition to those answers, I answered another person's question about deadlocks in InnoDB with regard to SELECTs.
I hope my past posts on this subject help clarify what was happening to you.
UPDATE 2011-08-25 08:10 EDT
Here is the query from Process ID 40
SELECT * FROM `large_table`
WHERE (`large_table`.`hotspot_id` = 3000064)
ORDER BY discovered_at LIMIT 799000, 1000;
Two observations:
You are doing SELECT *. Do you need to fetch every column ? If you need only specific columns, you should name them explicitly, because the temporary table of 1000 rows could be wider than you really need.
The WHERE and ORDER BY clauses usually give away performance issues or make a table design shine. You need a mechanism that speeds up gathering the keys before gathering the data.
In light of these two observations, there are two major changes you must make:
MAJOR CHANGE #1 : Refactor the query
Redesign the query so that
- keys are gathered from the index
- only 1000 of them are collected
- those keys are joined back to the main table
Here is the new query, which does these three things:
SELECT large_table.* FROM
large_table INNER JOIN
(
SELECT hotspot_id,discovered_at
FROM large_table
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 799000,1000
) large_table_keys
USING (hotspot_id,discovered_at);
The subquery large_table_keys gathers the 1000 keys you need. The result of the subquery is then INNER JOINed to large_table. This way, only keys are read instead of whole rows. That is still 799,000 index entries to read past, though. There is a better way to get those keys, which leads us to...
MAJOR CHANGE #2 : Create Indexes that Support the Refactored Query
Since the refactored query only features one subquery, you only need to make one index. Here is that index:
ALTER TABLE large_table ADD INDEX hotspot_discovered_ndx (hotspot_id,discovered_at);
Why this particular index ? Look at the WHERE clause. The hotspot_id is compared against a constant, so all entries for a given hotspot_id form a contiguous list in the index. Now, look at the ORDER BY clause. The discovered_at column is probably a DATETIME or TIMESTAMP field.
The natural order this presents in the index is as follows:
- Index features a list of hotspot_ids
- Each hotspot_id has an ordered list of discovered_at values
Making this index also eliminates internal sorting into temp tables, because the rows come out of the index already ordered by discovered_at.
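If you want to verify the effect (a quick check, not part of the original answer): EXPLAIN the key-gathering subquery before and after adding the index; once the optimizer picks hotspot_discovered_ndx, the Extra column should stop showing 'Using filesort'.

```sql
EXPLAIN
SELECT hotspot_id, discovered_at
FROM large_table
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 799000, 1000;
```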
Please put these two major changes in place and you will see a difference in running time.
Give it a Try !!!
UPDATE 2011-08-25 08:15 EDT
I looked at your indexes. You still need to create the index I suggested.
Best Answer
In the end I wrote a Perl script to do this. Not perfect but it does its job.
Available at GitHub mysql56_query_log_analysis.pl
Note that the MySQL 5.6 general query log has a different format than the MySQL 5.7 query log, so another script will be needed for 5.7.
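For anyone who wants a starting point without fetching the GitHub script, here is a minimal Python sketch of the same idea. It assumes the 5.6 general log format as I understand it: an optional YYMMDD HH:MM:SS timestamp (printed only when the second changes), a thread id, a command word, and an argument; Connect events are used to map thread ids to users. The function and regex names are mine, not from the linked script.

```python
import re
from collections import Counter, defaultdict

# Matches "<thread_id> <Command>\t<argument>" in a MySQL 5.6 general log
# line; the leading timestamp is optional because 5.6 only prints it
# when the second changes.
LINE_RE = re.compile(
    r'^(?:\d{6}\s+\d{1,2}:\d{2}:\d{2})?\s*'   # optional "151021 14:25:01"
    r'(\d+)\s+(\w+)\t?(.*)$'                  # thread id, command, argument
)

def count_queries_by_user(lines):
    """Count Query events per user, resolving users via Connect events."""
    thread_user = {}                 # thread id -> user@host
    counts = defaultdict(Counter)    # user -> Counter of leading SQL verbs
    for line in lines:
        m = LINE_RE.match(line.rstrip('\n'))
        if not m:
            continue                 # continuation line of a multi-line query
        thread_id, command, arg = m.groups()
        if command == 'Connect':
            # argument looks like "root@localhost on mydb"
            thread_user[thread_id] = arg.split(' on ')[0].strip()
        elif command == 'Query':
            user = thread_user.get(thread_id, 'unknown')
            # bucket by first keyword (SELECT/INSERT/...) for a rough profile
            verb = arg.lstrip().split(None, 1)[0].upper() if arg.strip() else '?'
            counts[user][verb] += 1
    return counts
```

Feeding it the open log file line by line and printing the resulting counters gives the per-user, per-verb breakdown the question asks for.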