I have the following situation:
About 5 times a week (not related to any specific situation like cache clear, traffic spike) some queries are stuck on sending data (show processlist
):
> SELECT `main_table`.`entity_id`, `main_table`.`level`, `main_table`.`path`, `main_table`.`position`,
> `main_table`.`is_active`, `main_table`.`is_anchor`,
> `main_table`.`name`, `url_rewrite`.`request_path` FROM
> `catalog_category_flat_store_30` AS `main_table`
> LEFT JOIN `core_url_rewrite` AS `url_rewrite` ON url_rewrite.category_id=main_table.entity_id AND
> url_rewrite.is_system=1 AND url_rewrite.product_id IS NULL AND
> url_rewrite.store_id='30' AND url_rewrite.id_path LIKE 'category/%'
> WHERE (path LIKE '1/2/%') AND (main_table.store_id = '30') AND
> (is_active = '1') AND (include_in_menu = '1') ORDER BY name ASC
second one:
> SELECT `main_table`.`entity_id`, main_table.`name`, main_table.`path`,
> `main_table`.`is_active`, `main_table`.`is_anchor`,
> `main_table`.`manually`, `url_rewrite`.`request_path` FROM
> `catalog_category_flat_store_10` AS `main_table` LEFT JOIN
> `core_url_rewrite` AS `url_rewrite` ON
> url_rewrite.category_id=main_table.entity_id AND
> url_rewrite.is_system=1 AND url_rewrite.product_id IS NULL AND
> url_rewrite.store_id='10' AND url_rewrite.id_path LIKE 'category/%'
> WHERE (main_table.is_active = '1') AND (main_table.include_in_menu =
> '1') AND (main_table.path like '1/2/1528/1569/%') AND (`level` <= 4)
> ORDER BY `main_table`.`position` ASC
These queries are related to generating the navigation menu. They run without any issues and very fast all the time.
Few times a month some other queries get stuck on seding data or waiting for table lock:
INSERT INTO `catalogsearch_result` SELECT 316598 AS `query_id`, `s`.`product_id`, MATCH (s.data_index) AGAINST ('STRING HERE' IN BOOLEAN MODE) AS `relevance` FROM `catalogsearch_fulltext` AS `s`
INNER JOIN `catalog_product_entity` AS `e` ON e.entity_id = s.product_id WHERE (s.store_id = 38) AND (MATCH (s.data_index) AGAINST ('STRING HERE' IN BOOLEAN MODE)) ON DUPLICATE KEY UPDATE `relevance` = VALUES(`relevance`)
(search related)
Additional info:
- core_url_rewrite – 3M records (30 websites, 100k products)
- catalog_category_flat_store_* – 2000 records (use flat categories is enabled)
This is running on a setup using vmware on some huge hardware (mysql master has 8 cores allocated and 64Gb of RAM, SSD disks on a SAN storage), mysql was optimized and is monitored continuously.
There were some issues in the past related to I/O (some isssue with the link between the servers and the SAN storage).
We couldn't pinpoint the issue because running on bare metal (no virtualization, same config) this doesn't happen ever, in high stress conditions (running siege + load testing scenarios, no cache).
Anyone else having similar issues?
UPDATE:
reindexAll search was moved to a temporary table (so it doesn't lock the main table used by the production, then renames the tmp table). So, the reindex process doesn't interfere with visitors searching the website.
https://github.com/magendooro/magento-fulltext-reindex kudos to carco
Best Answer
It looks like a core bug/regression we saw in 1.7 where the block and collection cache weren't working effectively for the navigation menu (
catalog/navigation/top.phtml
).You can test by removing it, or just temporarily capture the output into a file with an
ob_start
and serve it from a static file/memcache.Also, the hardware you are using doesn't sound huge and looks under specified for the size of the store you have. There is probably an I/O bottleneck there too - SAN storage + congested network = poor performance.
--
As a crude solution, you can adjust the block class for the navigation (dump
get_class($this)
) intop.phtml
to identify it.This will allow site-wide caching, without the category level caching that the new version invoked. Its also worth removing the
is_active
class from the tree renderer if you do this to avoid random menu items appearing selected (and implement a JS alternative instead).