Magento 1.8 – Resolve Inconsistent Product Attribute Index with Duplicate Entries

attributeseavindexingmagento-1.8

I have a strange inconsistency in the attribute index that could not be solved with a full reindex. For a dropdown attribute there are two different values for each store, like this:

mysql>  select * from catalog_product_index_eav where entity_id=10515 and attribute_id=274;                                                                                      +-----------+--------------+----------+-------+
| entity_id | attribute_id | store_id | value |
+-----------+--------------+----------+-------+
|     10515 |          274 |       12 | 99437 |
|     10515 |          274 |       12 | 99441 |
|     10515 |          274 |       13 | 99437 |
|     10515 |          274 |       13 | 99441 |
|     10515 |          274 |       14 | 99437 |
|     10515 |          274 |       14 | 99441 |
|     10515 |          274 |       15 | 99437 |
|     10515 |          274 |       15 | 99441 |
|     10515 |          274 |       16 | 99437 |
|     10515 |          274 |       16 | 99441 |
|     10515 |          274 |       17 | 99437 |
|     10515 |          274 |       17 | 99441 |
|     10515 |          274 |       18 | 99437 |
|     10515 |          274 |       18 | 99441 |
|     10515 |          274 |       19 | 99437 |
|     10515 |          274 |       19 | 99441 |
|     10515 |          274 |       20 | 99437 |
|     10515 |          274 |       20 | 99441 |
|     10515 |          274 |       21 | 99437 |
|     10515 |          274 |       21 | 99441 |
|     10515 |          274 |       22 | 99437 |
|     10515 |          274 |       22 | 99441 |
+-----------+--------------+----------+-------+
22 rows in set (0.00 sec)

(same result with catalog_product_index_eav_idx) The attribute has global scope, so there is only one value in the EAV table:

mysql>  select * from catalog_product_entity_int where entity_id=10515 and attribute_id=274;
+----------+----------------+--------------+----------+-----------+-------+
| value_id | entity_type_id | attribute_id | store_id | entity_id | value |
+----------+----------------+--------------+----------+-----------+-------+
|   260901 |              4 |          274 |        0 |     10515 | 99441 |
+----------+----------------+--------------+----------+-----------+-------+
1 row in set (0.00 sec)

The indexer should read that with a query like this (based on the code in Mage_Catalog_Model_Resource_Product_Indexer_Eav_Source):

mysql> select pid.entity_id,pid.attribute_id,pid.store_id,IFNULL(pis.value,pid.value) as value from (select d.entity_id,d.attribute_id,d.value,s.store_id,s.website_id from core_store s left join catalog_product_entity_int d on 1=1 and d.store_id=0 where s.store_id != 0 and d.entity_id=10515) pid left join catalog_product_entity_int pis on pis.entity_id = pid.entity_id and pis.attribute_id=pid.attribute_id and pis.store_id=pid.store_id where pid.attribute_id=274;
+-----------+--------------+----------+-------+
| entity_id | attribute_id | store_id | value |
+-----------+--------------+----------+-------+
|     10515 |          274 |       12 | 99441 |
|     10515 |          274 |       16 | 99441 |
|     10515 |          274 |       17 | 99441 |
|     10515 |          274 |       18 | 99441 |
|     10515 |          274 |       19 | 99441 |
|     10515 |          274 |       20 | 99441 |
|     10515 |          274 |       21 | 99441 |
|     10515 |          274 |       22 | 99441 |
|     10515 |          274 |       15 | 99441 |
|     10515 |          274 |       13 | 99441 |
|     10515 |          274 |       14 | 99441 |
+-----------+--------------+----------+-------+
11 rows in set (0.00 sec)

As far as I understand the indexer workflow, it creates the index in a temporary table, then (in a transaction) deletes everything from the existing index table and copies the data from the temporary table. I.e. after a full reindex it should not be possible that old index data is still around.

So where can this other value come from and how can I possibly get rid of it?

Best Answer

I found the solution in the query logs and it's surprisingly logical.

Short Answer

The product in question was a bundle product and for bundle products all attributes of the bundle selections are taken into account as well. This way you also find bundles that may contain a product with the specified options in layered navigation.

Long Answer

This query runs after the first round of indexing:

INSERT IGNORE INTO `catalog_product_index_eav_idx` SELECT `l`.`parent_id`, `i`.`attribute_id`, `i`.`store_id`, `i`.`value` FROM `catalog_product_relation` AS `l`
CROSS JOIN `core_store` AS `cs`
INNER JOIN `catalog_product_index_eav_idx` AS `i` ON l.child_id = i.entity_id AND cs.store_id = i.store_id GROUP BY `l`.`parent_id`,
`i`.`attribute_id`,
`i`.`store_id`,
`i`.`value`;

If I run the SELECT subquery and limit it to the attribute and product from above, I get the following, which are exactly my mysterious values:

mysql> SELECT `l`.`parent_id`, `i`.`attribute_id`, `i`.`store_id`, `i`.`value` FROM `catalog_product_relation` AS `l`
    ->  CROSS JOIN `core_store` AS `cs`
    ->  INNER JOIN `catalog_product_index_eav_idx` AS `i` ON l.child_id = i.entity_id AND cs.store_id = i.store_id GROUP BY `l`.`parent_id`,
    ->         `i`.`attribute_id`,
    ->         `i`.`store_id`,
    ->         `i`.`value` having l.parent_id=10515 and i.attribute_id=274;
+-----------+--------------+----------+-------+
| parent_id | attribute_id | store_id | value |
+-----------+--------------+----------+-------+
|     10515 |          274 |       12 | 99437 |
|     10515 |          274 |       13 | 99437 |
|     10515 |          274 |       14 | 99437 |
|     10515 |          274 |       15 | 99437 |
|     10515 |          274 |       16 | 99437 |
|     10515 |          274 |       17 | 99437 |
|     10515 |          274 |       18 | 99437 |
|     10515 |          274 |       19 | 99437 |
|     10515 |          274 |       20 | 99437 |
|     10515 |          274 |       21 | 99437 |
|     10515 |          274 |       22 | 99437 |
+-----------+--------------+----------+-------+
11 rows in set (0.45 sec)

The code for this query is in Mage_Catalog_Model_Resource_Product_Indexer_Eav_Abstract::_prepareRelationIndex()

catalog_product_relation contains the bundle relations, so the value comes from a bundle selection of product 10515.