Magento – Magento CE with 1M Products – dealing with Indexing

indexingmagento-ce

I'm operating a reseller website and have combined multiple suppliers which has resulted in a large catalog of a little over 1M product (500k+ products with 2 store views).

I've done all the usual performance tweaks, purchased a dedicated server and the site performs well with the exception of the indexes – they become invalidated daily due to inventory updates, price updates and the addition of new products that come into the feeds.

Regular indexing takes hours upon hours and pricing is not being updated properly resulting in selling a product for a loss.

I'm looking for a permanent solution to deal with this, I know it was addressed in EE 1.13

The introduction of incremental indexing reduces the need to perform a full re-index and most indexing operations are now automated – saving you and your staff time and energy to focus on revenue-generating activities. – See more at: http://magento.com/blog/magento-news/introducing-magento-enterprise-edition-113#sthash.v6XslKCZ.dpuf

@philwinkle mentioned the following here:

Product, URL Rewrite, and Inventory Indexing are the major issues when running a catalog this size.

There is 1 CE Edition module that I'm looking at (mirasvit Fast Asynchronous Re-indexing) but not sure if its the right way to go.

How are companies with large catalogs dealing with indexing in Magento CE? How can this problem be addressed in CE as it was in EE?

Best Answer

So the most problematic index in terms of performance is catalog_url. I have had a similar experience on some larger shops (although not as large as 1M products). The major problem was that the number of records in the core_url_rewrite table was much larger than (#products + #categories) x #stores.

I have found two causes for this, and found a solution/workaround for them.

Hope this helps!

Useless rewrites for products in category paths

Magento creates product rewrites for each category that it is associated with even when the setting catalog/seo/product_use_categories is set to false.

For example, when a product with url 'product-123' is assigned to category 'cat-1/cat-2/cat-3', the following rewrites are created for each store view (taking the default .phtml url suffix):

  • /product-123.html
  • /cat-1/cat-2/cat-3/product-123.html
  • /cat-1/cat-2/product-123.html, when category cat-2 has been configured as a layer navigation category.
  • /cat-1/product-123.html, when category cat-1 has been configured as a layer navigation category.

This can be circumvented by rewriting the method Mage_Catalog_Model_Url::_refreshProductRewrite:

/**
 * Refresh product rewrite, but only for non-root categories
 * when the admin setting "Use Categories Path for Product URLs" is enabled
 */
protected function _refreshProductRewrite(Varien_Object $product, Varien_Object $category)
{
    $categoryStoreId = $category->getStoreId();
    if (Mage::getStoreConfigFlag(Mage_Catalog_Helper_Product::XML_PATH_PRODUCT_URL_USE_CATEGORY, $categoryStoreId) ||
        $this->getStoreRootCategory($categoryStoreId)->getId() == $category->getId()
    ) {
        return parent::_refreshProductRewrite($product, $category);
    }
    return $this;
}

After applying this rewrite and setting catalog/seo/product_use_categories to false, the number of system rewrites should drop significantly.

Product URL key clashes

When two products have the same url key, each reindex round adds a new custom rewrite.

For example, for two products with id's 123 and 456 with the same url key foo:

  • Reindex round 1:
    • System rewrite /foo-123.phtml > /catalog/product/view/id/123
    • System rewrite /foo.phtml > /catalog/product/view/id/456
  • Reindex round 2:
    • Custom rewrite /foo-123.phtml > foo-124.phtml
    • System rewrite /foo-124.phtml > /catalog/product/view/id/123
    • System rewrite /foo.phtml > /catalog/product/view/id/456
  • Reindex round 3:
    • Custom rewrite /foo-123.phtml > foo-125.phtml
    • Custom rewrite /foo-124.phtml > foo-125.phtml
    • System rewrite /foo-125.phtml > /catalog/product/view/id/123
    • System rewrite /foo.phtml > /catalog/product/view/id/456

To resolve this, make sure all product url keys are unique, for instance by adding the SKU to the url key.

After that, clear all useless custom rewrites from the core_url_rewrite table.

Related Topic