Magento 1.9.2.4 – Fastest Way to Check if Products Exist

attributesmagento1.9.2.4product

I have an external file containing product details, I'm looping through this file and for each product in said file I have to check if the product already exists in magento.

The value to evaluate if the product already exists is an attributename and it's value.

So I check this by using (per product)

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This selection itself doesn't seem to take much time:

            echo 'check: ',round(microtime(true) - $time, 2) , "s<br/>\n";

reports 0.00 s,

however to check if it's an empty collection (ie the product exists in my magento database takes about 0.34-0.40 s

$collection->getSize()

Considering I've got several hundred of thousands of products to check. This will add up quickly. Very quickly.
I was hoping for something more akin to 0.01 or lower in time.

So I'm looking for the fastest way to check if a product exists.
I have quite a bit of freedom in how to change and implement the code so if there's an completely different way of appreaching this problem I'd like to hear about it.


Update:

I've changed it slightly so that instead of checking with magento if a product exists product by product, I instead get an array of all products that have the attribute to check against. I use this array to check if the attribute to check against exists.

This is way faster, but I fear for the overhead impact (primarily ram or cpu) this will have once the amount of products returned becomes too great (we've got about 40.000 products in our magento installation)

Best Answer

The best call would be getAllIds() in my knowledge.

Explanation

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This one won't take any time since it is just preparation of a collection query. This query will be actually run only when the collection is loaded through load() call or through foreach() loop or through count() call etc.

Here you can check the collection count status in many ways. I am listing out them in their best performance basis (performance decreases while go down).

  • $collection->getAllIds() - Best Option
  • $collection->getSize()
  • count($collection)
  • $collection->count() - Least option

Why This Order ?

count() defines in Varien_Data_Collection and it loads the collection first and then take the count by counting collection items. Since it involves collection load, it will take highest time.

count($collection) do not have much difference with above one.

getSize() defines in Varien_Data_Collection_Db and it avoids collection load. This is the big advantage over here. This will give you a good performance when use it for checking. See answer from Marius for more details.

getAllIds() is the best choice. This method is available in Mage_Eav_Model_Entity_Collection_Abstract and it has a very subtle difference with getSize() in definition and which makes this best choice.

getAllIds() internally calls getAllIdsSql() and see the definition here:

public function getAllIdsSql()
{
    $idsSelect = clone $this->getSelect();
    $idsSelect->reset(Zend_Db_Select::ORDER);
    $idsSelect->reset(Zend_Db_Select::LIMIT_COUNT);
    $idsSelect->reset(Zend_Db_Select::LIMIT_OFFSET);
    $idsSelect->reset(Zend_Db_Select::COLUMNS);
    $idsSelect->reset(Zend_Db_Select::GROUP);
    $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName());

    return $idsSelect;
}

Here $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName()); is what make this method finest in the all available methods. In fact, this line is absent in getSize().

Related Topic