Database – How to Remove Duplicated Products from Catalog

database

I mass import products into my site on a nightly basis (affiliate products). Occasionally the products are duplicated so I want to write a quick SQL Statement to remove the duplicated products.

What I will then do is put the SQL into a PHP script and call it each night after the import so it will remove the duplicated products before I reindex.

I've done some research and found come up with the following script to get all of the product names from the database:

SELECT `value` AS product_nameFROM catalog_product_entity_varchar WHERE entity_type_id = (SELECT entity_type_id FROM eav_entity_type WHERE entity_type_code = 'catalog_product') AND attribute_id = (SELECT attribute_id FROM eav_attribute WHERE attribute_code = 'name' AND entity_type_id = (SELECT entity_type_id FROM eav_entity_type WHERE entity_type_code = 'catalog_product'))

I have also found this example script to identify duplicates in SQL:

`SELECT name, email, COUNT(*) FROM users GROUP BY name, email HAVING COUNT(*) >` 1

I wondered if someone could help me put this together?

Thanks in advance!
Chris

Best Answer

I have tested this and it works but since it involves the deletion of products, I strongly recommend that you backup your database first.

include 'app/Mage.php';
Mage::app();

Mage::register('isSecureArea', 1);

$limit    = 100;
$model    = Mage::getModel('catalog/product');
$resource = $model->getResource();
$adapter  = $resource->getReadConnection();
$name     = $resource->getAttribute('name');

// retrieve a list of all products that have more than 1 occurrence of the same name
$select = $adapter->select()->from($name->getBackendTable(),array($resource->getIdFieldName(),'value','count' => 'COUNT(*)','entity_ids' => new Zend_Db_Expr("GROUP_CONCAT({$resource->getIdFieldName()})")));
$select->where('attribute_id = ?',$name->getAttributeId());
$select->where('entity_type_id = ?',$name->getEntityTypeId());
$select->group('value');
$select->having('count > 1');
$select->limit($limit);
$result = $adapter->fetchAll($select);

// here, we remove the lowest product id from each set of duplicate-named products. 
// the theory is that the product which was created first, hence the lowest 
// product id, is the one you want to keep
$result = array_map(function($row){
    $entityIds = explode(',',$row['entity_ids']);
    $min = min($entityIds);
    $key = array_search($min, $entityIds);
    unset($entityIds[$key]);
    $row['entity_ids'] = $entityIds;
    return $row;
},$result);

// now that we have narrowed down to the ids of the products we want to delete, 
// lets get all product ids to delete into a single array
$entityIds = array();
foreach($result as $row){
    $entityIds = array_merge($entityIds,$row['entity_ids']);
}

// lets delete each one individually
$collection = $model->getCollection();
$collection->addFieldToFilter($resource->getIdFieldName(),array('in' => $entityIds));
foreach($collection as $product){
    $product->delete();
}
Related Topic