Magento – best practice for a scheduled product import from CSV files

import product

It would run every night and read a local csv file (an export from the ERP). It should have the following features:

  • Two languages/storeviews
  • Update attributes, product names, images, categories, tier prices, stock
  • Delete products that are not in the csv any more
  • Manage categories (create, update, delete)
  • Manage attribute options (create, update, delete)
  • Capable of updating >5000 products in a reasonable time (at least before sunrise, better in less than an hour)

Three years ago I ended up writing a handcrafted script that keeps track of all current values, does some direct db-updates, loads/saves the complete product where necessary and has its own logic for category and attribute option management.
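The "keeps track of all current values" part of such a script is essentially delta detection between two nightly exports. A minimal sketch of that idea follows, in Python for brevity (the same logic ports directly to a PHP import script). It assumes the CSV has a unique `sku` column; the column names, the sample data and the helper names (`row_fingerprint`, `load_fingerprints`, `compute_delta`) are all hypothetical and not taken from the original script.

```python
import csv
import hashlib
import io


def row_fingerprint(row):
    """Return a stable hash of one CSV row (column order does not matter)."""
    canonical = "\x1f".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def load_fingerprints(csv_text, key_column="sku"):
    """Map each SKU in the CSV text to the fingerprint of its row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row_fingerprint(row) for row in reader}


def compute_delta(previous, current):
    """Split SKUs into (created, updated, deleted) lists, each sorted by SKU."""
    created = sorted(sku for sku in current if sku not in previous)
    deleted = sorted(sku for sku in previous if sku not in current)
    updated = sorted(
        sku for sku in current
        if sku in previous and previous[sku] != current[sku]
    )
    return created, updated, deleted


# Made-up sample data standing in for two consecutive nightly ERP exports.
LAST_NIGHT = "sku,name,price,qty\nA1,Widget,9.90,5\nB2,Gadget,19.00,0\nC3,Gizmo,4.50,12\n"
TONIGHT = "sku,name,price,qty\nA1,Widget,9.90,5\nB2,Gadget,17.50,3\nD4,Doohickey,2.00,40\n"

created, updated, deleted = compute_delta(
    load_fingerprints(LAST_NIGHT), load_fingerprints(TONIGHT)
)
print(created, updated, deleted)
```

Only the SKUs in `created` and `updated` would need to go through the expensive load/save path, and the ones in `deleted` are the products that are no longer in the CSV. In a real run the previous fingerprints would be persisted between nights (for example in a small table or file) rather than recomputed from last night's export.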

Is there a better way today? Maybe using Magmi, the (since then enhanced) Magento Dataflow, or just different models in a still handcrafted script, like Mage::getSingleton('catalog/product_action')->updateAttributes(...), which would help with the attributes but not with names, images, tier prices etc.

Best Answer

If only it were that simple, companies would not spend tens to hundreds of thousands of dollars on supplier onboarding tools and ETL/ESB connectors. The question gives no business information: Magento CE or EE, website or global pricing, the percentage of records changing per day, whether you have a separate admin/batch server, or the size of the company (orders per day, visitors per day, revenue per year). All of these affect the choice of solution.

Now, if you are not interested in the business side and just want a technical solution, you have Magmi, import scripts via the API (with or without indexing switched on), or ETL tools (SaaS or in-house). The devil, though, is in the detail. For example, if you use the API without index on save, you need to reindex afterwards; if you have too many stores or too many products on a high-volume site, that causes frontend issues. If you do not have a cluster, this overloads the servers and slows performance, and if you have website pricing you need to perform multiple updates. For all intents and purposes Magmi is the API without index on save, just a lot faster. So if you have a high-volume site, you use either EE or CE with index on save.

We have worked with sites ranging from hundreds of stores and hundreds of thousands of products with website pricing and automated data loading (the most complex of the complex, until the soft limit of Magento was hit, pushed past, and a hard limit was hit) all the way down to the simplest store. If you are a single store (or in your case double) with a single currency and 5,000 products, then the API load script, Magmi plus a reindex, or CSV uploads will work just fine. Beyond that, it becomes more complex with everything you add. That is why we are under NDA for those solutions: only a handful of high-end consultants have any idea how to bring all the pieces together (hosting, clusters, indexing, loading, delta records, data cleaning, data mapping).

Just stick with the script. You can enhance it with calls to Magmi's API so you get the best of both worlds; one of the solutions from the consultants did exactly this. If it is working, why change it?
