Magento – Magento 1.9.1 Email Queue not working/buggy – how to troubleshoot and what is considered the best patch

ce-1.9.1.0, cron, email

First of all: yes, this is yet another question about the 1.9.1 email queue. But it is not about cron problems (like this or this) or about the new queue feature not being used (like this).

In our case the problem was that the queue tables (core_email_queue and core_email_queue_recipients) simply didn't receive any emails on new orders or order updates, so no order-related emails were sent out at all. Cron itself is working perfectly: emails added to the queue manually do get sent out.

The strange thing is that everything worked in our test environment. Even when we went live today, all emails were processed during the first minutes. After a few minutes, however (without any further modification to the live system, of course), no new emails were added to the queue at all. It seems (but I cannot tell for sure) that this happened when the first customer used PayPal Express, which we hadn't tested beforehand :-/ And indeed we were using some custom overrides in the PayPal Express logic that still called the old sendNewOrderEmail() function. But we couldn't get emails to work again even after patching those overrides to use queueNewOrderEmail().
So the first question is: is it possible that the old function triggered some inconsistency which 'broke' the email queue? Or is this all just a big coincidence with a totally different explanation?

As we couldn't find the problem but needed emails to work again asap, we went for another core override. In Mage_Core_Model_Email_Template_Mailer (in a copy in local, of course) we commented out line 76: ->setQueue($this->getQueue())
This seems to bypass the queue, and all mails get sent the old way again.

However, we'd like to keep the number of core overrides to a minimum, and we cannot tell right now whether we will face any other side effects. Any other tips or solutions from people with a deeper understanding of the Magento code and the email queue would be appreciated.

Update for 1.9.2: When upgrading to 1.9.2 we had a closer look at the email queue again and weren't able to reproduce the problem. But as we still have no real clue what the problem with 1.9.1 was, and as overriding Mage_Core_Model_Email_Template_Mailer::send() in the way described here still works, we still aren't using the queue. This way we hope not to run into the same problem again after some time in production.

tl;dr: The email queue isn't working in 1.9.1. Commenting out line 76 in Mage_Core_Model_Email_Template_Mailer bypasses the email queue and mails get sent again, but this doesn't feel like a good solution. How can this be solved better?

Best Answer

My guess is that setting cron.php to run every minute has caused runs to pile up on top of each other, i.e. one run does not finish before the next scheduled run of the same or a similar task starts. Since the two cron.php processes are not aware of each other's state, the same record could be processed twice, causing some odd exception that breaks the sending of queued emails.

That said, there are Mage::log calls in the exception handling of the queue mailer, so making sure logging is enabled would be the best first step to determine whether there are any exceptions. It may also be wise to run php -f cron.php from the CLI to see whether it throws any exceptions that you are not seeing while it runs behind the scenes.
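As a rough sketch of that check, the following shell functions run cron.php in the foreground and then print the tail of the log files. It assumes the default Magento 1 log file names (system.log and exception.log under var/log) and that logging is switched on in the admin developer settings; the function names and the example path are made up for illustration.

```shell
#!/bin/sh
# Print the last lines of Magento's default log files under the given root.
# Files that do not exist (e.g. because logging is disabled) are skipped.
show_mage_logs() {
    root="$1"
    for log in system.log exception.log; do
        if [ -f "$root/var/log/$log" ]; then
            echo "== $log =="
            tail -n 20 "$root/var/log/$log"
        fi
    done
}

# Run cron.php in the foreground so PHP errors appear on the terminal,
# then show whatever was written to the logs.
run_cron_and_show_logs() {
    root="$1"
    php -f "$root/cron.php"
    show_mage_logs "$root"
}

# Usage (the path is a placeholder for your installation):
# run_cron_and_show_logs /var/www/magento
```

Running it as the same user the real cron job uses is probably the most useful, since file permissions on var/ can differ between users.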

I would also start with a simple PHP mail() test to make sure you're not running into any spam policies or the like, just to be sure it's not something lower in the stack causing the issue.
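Such a test could look roughly like this from the shell. The function name is hypothetical and the recipient is whatever address you pass in. Note that mail() returning true only means the message was handed over for delivery, not that it arrived, so check the inbox and the spam folder as well.

```shell
#!/bin/sh
# Send a test message through PHP's built-in mail(), outside of Magento,
# and print the boolean that mail() returns.
php_mail_test() {
    # Arguments after "--" are passed to the inline script as $argv[1], ...
    php -r 'var_dump(mail($argv[1], "PHP mail() test", "Sent outside Magento."));' -- "$1"
}

# Usage (replace with an address you can check):
# php_mail_test you@example.com
```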

Just some speculation, hope it helps!

* EDIT *

Use cron.sh instead of cron.php, as it checks the ps output with grep to see whether a previous process is already running.
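The idea behind that check can be sketched as follows. This is an illustration of the ps-and-grep guard, not the literal contents of cron.sh, and the function name is made up.

```shell
#!/bin/sh
# Start the given cron script only if no running process already has
# that script path in its command line.
run_if_idle() {
    script="$1"
    # "grep -v grep" drops the grep process itself from the ps listing.
    if ps auxwww | grep -- "$script" | grep -v grep >/dev/null; then
        echo "already running: $script"
        return 0
    fi
    php "$script"
}

# Usage (the path is a placeholder):
# run_if_idle /var/www/magento/cron.php
```

One caveat of this kind of guard: if the script path also appears on the wrapper's own command line, the wrapper matches itself and never starts the job, so it is safer to hard-code the path inside the wrapper than to pass it as an argument.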