How to Manage Automated Emails Sent from a Web Application

Architectureweb-applicationsweb-designweb-development

I am designing a web application and I am wondering how to design the architecture to manage sending automated emails.

Currently I have this feature build into my web app and the emails are sent based on user input / interactions (like creating a new user). The problem is that connecting directly to a mail server takes a couple of seconds. Scaling my application up, this will be a significant bottle neck in the future.

What is the best way to manage sending a large amount of automated emails within my system architecture?

There won't be a huge amount of emails sent (2000 a day max). Emails don't need to be send immediately, up to 10 mins lag is fine.

Update:
Message queuing has been given as an answer, but how would this be designed? Would this be handled in the app and processed during a quiet period, or do i need to create a new 'mail app' or web service to just manage the queue?

Best Answer

The common approach, as Ozz already mentioned, is a message queue. From a design perspective a message queue is essentially a FIFO queue, which is a rather fundamental data type:

FIFO queue

What makes a message queue special is that while your application is responsible for en-queueing, a different process would be responsible for de-queueing. In queueing lingo, your application is the sender of the message(s), and the de-queueing process is the receiver. The obvious advantage is that the whole process is asynchronous, the receiver works independently of the sender, as long as there are messages to process. The obvious disadvantage is that you need an extra component, the sender, for the whole thing to work.

Since your architecture now relies on two components exchanging messages, you can use the fancy term inter-process communication for it.

How does introducing a queue affect your application's design?

Certain actions in your application generate emails. Introducing a message queue would mean that those actions should now push messages to the queue instead (and nothing more). Those messages should carry the absolute minimum amount of information that's necessary to construct the emails when your receiver gets to process them.

Format and content of the messages

The format and content of your messages is completely up to you, but you should keep in mind the smaller the better. Your queue should be as fast to write on and process as possible, throwing a bulk of data at it will probably create a bottleneck.

Furthermore several cloud based queueing services have restrictions on message sizes and may split larger messages. You won't notice, the split messages will be served as one when you ask for them, but you will be charged for multiple messages (assuming of course you are using a service that requires a fee).

Design of the receiver

Since we're talking about a web application, a common approach for your receiver would be a simple cron script. It would run every x minutes (or seconds) and it would:

  • Pop n amount of messages from the queue,
  • Process the messages (i.e. send the emails).

Notice that I'm saying pop instead of get or fetch, that's because your receiver is not just getting the items from the queue, it's also clearing them (i.e. removing them from the queue or marking them as processed). How exactly that will happen depends on your implementation of the message queue and your application's specific needs.

Of course what I'm describing is essentially a batch operation, the simplest way of processing a queue. Depending on your needs you may want to process messages in a more complicated manner (that would also call for a more complicated queue).

Traffic

Your receiver could take into consideration traffic and adjust the number of messages it processes based on the traffic at the time it runs. A simplistic approach would be to predict your high traffic hours based on past traffic data and assuming you went with a cron script that runs every x minutes you could do something like this:

if( 
    now() > 2pm && now() < 7pm
) {
    process(10);
} else {
    process(100);
}

function process(count) {
    for(i=0; i<=count; i++) {
        message = dequeue();
        mail(message)
    }
}

A very naive & dirty approach, but it works. If it doesn't, well, the other approach would be to find out the current traffic of your server at each iteration and adjust the number of process items accordingly. Please don't micro-optimize if it's not absolutely necessary though, you'd be wasting your time.

Queue storage

If your application already uses a database, then a single table on it would be the simplest solution:

CREATE TABLE message_queue (
  id int(11) NOT NULL AUTO_INCREMENT,
  timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  processed enum('0','1') NOT NULL DEFAULT '0',
  message varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY timestamp (timestamp),
  KEY processed (processed)
) 

It really isn't more complicated than that. You can of course make it as complicated as you need, you can, for example, add a priority field (which would mean that this is no longer a FIFO queue, but if you actually need it, who cares?). You could also make it simpler, by skipping the processed field (but then you'd have to delete rows after you processed them).

A database table would be ideal for 2000 messages per day, but it would probably not scale well for millions of messages per day. There are a million factors to consider, everything in your infrastructure plays a role in the overall scalability of your application.

In any case, assuming you've already identified the database based queue as a bottleneck, the next step would be to look at a cloud based service. Amazon SQS is the one service I used, and did what it promises. I'm sure there are quite a few similar services out there.

Memory based queues is also something to consider, especially for short lived queues. memcached is excellent as message queue storage.

Whatever storage you decide to build your queue on, be smart and abstract it. Neither your sender nor your receiver should be tied up to a specific storage, otherwise switching to a different storage at a later time would be a complete PITA.

Real life approach

I've build a message queue for emails that's very similar to what you are doing. It was on a PHP project and I've build it around Zend Queue, a component of the Zend Framework that offers several adapters for different storages. My storages where:

  • PHP arrays for unit testing,
  • Amazon SQS on production,
  • MySQL on the dev and testing environments.

My messages were as simple as they can be, my application created small arrays with the essential information ([user_id, reason]). The message store was a serialized version of that array (first it was PHP's internal serialization format, then JSON, I don't remember why I switched). The reason is a constant and of course I have a big table somewhere that maps reason to fuller explanations (I did manage to send about 500 emails to clients with the cryptic reason instead of the fuller message once).

Further reading

Standards:

Tools:

Interesting reads: