Computer power is cheap nowadays. Moreover, you don't know yet where the bottleneck will be.
To me, this smells like premature optimization where you worry about performance before even having the load. Perhaps you should just start by making it work, then worry about scaling it. My 2 cents.
The question is also whether you want quick processing time or high throughput. If the processing is really resource/time intensive, it makes sense to split the file, distribute it and merge the results. However, this of course comes at a certain cost:
splitting, sending, scheduling outputs, merging, and handling partial failures. These tasks consume resources too and add lots of complexity. Distributed computation is only suited for appropriate tasks; computing a single task per server is sometimes more efficient than doing all this extra work.
Now, the problem is that my SLA does not allow 3 to block 4; persistence in this case takes too long, and the client needs to know right away that the request has succeeded (this is the standard use case for the 202 HTTP status code).
But until the state change is persisted, you can't know that you're successful. You might have a sudden problem with electrical power or an errant backhoe (this stuff happens!) and then when the service resumes, it has completely forgotten about the resource. The client comes back and the server has no idea what it's talking about. That's not good.
No, you need to speed up commits (and avoid unnecessary processing in the critical path).
You may need to think about how you can logically commit faster, about what it means to commit; you can just commit the fact that there is work to do instead of having to commit the result of the work, which can be done much more rapidly. Then when the user comes back, either the processing is done, in which case you can 301 to the results, or you can give a response that says why things are still processing or that they've failed.
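As a rough illustration of committing only the fact that there is work to do, here is a minimal PHP/PDO sketch (the tasks table, its columns, and the connection details are invented for the example); the only thing on the critical path is one small insert:

    <?php
    // Hypothetical fast path: commit only the fact that there is work to do.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    function acceptWork(PDO $pdo, array $request): string {
        $taskId = bin2hex(random_bytes(16));   // opaque identifier for the task
        $stmt = $pdo->prepare(
            'INSERT INTO tasks (id, payload, status, created_at)
             VALUES (?, ?, ?, NOW())'
        );
        // The expensive work is NOT done here; only the intent is recorded.
        $stmt->execute([$taskId, json_encode($request), 'pending']);
        return $taskId;                        // hand this back to the client
    }

The expensive processing then happens outside the request, and the commit the client waits on is tiny.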
Getting faster commits might mean thinking more carefully about how you deploy. Have you made the right database choice? Is that database hosted on the right hardware? (Commit-heavy loads are much faster when you've got an SSD to host the transaction log on.) I know it's nice to ignore these things and just deal with the data model at an abstract level, but performance is one of those things where the underlying details have a habit of leaking through.
In response to comments clarifying…
If you've got a genuinely expensive task to perform (e.g., you've asked for a collection of large files to be transferred from some third party) you need to stop thinking in terms of having the whole task complete before the response comes back to the user. Instead, make the task itself be a resource: you can then respond quickly to the user to say that the task has started (even if that is a little lie; you might just have queued the fact that you want the task to start) and the user can then query the resource to find out whether it has finished. This is the asynchronous processing model (as opposed to the more common synchronous processing model).
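A hedged sketch of what that can look like at the HTTP level, in PHP for consistency with the snippet above (the URL scheme, the result_url column, and the acceptWork helper from the earlier sketch are all illustrative, not any particular framework's API):

    <?php
    // POST /tasks: start the work and answer immediately.
    function createTask(PDO $pdo, array $request): void {
        $taskId = acceptWork($pdo, $request);      // the fast commit sketched earlier
        http_response_code(202);                   // Accepted: the task has been queued
        header('Location: /tasks/' . $taskId);     // the task is itself a resource
        header('Content-Type: application/json');
        echo json_encode(['status' => 'pending']);
    }

    // GET /tasks/{id}: the client polls this until the work is done.
    function showTask(PDO $pdo, string $taskId): void {
        $stmt = $pdo->prepare('SELECT status, result_url FROM tasks WHERE id = ?');
        $stmt->execute([$taskId]);
        $task = $stmt->fetch(PDO::FETCH_ASSOC);

        if ($task === false) {
            http_response_code(404);               // the server has never heard of it
        } elseif ($task['status'] === 'done') {
            http_response_code(301);               // redirect to the results, as above
            header('Location: ' . $task['result_url']);
        } else {
            header('Content-Type: application/json');
            echo json_encode(['status' => $task['status']]); // pending or failed
        }
    }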
There are many ways to handle letting the user know that the task has finished. By far the simplest is to just wait until they poll and tell them then. Alternatively, you can push a notification somewhere (maybe an Atom feed or by sending an email?) but these are much trickier in general; push notifications are unfortunately relatively easy to abuse. I really advise sticking to polling (or using cleverness with websockets).
The case where a task might be quick or might be slow and the user has no way to know ahead of time is really evil. Unfortunately, the sane way to resolve it is to make the processing model asynchronous; it's possible to flip between the two with REST/HTTP (sending either a 200 with the result or a 202 whose content includes a link to the processing-task resource) but it is quite tricky. I don't know if your framework supports such things nicely.
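One possible shape of that flipping, continuing the sketches above (tryFinishWithin is an assumed helper and the two-second budget is arbitrary):

    <?php
    // Hypothetical "flipping" handler: behave synchronously when the work is
    // quick, fall back to the asynchronous 202 answer when it is not.
    function createTaskMaybeSync(PDO $pdo, array $request): void {
        $taskId = acceptWork($pdo, $request);

        // tryFinishWithin() is an assumed helper that waits for at most the
        // given number of seconds and reports whether the task completed.
        if (tryFinishWithin($pdo, $taskId, 2.0)) {
            http_response_code(200);               // fast case: answer with the result
            header('Content-Type: application/json');
            echo json_encode(['status' => 'done', 'result' => '/tasks/' . $taskId . '/result']);
        } else {
            http_response_code(202);               // slow case: hand back the task resource
            header('Location: /tasks/' . $taskId);
            header('Content-Type: application/json');
            echo json_encode(['status' => 'pending', 'task' => '/tasks/' . $taskId]);
        }
    }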
Be aware that most users really do not understand asynchronous processing. They do not understand having to poll for results. They do not appreciate that a server can be doing things other than handling what they asked it to do. This means that if you are serving up an HTML representation, you probably ought to include some JavaScript in it to do the polling (or connecting to a websocket, or whatever you choose) so that they don't need to know about page refreshing or details like that. That way you can go a long way towards pretending that you're just doing what they asked, but without all the problems associated with actual long-running requests.
Best Answer
The common approach, as Ozz already mentioned, is a message queue. From a design perspective a message queue is essentially a FIFO queue, which is a rather fundamental data type.
What makes a message queue special is that while your application is responsible for en-queueing, a different process is responsible for de-queueing. In queueing lingo, your application is the sender of the message(s), and the de-queueing process is the receiver. The obvious advantage is that the whole process is asynchronous; the receiver works independently of the sender, as long as there are messages to process. The obvious disadvantage is that you need an extra component, the receiver, for the whole thing to work.
Since your architecture now relies on two components exchanging messages, you can use the fancy term inter-process communication for it.
How does introducing a queue affect your application's design?
Certain actions in your application generate emails. Introducing a message queue would mean that those actions should now push messages to the queue instead (and nothing more). Those messages should carry the absolute minimum amount of information that's necessary to construct the emails when your receiver gets to process them.
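For example, a hedged sketch of what the sending side could look like (the email_queue table, its columns, and the reason code are invented for the example and reused in the later sketches):

    <?php
    // Hypothetical sending side: the action only records "user X needs email Y".
    const REASON_WELCOME = 1;   // invented reason code, see the end of this answer

    function queueWelcomeEmail(PDO $pdo, int $userId): void {
        // The message carries the bare minimum needed to build the email later.
        $message = json_encode(['user_id' => $userId, 'reason' => REASON_WELCOME]);
        $stmt = $pdo->prepare(
            'INSERT INTO email_queue (message, created_at) VALUES (?, NOW())');
        $stmt->execute([$message]);
        // That is all the web request does; actual sending happens elsewhere.
    }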
Format and content of the messages
The format and content of your messages are completely up to you, but keep in mind: the smaller the better. Your queue should be as fast to write to and process as possible; throwing bulky data at it will probably create a bottleneck.
Furthermore, several cloud-based queueing services have restrictions on message sizes and may split larger messages. You won't notice; the split messages will be served as one when you ask for them, but you will be charged for multiple messages (assuming, of course, you are using a paid service).
Design of the receiver
Since we're talking about a web application, a common approach for your receiver would be a simple cron script. It would run every x minutes (or seconds), pop n messages from the queue, and process them.
Notice that I'm saying pop instead of get or fetch; that's because your receiver is not just getting the items from the queue, it's also clearing them (i.e. removing them from the queue or marking them as processed). How exactly that will happen depends on your implementation of the message queue and your application's specific needs.
Of course what I'm describing is essentially a batch operation, the simplest way of processing a queue. Depending on your needs you may want to process messages in a more complicated manner (that would also call for a more complicated queue).
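A hedged sketch of such a cron receiver against the database-backed queue assumed in the earlier snippet (sendEmailFor is an assumed helper, and the table and column names are invented):

    <?php
    // Hypothetical receiver, run from cron every few minutes.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $batchSize = 50;    // n: how many messages to pop per run

    // Grab the oldest unprocessed messages (FIFO order).
    $rows = $pdo->query(
        'SELECT id, message FROM email_queue
         WHERE processed = 0
         ORDER BY created_at ASC
         LIMIT ' . (int)$batchSize
    )->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        $data = json_decode($row['message'], true);
        sendEmailFor($data);    // assumed helper that builds and sends the email

        // The "pop" part: mark the message processed so it isn't picked up again.
        $done = $pdo->prepare('UPDATE email_queue SET processed = 1 WHERE id = ?');
        $done->execute([$row['id']]);
    }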
Traffic
Your receiver could take traffic into consideration and adjust the number of messages it processes based on the traffic at the time it runs. A simplistic approach would be to predict your high-traffic hours based on past traffic data; assuming you went with a cron script that runs every x minutes, you could do something like the sketch below. A very naive and dirty approach, but it works. If it doesn't, well, the other approach would be to find out the current traffic of your server at each iteration and adjust the number of processed items accordingly. Please don't micro-optimize if it's not absolutely necessary though, you'd be wasting your time.
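The original snippet isn't reproduced here, but a hedged reconstruction of that kind of time-based adjustment could look like this (the hours and batch sizes are arbitrary placeholders):

    <?php
    // Hypothetical: process fewer messages during (predicted) high-traffic hours.
    $hour = (int)date('G');         // current hour of the day, 0-23

    if ($hour >= 9 && $hour < 18) {
        $batchSize = 20;            // be gentle while the site is busy
    } else {
        $batchSize = 200;           // catch up during the quiet hours
    }
    // ...then pop and process $batchSize messages as in the receiver sketch above.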
Queue storage
If your application already uses a database, then a single table on it would be the simplest solution.
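The exact schema is up to you; a minimal sketch, created here through PDO and using the MySQL-flavoured email_queue table assumed in the earlier snippets, could be:

    <?php
    // Hypothetical schema matching the earlier sketches.
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS email_queue (
            id         INT AUTO_INCREMENT PRIMARY KEY,
            message    TEXT NOT NULL,                 -- the serialized payload
            processed  TINYINT(1) NOT NULL DEFAULT 0, -- cleared/processed flag
            created_at DATETIME NOT NULL
        )'
    );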
It really isn't more complicated than that. You can of course make it as complicated as you need; you can, for example, add a priority field (which would mean that this is no longer a FIFO queue, but if you actually need it, who cares?). You could also make it simpler by skipping the processed field (but then you'd have to delete rows after you processed them).
A database table would be ideal for 2000 messages per day, but it would probably not scale well for millions of messages per day. There are a million factors to consider; everything in your infrastructure plays a role in the overall scalability of your application.
In any case, assuming you've already identified the database-based queue as a bottleneck, the next step would be to look at a cloud-based service. Amazon SQS is the one service I've used, and it did what it promises. I'm sure there are quite a few similar services out there.
Memory-based queues are also something to consider, especially for short-lived queues. memcached is excellent as message queue storage.
Whatever storage you decide to build your queue on, be smart and abstract it. Neither your sender nor your receiver should be tied to a specific storage; otherwise, switching to a different storage at a later time would be a complete PITA.
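A small sketch of what that abstraction could look like in PHP (the interface and class names are made up; any storage that can honour push/pop would do):

    <?php
    // Hypothetical storage-agnostic queue abstraction.
    interface MessageQueue {
        public function push(string $message): void;
        /** @return string[] up to $limit messages, removed or marked processed */
        public function pop(int $limit): array;
    }

    // The sender and receiver only ever see MessageQueue; switching to SQS,
    // memcached, etc. means writing another implementation, nothing more.
    class DatabaseQueue implements MessageQueue {
        public function __construct(private PDO $pdo) {}

        public function push(string $message): void {
            $stmt = $this->pdo->prepare(
                'INSERT INTO email_queue (message, created_at) VALUES (?, NOW())');
            $stmt->execute([$message]);
        }

        public function pop(int $limit): array {
            $rows = $this->pdo->query(
                'SELECT id, message FROM email_queue WHERE processed = 0
                 ORDER BY created_at ASC LIMIT ' . (int)$limit
            )->fetchAll(PDO::FETCH_ASSOC);

            foreach ($rows as $row) {
                $this->pdo->prepare('UPDATE email_queue SET processed = 1 WHERE id = ?')
                          ->execute([$row['id']]);
            }
            return array_column($rows, 'message');
        }
    }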
Real life approach
I've built a message queue for emails that's very similar to what you are doing. It was on a PHP project and I built it around Zend Queue, a component of the Zend Framework that offers several adapters for different storages. My storages were:
My messages were as simple as they can be; my application created small arrays with the essential information ([user_id, reason]). The message store was a serialized version of that array (first it was PHP's internal serialization format, then JSON, I don't remember why I switched). The reason is a constant, and of course I have a big table somewhere that maps reason to fuller explanations (I did manage to send about 500 emails to clients with the cryptic reason instead of the fuller message once).
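To make that concrete, a tiny hedged sketch of the idea (the constant names and texts are invented): the message stores only the reason code, and the full wording lives in one place on the receiving side.

    <?php
    // Hypothetical reason codes and their fuller explanations, kept in one place.
    const REASON_WELCOME        = 1;
    const REASON_PASSWORD_RESET = 2;

    const REASON_TEXT = [
        REASON_WELCOME        => 'Welcome aboard! Here is how to get started...',
        REASON_PASSWORD_RESET => 'Someone asked to reset your password...',
    ];

    // Sender side: serialize only the essentials.
    $message = json_encode(['user_id' => 42, 'reason' => REASON_WELCOME]);

    // Receiver side: decode and look up the full wording before sending.
    $data = json_decode($message, true);
    $body = REASON_TEXT[$data['reason']] ?? 'Unknown reason ' . $data['reason'];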