The title of my question is general because I feel like this problem is of a general nature, but to set the stage I'm going to provide a specific example.
We use a homegrown workflow engine that is driven by database tables. Within those tables lurks a directed graph that represents the workflow. The graph contains Stages and Activities; a line is drawn between two Stage nodes, and the resulting Activity node contains code to be executed. We use CSScript to compile and execute the code on the fly.
Within the workflow, Task records represent the work to be executed. Each Task contains some relevant metadata in XML form. The Task records traverse the directed graph, and the code is executed as the Task passes through the activity. So at any given moment, each stage might contain x number of tasks, waiting to be executed on an activity.
To execute a Task on an activity, it needs to be scheduled. A Schedule record containing a datetime, taskid, stageid and activityid determines when and where this Task gets executed next. Periodically, we execute a query that returns Schedule records that are due, and then for each record so returned we stand up an Activity instance and execute it, handing it the Task record as a parameter.
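To make the "due" query concrete, here is a minimal sketch using Python's built-in sqlite3 module. This is not our actual schema or engine code: the table name `schedule`, the column name `due_at`, and the helper functions are made up for illustration. Only the fields themselves (a datetime, taskid, stageid and activityid) come from the description above.

```python
import sqlite3
from datetime import datetime, timedelta


def create_schedule_table(conn):
    # Hypothetical schema: one row per scheduled execution of a Task.
    conn.execute(
        """CREATE TABLE schedule (
               due_at     TEXT NOT NULL,   -- ISO-8601 timestamp
               taskid     INTEGER NOT NULL,
               stageid    INTEGER NOT NULL,
               activityid INTEGER NOT NULL
           )"""
    )


def add_schedule(conn, due_at, taskid, stageid, activityid):
    conn.execute(
        "INSERT INTO schedule VALUES (?, ?, ?, ?)",
        (due_at.isoformat(timespec="seconds"), taskid, stageid, activityid),
    )


def fetch_due(conn, now):
    # Return every Schedule record whose time has arrived, oldest first.
    # ISO-8601 strings in one fixed format sort in chronological order.
    cur = conn.execute(
        "SELECT taskid, stageid, activityid FROM schedule "
        "WHERE due_at <= ? ORDER BY due_at",
        (now.isoformat(timespec="seconds"),),
    )
    return cur.fetchall()
```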
This query used to run 10 times per second. Recently, I added some code that counts how many times in a row the query returns no records; if this count gets to 60, I slow the polling rate to once per second and start counting again. If the count reaches 60 again, I slow the rate to once per minute. If a record appears in the query result, I set the rate back to 10 times per second and begin the counting process again. The net effect is that the schedule table is polled rapidly during busy periods and sparsely during quiet periods. We expect to save a few hundred dollars per Azure instance per month, just from this one simple change.
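Roughly, the back-off logic looks like the sketch below. It is written in Python for brevity and is not the engine's real code; the class name `AdaptiveInterval` and its constants are invented for this illustration, with the three rates and the threshold of 60 taken from the description above.

```python
class AdaptiveInterval:
    """Tracks how long to sleep between polls of the schedule table."""

    INTERVALS = (0.1, 1.0, 60.0)  # 10 per second, 1 per second, 1 per minute
    EMPTY_LIMIT = 60              # empty polls in a row before slowing down

    def __init__(self):
        self.level = 0
        self.empty_count = 0

    def record(self, rows_found):
        """Register one poll result and return the next sleep in seconds."""
        if rows_found > 0:
            # Work showed up: go back to the fastest rate and restart the count.
            self.level = 0
            self.empty_count = 0
        else:
            self.empty_count += 1
            if self.empty_count >= self.EMPTY_LIMIT:
                # Step down one level (staying at the slowest) and recount.
                self.empty_count = 0
                self.level = min(self.level + 1, len(self.INTERVALS) - 1)
        return self.INTERVALS[self.level]
```

The polling loop would call `record(len(rows))` after each query and sleep for the returned number of seconds.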
So here's my question.
This is obviously a polling pattern. Is there a way to make it "event-driven," so that the database is only hit when a schedule record is due, without having to constantly poll the database?
Best Answer
The general solution is to use a database that supports asynchronous notifications. Several do:

- PostgreSQL lets you issue a NOTIFY statement as a standalone command or as part of a function. (The latter could be part of a trigger.) Clients can subscribe to notifications by issuing a LISTEN statement and selecting on the connection handle (exactly how varies with language binding).
- SQL Server uses WAITFOR and RECEIVE statements to listen for events. May also have (or have had) Oracle-like OCN/QRCN.

If you're stuck with one of those that doesn't (MySQL, DB2), it will have to be done out-of-band using one of the methods described in the other answers.
Once you have a method for the database to notify you that something has changed, you can do a query that determines how long it is until the next event is supposed to happen and then wait that long for a notification. If you get a notification, repeat the query/wait cycle. If you don't get a notification, it means the time you calculated has arrived and it's time to do whatever the event dictates. This should get you down to the point where you're only querying the database when you know for sure something needs to happen.
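A minimal sketch of that query/wait cycle, in Python, might look like the following. Everything named here is an assumption for illustration: a `threading.Event` stands in for whatever notification mechanism the database provides, and `fetch_next_due` and `run_due_tasks` are hypothetical callbacks that would wrap your own queries.

```python
import threading
from datetime import datetime, timedelta


def wait_for_due_or_change(next_due, changed, now=None):
    """Sleep until next_due, waking early if `changed` is set.

    Returns "changed" if a notification arrived first, otherwise "due".
    """
    now = now or datetime.now()
    if next_due is None:
        timeout = None  # nothing scheduled: wait for a notification only
    else:
        timeout = max(0.0, (next_due - now).total_seconds())
    if changed.wait(timeout):
        changed.clear()
        return "changed"
    return "due"


def scheduler_loop(fetch_next_due, run_due_tasks, changed, stop):
    """Repeat the query/wait cycle until `stop` is set.

    fetch_next_due() returns the earliest due datetime, or None.
    run_due_tasks() must execute and remove (or reschedule) the due
    records, otherwise the loop would spin on the same past time.
    """
    while not stop.is_set():
        next_due = fetch_next_due()
        if wait_for_due_or_change(next_due, changed) == "due":
            run_due_tasks()
```

Whatever code receives the database notification would call `changed.set()`. To shut the loop down, set `stop` and then `changed` so a sleeping wait wakes up.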