What term is used to describe running frequent batch jobs to emulate near real time?

polling, terminology

Suppose users of application A want to see the data updated by application B as frequently as possible. Unfortunately, neither app A nor app B can use message queues, and they cannot share a database. So app B writes a file, and a batch job periodically checks to see if the file is there and, if found, loads it into app A.
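For concreteness, here is a minimal sketch of such a batch job in Python (the file path, the processed directory, and the load step are placeholders, not details from the question). The scheduler provides the loop: each run checks once for the file and exits.

    import shutil
    from pathlib import Path

    # Hypothetical locations for the file handed from app B to app A.
    DROP_FILE = Path("/data/exchange/app_b_export.csv")
    PROCESSED_DIR = Path("/data/exchange/processed")

    def load_into_app_a(path: Path) -> None:
        # Placeholder for whatever "load it into app A" really means
        # (bulk insert, REST call, ETL step, ...).
        print(f"loading {path} into app A")

    def run_once() -> None:
        if not DROP_FILE.exists():
            return  # nothing to do this cycle; the scheduler will run us again
        load_into_app_a(DROP_FILE)
        # Move the file aside so the next run does not reload the same data.
        PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
        shutil.move(str(DROP_FILE), str(PROCESSED_DIR / DROP_FILE.name))

    if __name__ == "__main__":
        run_once()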

Is there a name for this concept? A very explicit and geeky description:
"running very frequent batch jobs in a tight loop to emulate near real time".

This concept is similar to "polling". However, polling has the connotation of being very frequent, perhaps multiple times per second, whereas the most often you would run a batch job is every few minutes.

A related question: what is the tightest loop that is reasonable? Is it 1 minute, or 5 minutes, or …? Recall that the batch jobs are started by a batch job scheduler (e.g. Autosys, Control-M, CA ESP, Spring Batch, etc.), and so running a job too frequently would cause overhead and clutter.

Best Answer

You were correct the first time: polling is the correct term to use in this situation. Whether you are polling at 1 mHz or at 1 MHz, it is still polling.

Note that millihertz is not a unit I've ever seen used, a poll rate of once every thousand seconds (about 17 minutes) having limited use. *8')

From the Wikipedia polling page:

Polling, or polled operation, in computer science, refers to actively sampling the status of an external device by a client program as a synchronous activity.

In this case, the batch job is the client, while the file is the mechanism allowing the client to synchronise with the external device (application B).

Determining a suitable poll rate can be a tricky business.

  • If the client polls too frequently, it can starve the device (possibly another process on the same multitasking system) of the resources it needs to produce the data the client is asking for quickly enough, slowing the whole system down.

  • Poll too infrequently and your client could sit idle waiting for the next poll while there is data already sitting there waiting to be processed.

Both cases can result in the system running sub-optimally.
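As a rough illustration of the trade-off (assuming, for simplicity, that new data arrives at a uniformly random point within a poll interval, so the average wait is half the interval and the worst case is a full interval), you can compare candidate poll periods by the latency they add and the number of scheduler runs they cost per day:

    # Candidate poll periods in minutes (illustrative values only).
    intervals_minutes = [1, 5, 15, 60]

    print(f"{'interval':>10} {'avg wait':>10} {'worst wait':>12} {'runs/day':>10}")
    for m in intervals_minutes:
        runs_per_day = 24 * 60 // m
        print(f"{m:>9}m {m / 2:>9.1f}m {m:>11}m {runs_per_day:>10}")

A 1-minute schedule keeps the worst-case staleness to a minute but costs 1,440 job runs a day in the scheduler; a 5-minute schedule cuts that to 288 runs at the price of up to 5 minutes of staleness.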

As an example of the former, I have seen a system which spent so long servicing "is there new data?" requests that it had no time left to actually prepare the data being asked for (a form of livelock).

For the latter, I have a device with a 60-second poll period. Since I might need 3 round-trip communications to complete a single transaction with it, each transaction may take anywhere between 3 and 6 minutes (from each request happening just before a poll to each request happening just after a poll).
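Reading those figures as "each round trip costs between one and two poll periods, depending on whether the request lands just before or just after a poll", the range works out like this:

    poll_period_s = 60   # the device's poll period
    round_trips = 3      # round trips per transaction

    best_case_s = round_trips * poll_period_s        # each request lands just before a poll
    worst_case_s = round_trips * 2 * poll_period_s   # each request lands just after a poll

    print(f"best case:  {best_case_s / 60:.0f} minutes")   # 3 minutes
    print(f"worst case: {worst_case_s / 60:.0f} minutes")  # 6 minutes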
