Scheduled LotusScript agent not restarting after runtime error

ibm-domino

Since upgrading our Windows Domino server from 8.5.3 to 9, an important agent has failed to restart after a runtime error twice in two weeks. We have 3 Agent Manager instances running in the daytime (07:00 – 16:30) and 5 at night. Quite a few databases have agents running every 5 minutes, but most of these agents finish in less than 1 second.

This important scheduled (every 5 minutes) LotusScript agent runs for 24 hours, hits the time limit and starts again (it spends most of the time in sleep). Sometimes it stops with a runtime error. The last time was yesterday, when it stopped at 11:30; it did not start again until today at 14:20, when I disabled and re-enabled the agent.

Before disabling and re-enabling it, I checked that all 3 Agent Manager instances were idle, but for some reason they were no longer picking up this agent. Here is the AMgr status before the disable/enable:

> Tell Amgr Status
29.07.2013 14:17:38   AMgr: Status report at '29.07.2013 14:17:38'
29.07.2013 14:17:38       Agent Manager has been running since '29.07.2013 14:05:27'
29.07.2013 14:17:38       There are currently '3' Agent Executives running
29.07.2013 14:17:38       There are currently '520' agents in the Scheduled Task Queue
29.07.2013 14:17:38       There are currently '100' agents in the Eligible Queue
29.07.2013 14:17:38       There are currently '1' databases containing agents triggered by new mail
29.07.2013 14:17:38       There are currently '1' agents in the New Mail Event Queue
29.07.2013 14:17:38       There are currently '0' databases containing agents triggered by document updates
29.07.2013 14:17:38       There are currently '0' agents in the Document Update Event Queue
29.07.2013 14:17:38   AMgr: Current control parameters in effect:
29.07.2013 14:17:38       AMgr: Daily agent cache refresh is performed at '04:15:00'
29.07.2013 14:17:38       AMgr: Currently in Daytime period
29.07.2013 14:17:38       AMgr: The maximum number of concurrently executing agents is '3'
29.07.2013 14:17:38       AMgr: The maximum number of minutes a LotusScript/Java agent is allowed to run is '1440'
29.07.2013 14:17:38   AMgr: Executive '1', total agent runs: 322855
29.07.2013 14:17:38   AMgr: Executive '1', total elapsed run time: 28064
29.07.2013 14:17:38   AMgr: Executive '2', total agent runs: 102967
29.07.2013 14:17:38   AMgr: Executive '2', total elapsed run time: 364127
29.07.2013 14:17:38   AMgr: Executive '3', total agent runs: 297064
29.07.2013 14:17:38   AMgr: Executive '3', total elapsed run time: 78582

There seems to be a maximum of 100 eligible agents, because I always get 100. Is that the problem, and if so, how do I increase the maximum?

If the Agent Manager is too busy in the daytime (which did not seem to be the case, because all 3 instances were idle when I looked), I would expect it to start the agent at night at the latest, when there are 5 instances.

Any ideas on how to fix the problem, or should I just add all kinds of AMgr debug parameters to notes.ini to get more information the next time this happens?

After this last occurrence I disabled agents in some old databases and increased AMgr instances by 1.

I also tested the runtime error in a different database with a simple test agent, but that agent started again after the error.

Best Answer

Be aware that a scheduled agent will only run if no other scheduled agent is running in the same database at that time.

Example:

Database A 
  - Agent X   (Every 5 minutes)
  - Agent Y   (Every 10 minutes)

In this instance, X will run, followed by X or Y. If Y runs, then X will miss its run time and be added to the queue to follow Y when free time is available.
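The effect can be illustrated with a toy simulation. This is only a sketch of the one-agent-per-database rule described above, not of Agent Manager's actual scheduler. The run times, the tie-breaking, and the rule that the next run is due one interval after the previous start are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    interval: int  # minutes between scheduled runs
    runtime: int   # minutes a single run takes (made-up value)


def simulate(agents, horizon):
    """Simulate one database in which only one scheduled agent runs at a time.

    Returns a list of (start, end, name, minutes_late) tuples.
    """
    next_due = {a.name: 0 for a in agents}
    busy_until = 0
    runs = []
    for now in range(horizon):
        if now < busy_until:
            continue  # another agent in this database is still running
        due = [a for a in agents if next_due[a.name] <= now]
        if not due:
            continue
        # Assumption: the agent that has been waiting longest goes first.
        agent = min(due, key=lambda a: next_due[a.name])
        late = now - next_due[agent.name]
        busy_until = now + agent.runtime
        runs.append((now, busy_until, agent.name, late))
        # Assumption: the next run is due one interval after this start.
        next_due[agent.name] = now + agent.interval
    return runs


if __name__ == "__main__":
    # X is quick; Y is given a hypothetical 7-minute run time.
    agents = [Agent("X", interval=5, runtime=1), Agent("Y", interval=10, runtime=7)]
    for start, end, name, late in simulate(agents, horizon=30):
        print(f"{name}: minute {start}-{end}, {late} min late")
```

With a long-running Y in the same database, X starts late whenever its slot falls inside one of Y's runs, even though nothing else on the server is busy.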

Agents should not run past their scheduling interval either. An agent that runs every 5 minutes should, if at all possible, keep its work time under a minute.

Agents will be killed if they run over the maximum run time allowed on the server. The exceptions are when the agent calls a third-party DLL and hangs inside it, or when a Java agent spawns threads that do not extend NotesThread (there may be other conditions, but those are the most common).

Because of these factors, it is possible to have idle Agent Manager instances while agents are backed up waiting to run.

To diagnose the issue, you can use Agent Manager debug. From the Domino console:

set config DEBUG_THREADID=1
tell aMgr debug *

This generates more verbose logs to help diagnose what is happening. Also:

tell aMgr sched

will give a breakdown of the current schedule.
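If you want to track the queue sizes over time, the status report lines quoted in the question are regular enough to parse. The following is a minimal sketch that assumes the console output has been captured to text in exactly the format shown above; the function name and the sample are illustrative only.

```python
import re

# Matches lines such as:
#   ... There are currently '100' agents in the Eligible Queue
COUNT_LINE = re.compile(r"There are currently '(\d+)' (.+)$")


def parse_amgr_status(text):
    """Return a dict mapping each counter description to its integer value."""
    counts = {}
    for line in text.splitlines():
        match = COUNT_LINE.search(line)
        if match:
            counts[match.group(2).strip()] = int(match.group(1))
    return counts


# Sample taken from the status report in the question.
SAMPLE = """\
29.07.2013 14:17:38       There are currently '3' Agent Executives running
29.07.2013 14:17:38       There are currently '520' agents in the Scheduled Task Queue
29.07.2013 14:17:38       There are currently '100' agents in the Eligible Queue
"""

if __name__ == "__main__":
    for description, value in parse_amgr_status(SAMPLE).items():
        print(f"{value:>5}  {description}")
```

Logging these numbers at intervals would show whether the Eligible Queue stays pinned at the same value while the executives are idle.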

If you have an agent that has to execute frequently, you can use a Program document instead. The downside is that you won't be able to kill the agent if it hangs. You also need to be aware of what it touches, to prevent deadlocks with other agents.

For Domino 9, you can use DOTS for server-based scheduled code (Java). It has fewer limitations than AMgr.
