Linux – cron daemon forking multiple processes and running jobs multiple times


I'm running a bunch of cron jobs (set up using sudo crontab -e) on Ubuntu, and recently (though I can't say exactly when it started) I'm suddenly seeing the same jobs run multiple times in the same minute. You don't need to see the full crontab to trust me that they are absolutely not listed twice, but here is a snippet for flavour:

*/2 * * * * /usr/bin/wget --no-check-certificate 'https://myserver.net/someuri/pdm/33?embed_in_page=xyz'
* * * * * /usr/bin/wget --no-check-certificate 'https://myserver.net/someuri/pdm/77'
* * * * * /usr/bin/wget --no-check-certificate 'https://myserver.net/someuri/pdm/20?blah=blah'
* * * * * echo "`date` Running now" >> /home/somewhere/croncheck
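
Duplicates could in principle also come from an entry hiding in one of the other crontab locations, so it's worth ruling that out first. A quick check (the paths are the Ubuntu/Debian defaults; adjust if yours differ):

```shell
# Search every standard crontab location for wget entries;
# 2>/dev/null hides errors for directories that don't exist
sudo grep -r wget /etc/crontab /etc/cron.d/ /var/spool/cron/crontabs/ 2>/dev/null
```

If each wget URL appears only once across all of these, the duplication is coming from cron's behaviour rather than a stray second entry.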

I've also added that simple echo at the end, running every minute, and it never appears to run more than once.

So, the echo only ever happens once a minute. But intermittently, especially under load, the server seems to fire the wget URL requests multiple times in succession (the webserver log shows them coming from the same place at roughly the same time).

If I run ps -A | grep cron

I'll see a dozen or more entries looking like:

28055 ?        00:00:00 cron

They do not seem to go away.

If I run ps aux, I see only the single cron entry I expected.
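
One way to tell whether those extra entries are children forked by the main daemon to run jobs, rather than independent daemons, is to list parent PIDs and elapsed time alongside each process. A sketch:

```shell
# pid/ppid/etime/args: children forked by the main cron daemon will show
# its PID as their PPID, and etime shows how long each has been alive.
# The [c] in the pattern keeps grep itself out of the results.
ps -eo pid,ppid,etime,args | grep '[c]ron'
```

If the lingering entries all share one PPID (the main daemon's PID), they are job children that haven't exited rather than duplicate daemons.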

So, my guess is that there is a nasty interaction between wget and cron, and that something is failing somewhere, even though the server, and interactive requests to the same webserver through a browser, seem largely unaffected, just slowed by the unnecessary work. But the reality is that I don't know. I'm looking for any ideas you may have as to the cause, and possible solutions to the problem.
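
One factor worth ruling out on the wget side: by default, wget retries transient failures (timeouts, dropped connections) up to 20 times, so a single cron firing against a slow server can itself produce several requests in the log. A sketch of pinning each job to exactly one attempt (the URL is from the question; the timeout value is just an example):

```shell
# -q: no output; -O /dev/null: discard the response body
# -t 1: exactly one attempt, no retries; -T 30: 30-second timeout
/usr/bin/wget -q -O /dev/null -t 1 -T 30 --no-check-certificate 'https://myserver.net/someuri/pdm/77'
```

With retries disabled, any remaining duplicates in the webserver log must be coming from cron firing the job more than once.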

Best Answer

Although I'm not entirely sure why this is happening, long-running jobs inside cron seem to cause odd behaviour. The original post has a series of jobs fired every five minutes, some one minute after another. If the first job takes too long (ten minutes, for example), my guess is that the forked process running it hangs around for those ten minutes and fires the subsequent jobs in the list. But the primary cron process is also firing these jobs correctly, so duplicates occur and the whole issue cascades.

Enough of the guesswork. To fix it, here is a snippet of my new crontab, using flock to block subsequent jobs that could cascade incorrectly.

# every 5 minutes
*/5 * * * * flock -w 2000 /tmp/cnsd.lockfile.pdm -c "/usr/bin/longrunningjob1"
# every 5 minutes
*/5 * * * * flock -w 2000 /tmp/cnsd.lockfile.pdm -c "/usr/bin/longrunningjob2"
# at 15 minutes past each hour
15 * * * * flock -w 2000 /tmp/cnsd.lockfile.pdm -c "/usr/bin/longrunningjob3"

At least this way, when the system is overloaded, it doesn't fire a cascade of jobs that should never have run, and everything gets a chance to recover. I did experiment with separate flock lockfiles for different classes of job, but with my limited resources, queuing everything behind a single lock was the best way to keep the system running.
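
flock's behaviour is easy to sanity-check outside cron before trusting it in a crontab. A quick sketch (the lockfile path is just an example):

```shell
LOCK=/tmp/example.lockfile

# The outer flock acquires the lock and holds it while its -c command runs.
# Inside, a non-blocking attempt (-n) on the same file fails immediately,
# so the fallback branch prints instead.
flock "$LOCK" -c "flock -n $LOCK -c true || echo 'lock is busy'"
```

The same mechanics apply in the crontab above: with -w 2000, a second job waits up to 2000 seconds for the lock instead of failing, which is what turns the overlapping jobs into a queue.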