Debian – Why is the cron daemon being killed every few minutes?

cron debian openvz service strace

As of about a week ago, my cron daemon refuses to stay running. I'm using Debian 6 x64 on an OpenVZ virtual machine. Running something like pgrep cron shows that the daemon isn't running. I start the service with service cron start or /etc/init.d/cron start and it launches, but it disappears from the process list again anywhere from 1 to 30 minutes later.

Using strace -f service cron start, I can see that the process is being killed for some reason:

nanosleep({60, 0},  <unfinished ...>
+++ killed by SIGKILL +++

There's nothing relevant in /var/log/syslog, /var/log/messages, /var/log/auth.log, or /var/log/kern.log to explain why the process is dying. The system has at least 800 MB of free memory, and cat /proc/loadavg returns 0.22 0.13 0.04, so resources shouldn't be the issue. With cron running, free -m reports:

             total       used       free     shared    buffers     cached
Mem:          1024        211        812          0          0          0
-/+ buffers/cache:        211        812
Swap:            0          0          0

I also tried removing and reinstalling the cron package using apt-get.

Update: I initially thought the problem was a resource issue. I erased my entire VPS and started from a fresh Debian image. There is now nothing else running on the system, but even on a clean install my cron daemon is still being killed at random.

What else should I check? How do I find out what's killing my crond?

Best Answer

Look at /proc/user_beancounters, more specifically, at the failcnt column.

For every entry with a non-zero failcnt, you'll need to increase the barrier/limit accordingly. Most likely OpenVZ is killing your processes because they hit those limits.
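To pick out the non-zero entries quickly, something like the following awk filter may help. It is only a sketch: the function name failed_beancounters is made up for illustration, and it assumes the usual layout of the file (two header lines, six data columns, and an extra "<id>:" column on the first row of each container).

```shell
# Print each resource whose failcnt (last column) is non-zero.
# Defaults to /proc/user_beancounters (usually readable only by root);
# pass a path to a saved copy to inspect that instead.
failed_beancounters() {
    awk '{
        # The first row of a container carries an extra "<id>:" column.
        off = ($1 ~ /^[0-9]+:$/) ? 1 : 0
        # Data rows have six columns; this also skips the two header lines.
        if (NF - off != 6) next
        if ($(6 + off) + 0 > 0)
            printf "%s held=%s barrier=%s limit=%s failcnt=%s\n",
                   $(1 + off), $(2 + off), $(4 + off), $(5 + off), $(6 + off)
    }' "${1:-/proc/user_beancounters}"
}
```

Running failed_beancounters as root inside the container prints one line per resource that has hit its limit at least once. No output means no counter has failed so far.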

Here is a description of each column: http://wiki.openvz.org/Proc/user_beancounters

For accountable parameters, the field held shows the current counter for the container (resource “usage”), and the field maxheld shows the counter's maximum for the last accounting period. The accounting period is usually the lifetime of the container.

The field failcnt shows the number of refused “resource allocations” for the whole lifetime of the process group.

The barrier and limit fields are resource control settings. For some parameters, only one of them may be used, for some parameters — both. These fields may specify limits or guarantees, and the exact meaning of them is parameter-specific. Description of each parameter in UBC parameters contains information about the difference between the barrier and the limit for the parameter.
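Because failcnt only ever accumulates, a non-zero value alone does not prove that a particular limit is the one killing cron. One way to narrow it down is to save a copy of the file before starting cron and another after it has died, then compare the two. The sketch below assumes it is run against snapshots taken inside a single container (so resource names are unique); the function name failcnt_growth and the snapshot paths are hypothetical.

```shell
# Print each resource whose failcnt grew between two saved snapshots
# of /proc/user_beancounters (first argument: before, second: after).
failcnt_growth() {
    awk '{
        # The first row of a container carries an extra "<id>:" column.
        off = ($1 ~ /^[0-9]+:$/) ? 1 : 0
        # Keep only the six-column data rows; header lines are skipped.
        if (NF - off != 6) next
        name = $(1 + off)
        fails = $(6 + off) + 0
        if (FNR == NR)
            before[name] = fails        # still reading the first snapshot
        else if (fails > before[name])
            printf "%s +%d\n", name, fails - before[name]
    }' "$1" "$2"
}
```

For example: cat /proc/user_beancounters > /tmp/ubc.before, start cron, wait until it disappears, cat /proc/user_beancounters > /tmp/ubc.after, then failcnt_growth /tmp/ubc.before /tmp/ubc.after. Any resource listed is a likely candidate for the limit that was hit.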
