Linux – watchdog: basic configuration options don’t work as expected


I run Ubuntu 14.04 LTS and watchdog 5.13.
My goal is to achieve the following:

  • run an external check script every 30 seconds
  • reboot if the script keeps failing for 300 seconds (i.e. 10 failed attempts in a row)

I am having issues with the most basic watchdog configuration:

$ cat /etc/watchdog.conf
watchdog-device = /dev/watchdog
watchdog-timeout = 300
interval = 30
test-binary = /usr/local/sbin/
realtime = yes
priority = 1

$ cat /etc/default/watchdog
watchdog_options="-c /etc/watchdog.conf --verbose"

According to syslog,

  1. watchdog-timeout is being set to 254s (discussed here).
  2. The system reboots after the first failure of test-binary.

Is this the expected behaviour, or am I missing something?

P.S. For now, I've implemented the 'wait until 10 failures' logic in my script itself.
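
For illustration, here is a minimal sketch of what such a 'wait until 10 failures' wrapper might look like. It is not the script from the question. It assumes that watchdog treats a non-zero exit code from test-binary as a failure, and that the counter has to live in a file because the binary is started afresh on every interval. The names `STATE_FILE`, `CHECK_CMD`, `read_count` and `record_result`, and both paths, are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical values: 10 consecutive failures x 30 s interval = 300 s.
MAX_FAILURES = 10
STATE_FILE = Path("/run/check-failures.count")  # hypothetical location
CHECK_CMD = ["/usr/local/sbin/real-check"]      # hypothetical real check


def read_count(state_file):
    """Return the stored consecutive-failure count, or 0 if missing/corrupt."""
    try:
        return int(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def record_result(check_ok, state_file, max_failures=MAX_FAILURES):
    """Update the counter and return the exit code to hand back to watchdog.

    A success resets the counter. A failure increments it, and only once
    max_failures is reached does the function return non-zero.
    """
    count = 0 if check_ok else read_count(state_file) + 1
    state_file.write_text(str(count))
    return 1 if count >= max_failures else 0


def main():
    """Run the real check once and translate its result for watchdog."""
    check_ok = subprocess.call(CHECK_CMD) == 0
    return record_result(check_ok, STATE_FILE)
```

To use this as an actual test-binary, the file would need to end with `import sys; sys.exit(main())` and be made executable. Keeping the counter under `/run` means it is cleared on reboot, so a freshly booted machine starts from zero.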

Best Answer

I can't speak for the watchdog-timeout being clamped to 254 seconds, but what you link to certainly explains it.

Watchdog timers generally don't run in an "N failures in a row" mode, though. At the first indication of an error they reboot the machine, so the behaviour you're seeing is how I'd expect it to work. They are usually implemented in hardware that must be "tickled" within the configured period; otherwise it hard power cycles the machine with no warning whatsoever. The aim is to recover from kernel panics and similar hangs.
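
To make the "tickling" concrete, here is a rough sketch of how a userspace process keeps a Linux watchdog device alive. It assumes the standard kernel watchdog character-device behaviour: any write resets the timer, and writing the character `V` just before closing asks the driver to disarm (the "magic close", which only works if the driver supports it and `nowayout` is not set). The function name `pet_watchdog` and its parameters are made up for this example. Do not point it at a real `/dev/watchdog` unless you are prepared for the machine to reset if the process dies.

```python
import time


def pet_watchdog(device_path, interval_s, rounds):
    """Write to the watchdog device `rounds` times, then request a disarm.

    device_path: e.g. /dev/watchdog (opening it arms the timer)
    interval_s:  pause between writes; must stay below the hardware timeout
    rounds:      number of keep-alive writes before closing
    """
    # Unbuffered binary mode so every write reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as dev:
        for _ in range(rounds):
            dev.write(b"\0")        # any byte counts as a keep-alive
            time.sleep(interval_s)
        dev.write(b"V")             # magic close: ask the driver to disarm
```

If the loop ever stops running (a hang, a panic, a killed process) the writes stop too, and the hardware resets the machine once its timeout expires. That is why there is no built-in notion of "N failures in a row" at this level.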
