PostgreSQL – Monit “is alive” check of PostgreSQL tends to fail during a run of pg_dump, is there a better way?

backup, monit, postgresql, psql

I use pg_dump for my primary backup, once every three hours. I also use monit. When monit checks whether PostgreSQL is alive during the pg_dump run, the check sometimes times out and monit restarts postgres. This results in a failed backup.

What to do? Move to Write-Ahead-Logs? Disable monit during the backup? The database is serving an active web site at these times.

Monit config:

check process postgres with pidfile /usr/local/pgsql/data/postmaster.pid
group database
start program = "/etc/init.d/postgresql start"
stop program = "/etc/init.d/postgresql stop"
if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart
if failed host 127.0.0.1 port 5432      protocol pgsql then restart
if 5 restarts within 5 cycles then timeout

Best Answer

So something like this?

if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql for 5 cycles then restart
if failed host 127.0.0.1 port 5432      protocol pgsql for 5 cycles then restart
if 5 restarts within 25 cycles then timeout

That way PostgreSQL would have to be unreachable for 15 minutes before monit restarts it, assuming a 180-second cycle interval (5 cycles × 180 seconds = 900 seconds). You can adjust this to taste, but restarting after a single failed check can produce false positives when the server is merely busy, as it is during a pg_dump run.
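For reference, here is a sketch of how the full check might look with those lines merged into the config from the question. The `set daemon 180` line is an assumption: it makes the 180-second cycle interval explicit, and it normally belongs in the global part of monitrc, not in the check itself. The paths and service names are copied from the question.

```
# Assumed poll interval: one monit cycle every 180 seconds
set daemon 180

check process postgres with pidfile /usr/local/pgsql/data/postmaster.pid
  group database
  start program = "/etc/init.d/postgresql start"
  stop program = "/etc/init.d/postgresql stop"
  # Restart only after 5 consecutive failed checks (5 x 180 s = 15 minutes)
  if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql for 5 cycles then restart
  if failed host 127.0.0.1 port 5432 protocol pgsql for 5 cycles then restart
  # Give up (stop monitoring) if 5 restarts happen within 25 cycles
  if 5 restarts within 25 cycles then timeout
```

After editing, `monit -t` checks the control file syntax and `monit reload` applies it. If the cycle interval on your system differs, scale the `for N cycles` count so the total tolerance comfortably exceeds the time pg_dump keeps the server busy.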