Windows – Postgres 8.4 stopped working

postgresqlwindows

A windows 7 machine running a Postgres 8.4.x server stopped working properly this morning. The errors are more than strange:

  • The service is not running, according to the services control panel.
  • Astonishingly, the server is actually running, because we can connect with both pgAdmin and our own ODBC-based software, but some queries crash (that's how we found out it's broken).
  • Neither starting not stopping the server does anything, and both return with errors, either failure to access the data directory, or just no useful error at all (net helpmsg gives me exactly nothing).

Event-Viewer, most of these are there multiple times, and happen when booting or trying to start/stop the server.

PostgreSQL - Error - Se agotó el tiempo de espera al inicio del servidor
2013-12-03 21:33:32 GMT FATAL:  el archivo de bloqueo «postmaster.pid» ya existe
2013-12-03 21:33:32 GMT HINT:  ¿Hay otro postmaster (PID 2952) corriendo en el directorio de datos «C:/Program Files (x86)/PostgreSQL/8.4/data»?
pg_ctl: no se pudo encontrar el ejecutable postgres
2013-12-03 18:46:34 CET FATAL:  no se pudo crear ningún socket TCP/IP

pg_dump is able to pull a complete backup (so I did that before messing around), despite the service still not marked as running.

I've tried deleting the pid-file, as that's something I find recommended everywhere, but that didn't help.

Going through the log-files actually found a real error in a query, but that still does not solve the issue that I cannot stop or start the postgres service. Edit: And fixing that invalid query doesn't solve anything, the application still crashes.

Relevant parts of the pg_hba.conf:

host    all     all     127.0.0.1/32     md5
host    all     all     192.168.0.0/16   md5
host    all     all     fe80::/48        md5

Some logs (the others have the equivalent content):
http://pastebin.com/v9gtiDmJ
http://pastebin.com/wxYr8TUM

Best Answer

First for the love of your data PLEASE stop making changes to the system.
You need to fully analyze the problem before you start making changes, otherwise you run a real risk of making the problem worse.


As you've surmised Postgres is running on your system - we know this because pg_dump works. If there were no Postgres server pg_dump wouldn't have anything to talk to.
This means your problem (as it existed at the start) is entirely cosmetic (so Service Manager thinks it's down - who cares?! It's up, and that's what's important).
The "easy fix" in this case is to just ignore the situation -- If it ain't broke, don't break it!

The next easy fix (if Service Manager's opinion matters to you) is to manually stop Postgres using pg_ctl, then restart it with service manager.
This isn't an option anymore for you because you deleted the PID file. Now pg_ctl doesn't know which Postgres process to signal. (On Unix this is trivial to remedy - simply signal the Postgres process with the lowest PID to terminate and the database shuts down. I'm not sure what the equivalent is on Windows, but if you do you can do that.)

The last option is a reboot. This is what you'd do if the two options above failed to make Service Manager report a status consistent with reality.
On Unix this should gracefully shut down Postgres when all system processes get the TERM signal, and I assume Windows has equivalent behavior (but even if it doesn't that's OK - Postgres would treat it as a crash and recover the next time it starts up).


Because you upgraded your Postgres installation you now have to execute either the second or third option - you can't leave your system in its current condition.
You are currently in a state where the binaries/libraries on disk do not match the ones running, and that's not a good position to be in. To ensure that your system is in a known, consistent state you must restart Postgres.

In your case I would suggest stopping the database server, reinstalling the binaries (to ensure none were missed because they were locked), and then starting it again. You generally don't ever do an upgrade of the binaries while Postgres running - you shut it down first. At least on Unix systems you can "get away with it" most of the time, but why take the chance of something going wrong?