PostgreSQL – How to fix the issue causing the “incomplete startup packet” log message when implementing replication in PostgreSQL

postgresql replication

I've got two cloud servers running Ubuntu 13.04 and PostgreSQL 9.2.

I've primarily used this blog post to help me set things up. However, to do the initial database dump to the slave I'm using the pg_start_backup/pg_stop_backup strategy from this other blog post. I've read through the docs and the Postgres wiki as well. I ran into several problems that I was able to solve, but I can't get past this wretched "the database system is starting up" failure.

I'm not sure if seeing:

cp: cannot stat '/var/lib/postgresql/9.2/archive/00000001000000000000003A': No such file or directory

after the consistent recovery state is reached is normal or the first sign of a problem. The searching I've done on "the database system is starting up" and "incomplete startup packet" tells me that something is sending empty TCP packets to the slave. The only thing that even knows about the slave is the master, so I'm not sure why it would be sending empty packets…

Has anyone worked with this who has an idea of what might be going wrong?

The postgres log on the slave looks like so:

2013-08-26 13:01:38 CDT LOG:  entering standby mode
2013-08-26 13:01:38 CDT LOG:  restored log file "000000010000000000000039" from archive
2013-08-26 13:01:38 CDT LOG:  incomplete startup packet
2013-08-26 13:01:39 CDT LOG:  redo starts at 0/39000020
2013-08-26 13:01:39 CDT LOG:  consistent recovery state reached at 0/390000E0
cp: cannot stat '/var/lib/postgresql/9.2/archive/00000001000000000000003A': No such file or directory
2013-08-26 13:01:39 CDT LOG:  streaming replication successfully connected to primary
2013-08-26 13:01:39 CDT FATAL:  the database system is starting up
2013-08-26 13:01:39 CDT FATAL:  the database system is starting up
2013-08-26 13:01:40 CDT FATAL:  the database system is starting up
2013-08-26 13:01:40 CDT FATAL:  the database system is starting up
2013-08-26 13:01:41 CDT FATAL:  the database system is starting up
2013-08-26 13:01:42 CDT FATAL:  the database system is starting up
2013-08-26 13:01:42 CDT FATAL:  the database system is starting up
2013-08-26 13:01:43 CDT FATAL:  the database system is starting up
2013-08-26 13:01:43 CDT FATAL:  the database system is starting up
2013-08-26 13:01:44 CDT FATAL:  the database system is starting up
2013-08-26 13:01:44 CDT FATAL:  the database system is starting up
2013-08-26 13:01:44 CDT LOG:  incomplete startup packet
2013-08-26 13:03:27 CDT FATAL:  the database system is starting up
2013-08-26 13:03:27 CDT FATAL:  the database system is starting up
2013-08-26 13:03:30 CDT FATAL:  the database system is starting up
2013-08-26 13:03:30 CDT FATAL:  the database system is starting up

thanks!
brad

Best Answer

You're failing to copy the last WAL archive(s) after pg_stop_backup. You need to have WAL archiving set up or manually copy the WAL if you're going to use this method.
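
As a rough sketch of the manual-copy option, the shell function below copies WAL segment files into the archive directory that the slave's restore_command reads from. It assumes the master's pg_xlog directory (or a copy of it fetched with scp or rsync) is reachable as a local path, and the function name copy_wal_segments is made up for illustration.

```shell
# Hypothetical helper: copy WAL segments from a source directory into the
# archive directory used by the standby's restore_command.
#   $1 = directory holding the master's WAL segments (pg_xlog or a copy of it)
#   $2 = archive directory on the standby
copy_wal_segments() {
    for seg in "$1"/*; do
        name=$(basename "$seg")
        # WAL segment names are exactly 24 uppercase hex characters;
        # skip anything else (archive_status, *.backup history files, ...).
        case "$name" in
            *[!0-9A-F]*) continue ;;
        esac
        [ "${#name}" -eq 24 ] || continue
        [ -f "$seg" ] || continue
        # Never overwrite a segment that is already in the archive.
        [ -e "$2/$name" ] || cp -p "$seg" "$2/$name"
    done
}
```

It would be called after pg_stop_backup has returned, for example as copy_wal_segments /path/to/copied/pg_xlog /var/lib/postgresql/9.2/archive, where the first path is a placeholder.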

In 9.2 and above it's much easier to do one-off copies with pg_basebackup --xlog-method=stream. This copies the transaction logs over the replication protocol along with the base backup itself, and takes care of pg_start_backup and pg_stop_backup automatically.

See the pg_basebackup manual.
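
A minimal sketch of what that invocation might look like, wrapped in a shell function so the arguments are explicit. The function name seed_standby and the PG_BASEBACKUP override variable are made up for illustration, and the host, role and data directory in the usage note are placeholders, not values from the question. Note that PostgreSQL 10 renamed --xlog-method to --wal-method.

```shell
# Hypothetical wrapper around pg_basebackup for seeding a standby.
#   $1 = master host
#   $2 = role with the REPLICATION privilege
#   $3 = empty target data directory on the standby
# Run on the standby as the postgres OS user, with PostgreSQL stopped.
# PG_BASEBACKUP is an illustrative override: set it to "echo" to print the
# arguments instead of running the real binary.
seed_standby() {
    "${PG_BASEBACKUP:-pg_basebackup}" \
        --host="$1" --username="$2" --pgdata="$3" \
        --xlog-method=stream --progress
}
```

For example: seed_standby master.example.com replicator /var/lib/postgresql/9.2/main. The master's pg_hba.conf must allow a replication connection from the standby, and on 9.2 recovery.conf still has to be created in the new data directory before starting the standby.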

The "incomplete startup packet" errors mentioned in the title are likely unrelated, and caused by attempts to make SSL connections to a non-SSL-aware server from clients that have sslmode=prefer.