Postgresql – Best PostgreSQL replication solution that minimizes network costs

database-replicationpostgresqlreplication

I'm looking into PostgreSQL replication solutions. I know two varieties of these solutions

  1. Low level – which involves streaming log-shipping, hot-standby features available in PostgreSQL 9.x
  2. High level – Slony, Londiste that work at SQL query level

My database is not very busy and huge (at least now). But I want to avoid downtimes due to Amazon EC2 failures (like the one that happened recently). My solution is to maintain a slave in a different Availability zone that will remain in sync with my master database instance. That way I can failover to it when master goes down. One point to consider is that, this will require continuous replication of data from master to slave, which will be network traffic across EC2 availability zones. It's not free. It costs 1 cent per GB right now, but after reading some calculations in the PostgreSQL cookbook, I learnt that costs can grow very high even if DB traffic is low. For instance in the "Hot physical backup and continuous archiving" chapter of the "PostgreSQL 9 Administration Cookbook" I read this:

If archive_timeout is set to 30 seconds, we will generate a minimum of
2*60*24 = 2880 files per day, each 16 MB in size, so a total volume of
46 GB per day (minimum)

[and that I assume with minimum traffic on the DB]

My only requirement is that, each write SQL query that executes on the Master server, should be replayed on the Slave server. If this is done on event callback, then that will be perfect because there will be data transfer between master and slave only when DB is being modified and not every 30 seconds or so even if no changes to DB have occured.

Therefore I thought Londiste might be the solution for me, but I am not 100% sure that it works that way.

What do you suggest?

Best Answer

After a week of research I believe Streaming Log Shipping and Hot Standby configuration of PostgreSQL instances serves my needs of instant replication (minimum window of data loss), yet low network traffic. I've written a detailed blog post on how I set it up.

There might be other solutions too, using 3rd party tools like pgpool, but I didn't have much success with them.

Related Topic