Linux – PostgreSQL disaster recovery options

backuplinuxpostgresql

My customer has quite a large (the total "data" folder size is 200G) PostgreSQL database and we are working on a disaster recovery plan. We have identified three different types of disasters so far: hardware outage, too much load and unintentional data loss due to erroneously executed bad migration (like DELETE or ALTER TABLE DROP COLUMN).

First two types seem to be easy to mitigate but we can't elaborate a good mitigation plan for the third type. I proposed to use ZFS and frequent (hourly) snapshots but "ZFS" means "OpenIndiana" these days and our Ops engineers do not have much expertise in it, so using OpenIndiana imposes another risk. Colleagues try to convince me that restoring from PostgreSQL PITR backup can be as fast as restoring from a ZFS snapshot but I highly doubt that replaying, say, 50G of archived WALs can be considered "fast".

What other options are we missing? Is ZFS an only viable alternative? Can we get a fast Pg DB restore time in the Linux environment?

Best Answer

Why is FreeBSD not a viable option to run ZFS and PostgreSQL on? The FreeBSD ZFS developers works very close with the Illumos team and just recently Pawel Jakub Dawidek (The man who first ported ZFS to FreeBSD) added SSD TRIM support for ZFS. This will most likely also be added to Illumos ZFS code soon.

Another advantage with FreeBSD and ZFS is the GEOM framework. On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.

Related Topic