Postgresql – Streaming PostgreSQL pg_dump to S3

amazon s3postgresql

Is it possible or advisable to stream/pipe pg_dump output to S3?

We are dumping large datasets to our instance and the database size is big. So trying to optimize for local disk space (avoid temp space for dump) and create the backup straight on S3.

We have a PostgreSQL v9.6.3 on Ubuntu 16.04.

Best Answer

You can use s3's multipart upload feature to stream the dump as it's being generated. However that is likely to be error prone and less than reliable. A better approach is to create an ephemeral EBS volume, dump your database to it. And then upload the compressed backup to s3/Glacier if that is where it needs to go.

If you are wanting a backup for point in time recovery doing a pg_basebackup to an EBS volume and archiving the WAL stream from the point after the backup means you could cut the time to recovery without keeping a full replica node. If your concern is availability then setting up replication is the way to go. Although you will still want backups.

Replication is not backup, if someone drops a table on the Origin it will be dropped on the Replica; so you still need PITR or checkpoint backups.