I suggest changing the dump format to custom (-Fc, --format=custom) and avoiding the tar format altogether. AFAIK there is no advantage to using the tar format over custom (both work with pg_restore).
Reading between the lines, I suspect you are trying to use 'tar' output format, which does have a need to make temp files that can be large. If I guessed right, I'd suggest using 'custom' format instead. There really is no advantage to tar format, and several disadvantages besides this one.
from http://postgresql.1045698.n5.nabble.com/Out-of-space-making-backup-td1904089.html
If your installation comes from a package, then you probably have custom-format (zlib) compression support out of the box. You can control the compression level with the -Z option, from 0 (no compression) to 9; the default is 6.
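As a sketch (the database and file names here are hypothetical), a compressed custom-format dump and its restore look like this:

```shell
# Sketch only: "mydb" is a hypothetical database name.
# -Fc selects the custom format (compressed by default); -Z sets the zlib level.
DB=mydb
DUMP_CMD="pg_dump -Fc -Z 6 -f ${DB}.dump ${DB}"
# Custom-format archives are restored with pg_restore, not psql:
RESTORE_CMD="pg_restore -d ${DB} ${DB}.dump"
```

Run the two commands on the database host; the variables are only used here so the sketch stays self-contained.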
BTW, check your -c option. According to http://www.postgresql.org/docs/9.0/static/app-pgdump.html:
Output commands to clean (drop) database objects prior to (the commands for) creating them. This option is only meaningful for the plain-text format. For the archive formats, you can specify the option when you call pg_restore.
BTW2: for convenience you can use the standard PostgreSQL environment variables, such as PGHOST, PGUSER, PGDATABASE, and PGPORT.
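For example (the host, user, and database names are hypothetical), the environment variables replace the usual connection flags, and --clean moves to the restore step when you use an archive format:

```shell
# Hypothetical connection settings; with these exported, pg_dump and
# pg_restore need no -h/-p/-U/-d connection flags.
export PGHOST=db.example.com
export PGPORT=5432
export PGUSER=alice
export PGDATABASE=mydb

# For archive formats, request the DROP commands at restore time, not dump time:
RESTORE_CMD="pg_restore --clean -d ${PGDATABASE} mydb.dump"
```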
I can't speak for PostgreSQL as I don't use it, but I use a variation of your option 1 for backing up MySQL databases on EC2, and have restored them successfully, without issue.
The first requirement, of course, is that your databases are stored on an EBS volume so that they can be snapshotted. I favour using XFS as the filesystem since the entire filesystem can easily be frozen.
To begin the snapshotting process, you want to freeze your databases and flush your tables. There is a great script called ec2-consistent-snapshot that will do this, as well as freezing your filesystem (if XFS). (The site does have some comments on PostgreSQL that may point you in an acceptable direction; the script is designed for Ubuntu, but works on other distributions, e.g. Amazon Linux/CentOS, without much issue.) My understanding is that with PostgreSQL, people often simply take the snapshot (after freezing the file system) and rely on PostgreSQL's built-in recovery abilities to restore everything to a functioning state. The xfs_freeze is still important to get a consistent snapshot, though.
Once your file system is frozen (and your database flushed and locked, if possible), take your snapshot (ideally using the API directly, as opposed to the (very) slow Java-based commands). The snapshot command only takes a few seconds to return, after which you can unfreeze the file system; the snapshot created will be consistent, despite any subsequent writes.
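The freeze, snapshot, unfreeze sequence can be sketched as below. The mount point, volume ID, and the ec2-create-snapshot command name are assumptions (substitute whichever EC2 API client you use); DRY_RUN=echo just prints the commands, so the sketch is safe to run anywhere:

```shell
DRY_RUN=echo                 # set to "" on a real host to actually execute
MOUNTPOINT=/data             # hypothetical XFS mount point holding the database
VOLUME_ID=vol-12345678       # hypothetical EBS volume ID

$DRY_RUN xfs_freeze -f "$MOUNTPOINT"        # freeze the filesystem
$DRY_RUN ec2-create-snapshot "$VOLUME_ID"   # the API call returns in seconds
$DRY_RUN xfs_freeze -u "$MOUNTPOINT"        # unfreeze as soon as it returns
```

Keeping the freeze window to just the snapshot call is the point: the snapshot is consistent the moment the call returns, so the filesystem never stays frozen while data is copied.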
Given that snapshots are differential (in a way) and compressed, this approach is much more economical than using S3, and offers more options. It also allows you to restore data far faster, and should lock up your databases for a shorter period than generating a dump would. It is also possible to rotate your snapshots to keep their numbers under control; I wrote a Perl script to do that for me.
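A rotation pass can be as simple as filtering a snapshot listing by start date and feeding the old IDs to your delete command. The two-column `snapshot-id start-date` input format below is an assumption about how you list snapshots; ISO dates make plain string comparison sufficient:

```shell
# Print IDs of snapshots started before the cutoff date (YYYY-MM-DD).
# Input: lines of "snapshot-id start-date" from your describe-snapshots tool.
old_snapshots() {
    awk -v cutoff="$1" '$2 < cutoff { print $1 }'
}

# Hypothetical example listing; on a real host, pipe the result to your
# delete-snapshot command.
printf 'snap-aaa 2011-01-01\nsnap-bbb 2011-03-01\n' | old_snapshots 2011-02-01
# -> prints "snap-aaa"
```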
If in doubt, try the first option, then create an EBS volume from the snapshot and test the database to see that all works; don't just trust that the backup is good.
Best Answer
According to the pg_dump documentation, the custom format is compressed by default. You can also select the compression level with the -Z option. For simplicity I would definitely go for pg_dump -F c over piping to gzip.
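Side by side (the database name is hypothetical), the two forms this answer compares are:

```shell
DB=mydb   # hypothetical database name
# Custom format: compression built in, restored selectively via pg_restore.
CUSTOM="pg_dump -Fc ${DB} -f ${DB}.dump"
# Plain format piped to gzip: an extra process, and restorable only via psql.
PIPED="pg_dump ${DB} | gzip > ${DB}.sql.gz"
```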