GitLab – How to Back Up at Large Scale

Tags: backup, gitlab, linux, postgresql, zfs

When asking GitLab support how to do a 3 TB backup of one's on-premise GitLab, the reply is: use our tool, which produces a tarball.

This just seems wrong to me on all levels. That tarball contains the Postgres dump, Docker images, repo data, Git LFS objects, configuration and so on. Backing up terabytes of mostly static data together with kilobytes of very dynamic data doesn't seem right. And then there is the issue that we want to take a backup every hour.

Question

I'd really like to hear how others do this while still getting a consistent backup.

ZFS on Linux would be fine with me, if that is part of the solution.

Best Answer

For such a short time between backups (1 h), your best bet is to rely on filesystem-level snapshots and send/recv support; a rough sketch of that pattern follows.
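
To illustrate the idea, here is a minimal sketch of an hourly snapshot-and-ship job. It uses ZFS commands for concreteness (ZFS is recommended just below); the dataset name tank/gitlab, the host root@backup.example.com and the receiving dataset backup/gitlab are all assumptions of mine, and in practice a tool like sanoid/syncoid would do this for you.

    #!/usr/bin/env python3
    """Hourly ZFS snapshot + incremental send/receive -- a minimal sketch.

    Assumptions (mine, not from the answer): the GitLab data lives on a
    dataset called tank/gitlab, and root@backup.example.com has a
    receiving dataset called backup/gitlab.
    """
    import datetime
    import subprocess

    DATASET = "tank/gitlab"             # assumed dataset holding /var/opt/gitlab
    REMOTE = "root@backup.example.com"  # assumed backup host
    TARGET = "backup/gitlab"            # assumed receiving dataset on that host


    def list_snapshots(dataset):
        """Return existing snapshots of the dataset, oldest first."""
        out = subprocess.run(
            ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
             "-s", "creation", "-r", dataset],
            check=True, capture_output=True, text=True,
        )
        return [s for s in out.stdout.split() if s.startswith(dataset + "@")]


    def main():
        previous = list_snapshots(DATASET)
        snap = DATASET + datetime.datetime.now().strftime("@hourly-%Y%m%d-%H%M")

        # Atomic, point-in-time snapshot: repos, uploads and (if it lives on the
        # same dataset) the PostgreSQL data directory are frozen at one instant.
        subprocess.run(["zfs", "snapshot", snap], check=True)

        # Incremental send if an older snapshot exists, full send otherwise.
        send_cmd = ["zfs", "send"]
        if previous:
            send_cmd += ["-i", previous[-1]]
        send_cmd.append(snap)

        send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
        recv = subprocess.Popen(["ssh", REMOTE, "zfs", "receive", "-F", TARGET],
                                stdin=send.stdout)
        send.stdout.close()  # let a broken pipe propagate if receive dies
        recv.communicate()
        if send.wait() != 0 or recv.returncode != 0:
            raise SystemExit("zfs send/receive failed")


    if __name__ == "__main__":
        main()

Because the snapshot is atomic, the database inside it is crash-consistent: restoring it is like recovering from a power loss, which PostgreSQL handles on startup via WAL replay. If you want stronger guarantees, the (comparatively tiny) database can additionally be dumped right before each snapshot.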

If using ZFS on Linux (ZoL) is not a problem in your environment, I would strongly advise using it. ZFS is a very robust filesystem and you will really appreciate all the extras (e.g. compression) it offers. When coupled with sanoid/syncoid, it can provide a very strong backup strategy: sanoid takes and prunes snapshots on a schedule, and syncoid replicates them to another host. The main disadvantage is that it is not included in the mainline kernel, so you need to install and update it separately.
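
To give an idea of what that looks like in practice, here is a minimal sanoid policy; the dataset name is again an assumption of mine. sanoid reads /etc/sanoid/sanoid.conf and creates and prunes snapshots when run from cron or its systemd timer:

    # /etc/sanoid/sanoid.conf -- minimal sketch, dataset name assumed

    [tank/gitlab]
        use_template = production

    # keep 48 hourly, 30 daily and 6 monthly snapshots; sanoid both
    # creates them (autosnap) and expires old ones (autoprune)
    [template_production]
        hourly = 48
        daily = 30
        monthly = 6
        autosnap = yes
        autoprune = yes

Replication is then a single recurring syncoid run, for example syncoid -r tank/gitlab root@backup.example.com:backup/gitlab (hostnames assumed), which performs incremental zfs send/receive under the hood and only ships the blocks changed since the last run.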

Alternatively, if you really need to restrict yourself to mainline-included filesystems, you can use Btrfs, which offers similar snapshot and send/receive functionality. But be sure to understand its (many) drawbacks and pain points first.

Finally, another option is to use lvmthin to take regular snapshots (e.g. managed with snapper), relying on third-party tools (e.g. bdsync, blocksync, etc.) to copy/ship only the deltas; see the sketch below.
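
For the lvmthin route, the snapshot side can be as small as the following sketch. The VG/LV names are assumptions of mine, and both the scheduling (snapper's job) and the block-level delta shipping (bdsync/blocksync's job) are left out:

    #!/usr/bin/env python3
    """Take an hourly thin-LVM snapshot of the GitLab volume -- a minimal sketch.

    Assumption: a thin logical volume vg0/gitlab backs /var/opt/gitlab.
    """
    import datetime
    import subprocess


    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


    snap = datetime.datetime.now().strftime("gitlab-hourly-%Y%m%d%H%M")

    # Thin snapshots need no preallocated size; they share the thin pool.
    run(["lvcreate", "--snapshot", "--name", snap, "vg0/gitlab"])

    # Thin snapshots carry the "activation skip" flag by default, so activate
    # explicitly (-K ignores that flag) before a delta tool reads the device.
    run(["lvchange", "-ay", "-K", f"vg0/{snap}"])

The appeal here is that thin snapshots are cheap enough to take every hour and expose a stable block device that a delta tool can read while GitLab keeps running.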

A different approach would be to have two replicated machines (via DRBD) on which you take independent snapshots via lvmthin.
