GitLab – How to Back Up at Large Scale

Tags: backup, gitlab, linux, postgresql, zfs

When asking GitLab support how to do a 3 TB backup of one's on-premise GitLab, the reply is: use our tool, which produces a tarball.

This just seems wrong to me on all levels. That tarball contains the Postgres dump, Docker images, repo data, Git LFS objects, configuration and so on. Backing up terabytes of mostly static data together with kilobytes of very dynamic data doesn't seem right. And then there is the issue that we want to take a backup every hour.

Question

I'd really like to hear how others do this while still getting a consistent backup.

ZFS on Linux would be fine with me, if that is part of the solution.

Best Answer

For such a short time between backups (1 h), your best bet is to rely on filesystem-level snapshots and send/recv support; a rough sketch of that pattern follows.
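
To illustrate the idea, here is a minimal sketch of an hourly snapshot-and-ship job. It uses ZFS commands for concreteness (ZFS is recommended just below); the dataset name tank/gitlab, the host root@backup.example.com and the receiving dataset backup/gitlab are all assumptions of mine, and in practice a tool like sanoid/syncoid would do this for you.

    #!/usr/bin/env python3
    """Hourly ZFS snapshot + incremental send/receive -- a minimal sketch.

    Assumptions (mine, not from the answer): the GitLab data lives on a
    dataset called tank/gitlab, and root@backup.example.com has a
    receiving dataset called backup/gitlab.
    """
    import datetime
    import subprocess

    DATASET = "tank/gitlab"             # assumed dataset holding /var/opt/gitlab
    REMOTE = "root@backup.example.com"  # assumed backup host
    TARGET = "backup/gitlab"            # assumed receiving dataset on that host


    def list_snapshots(dataset):
        """Return existing snapshots of the dataset, oldest first."""
        out = subprocess.run(
            ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
             "-s", "creation", "-r", dataset],
            check=True, capture_output=True, text=True,
        )
        return [s for s in out.stdout.split() if s.startswith(dataset + "@")]


    def main():
        previous = list_snapshots(DATASET)
        snap = DATASET + datetime.datetime.now().strftime("@hourly-%Y%m%d-%H%M")

        # Atomic, point-in-time snapshot: repos, uploads and (if it lives on the
        # same dataset) the PostgreSQL data directory are frozen at one instant.
        subprocess.run(["zfs", "snapshot", snap], check=True)

        # Incremental send if an older snapshot exists, full send otherwise.
        send_cmd = ["zfs", "send"]
        if previous:
            send_cmd += ["-i", previous[-1]]
        send_cmd.append(snap)

        send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
        recv = subprocess.Popen(["ssh", REMOTE, "zfs", "receive", "-F", TARGET],
                                stdin=send.stdout)
        send.stdout.close()  # let a broken pipe propagate if receive dies
        recv.communicate()
        if send.wait() != 0 or recv.returncode != 0:
            raise SystemExit("zfs send/receive failed")


    if __name__ == "__main__":
        main()

Because the snapshot is atomic, the database inside it is crash-consistent: restoring it is like recovering from a power loss, which PostgreSQL handles on startup via WAL replay. If you want stronger guarantees, the (comparatively tiny) database can additionally be dumped right before each snapshot.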

If using ZFS on Linux (ZoL) is not a problem in your environment, I would strongly advise using it. ZFS is a very robust filesystem and you will really appreciate all the extras (e.g. compression) it offers. When coupled with sanoid/syncoid, it can provide a very strong backup strategy: sanoid takes and prunes snapshots on a schedule, and syncoid replicates them to another host. The main disadvantage is that it is not included in the mainline kernel, so you need to install and update it separately.
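
To give an idea of what that looks like in practice, here is a minimal sanoid policy; the dataset name is again an assumption of mine. sanoid reads /etc/sanoid/sanoid.conf and creates and prunes snapshots when run from cron or its systemd timer:

    # /etc/sanoid/sanoid.conf -- minimal sketch, dataset name assumed

    [tank/gitlab]
        use_template = production

    # keep 48 hourly, 30 daily and 6 monthly snapshots; sanoid both
    # creates them (autosnap) and expires old ones (autoprune)
    [template_production]
        hourly = 48
        daily = 30
        monthly = 6
        autosnap = yes
        autoprune = yes

Replication is then a single recurring syncoid run, for example syncoid -r tank/gitlab root@backup.example.com:backup/gitlab (hostnames assumed), which performs incremental zfs send/receive under the hood and only ships the blocks changed since the last run.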

Alternatively, if you really need to restrict yourself to mainline-included filesystems, you can use Btrfs, which offers similar snapshot and send/receive functionality. But be sure to understand its (many) drawbacks and pain points first.

Finally, another option is to use lvmthin to take regular snapshots (e.g. managed with snapper), relying on third-party tools (e.g. bdsync, blocksync, etc.) to copy/ship only the deltas; see the sketch below.
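
For the lvmthin route, the snapshot side can be as small as the following sketch. The VG/LV names are assumptions of mine, and both the scheduling (snapper's job) and the block-level delta shipping (bdsync/blocksync's job) are left out:

    #!/usr/bin/env python3
    """Take an hourly thin-LVM snapshot of the GitLab volume -- a minimal sketch.

    Assumption: a thin logical volume vg0/gitlab backs /var/opt/gitlab.
    """
    import datetime
    import subprocess


    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


    snap = datetime.datetime.now().strftime("gitlab-hourly-%Y%m%d%H%M")

    # Thin snapshots need no preallocated size; they share the thin pool.
    run(["lvcreate", "--snapshot", "--name", snap, "vg0/gitlab"])

    # Thin snapshots carry the "activation skip" flag by default, so activate
    # explicitly (-K ignores that flag) before a delta tool reads the device.
    run(["lvchange", "-ay", "-K", f"vg0/{snap}"])

The appeal here is that thin snapshots are cheap enough to take every hour and expose a stable block device that a delta tool can read while GitLab keeps running.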

A different approach would be to have two replicated machines (via DRBD) on which you take independent snapshots via lvmthin.
