Svn – Subversion repository size & backup file difference

Tags: svn, ubuntu-10.04

My Subversion repository is 5.2 GB and has 339 revisions.

When I back it up with the Perl script below, the resulting backup file is 28 GB.

How is this possible? I am confused.

Backup script:

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

my $svn_repo       = "/subversion/REPONAME";
my $bkup_dir       = "/mnt/Subversion/SVN-Backups/REPO_DIR";   # no trailing slash
my $bkup_file      = "REPONAME_backup-" . strftime("%Y%m%d-%H%M", localtime);
my $bkup_svr       = "my.backup.com";   # not used in this excerpt
my $bkup_svr_login = "backup";          # not used in this excerpt

# Record the newest revision first so the dump covers exactly that range.
chomp(my $youngest = `svnlook youngest $svn_repo`);
die "svnlook youngest failed\n" if $?;

my $dump_path = "$bkup_dir/$bkup_file";
print "\nDumping Subversion repo $svn_repo to $dump_path...\n";
print "Backing up through revision $youngest...\n";
system("svnadmin dump -q -r 0:$youngest $svn_repo > $dump_path") == 0
    or die "svnadmin dump failed (status $?)\n";

print "\nCompressing dump file...\n";
system("gzip", "-9", $dump_path) == 0
    or die "gzip failed (status $?)\n";

my $zipped_file = "$dump_path.gz";
print "\nCreated $zipped_file\n";

Best Answer

Subversion stores its data in a highly compressed form. However, you gzipped your backup at maximum compression and the backup file is still much larger than the repository. If your repository includes many identical files, this could be explained by representation sharing:

"While deltified storage has been a part of Subversion's design since the very beginning, there have been additional improvements made over the years. Subversion repositories created with Subversion 1.4 or later benefit from compression of the full-text representations of file contents. Repositories created with Subversion 1.6 or later further enjoy the disk space savings afforded by representation sharing, a feature which allows multiple files or file revisions with identical file content to refer to a single shared instance of that data rather than each having their own distinct copy thereof." (Source and more details: the Subversion book, Version Control with Subversion.)

To shrink your backup file further, you could switch to a different compression algorithm. bzip2 and LZMA usually achieve better compression ratios than gzip, but they are slower.
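As a minimal sketch of switching compressors, assuming the IO::Compress::Bzip2 module that ships with modern Perl is available, the hypothetical helper dump_to_bzip2 below runs a command (for example svnadmin dump) and streams its output straight into a .bz2 file. Streaming has the side benefit that the full uncompressed dump is never written to disk. It covers bzip2 only; for LZMA you would need a non-core module such as IO::Compress::Xz from CPAN, or the external xz tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw($Bzip2Error);

# Hypothetical helper: run a command and stream its stdout into a bzip2 file.
# $cmd is an array ref, e.g. ['svnadmin', 'dump', '-q', $repo].
# Returns the number of uncompressed bytes that were compressed.
sub dump_to_bzip2 {
    my ($cmd, $outfile) = @_;

    # List-form pipe open: the command runs without going through a shell.
    open(my $in, '-|', @$cmd) or die "cannot run @$cmd: $!\n";
    binmode $in;

    # BlockSize100K => 9 is bzip2's strongest (and default) block size.
    my $z = IO::Compress::Bzip2->new($outfile, BlockSize100K => 9)
        or die "bzip2 init failed: $Bzip2Error\n";

    my ($buf, $total) = ('', 0);
    while (my $n = read($in, $buf, 1 << 20)) {   # read in 1 MiB chunks
        $z->print($buf) or die "bzip2 write failed: $Bzip2Error\n";
        $total += $n;
    }
    $z->close or die "bzip2 close failed: $Bzip2Error\n";

    # For a piped open, close() is false if the command exited non-zero.
    close($in) or die "@$cmd exited with status $?\n";
    return $total;
}
```

With the paths from the question, a call would look like dump_to_bzip2(['svnadmin', 'dump', '-q', '/subversion/REPONAME'], '/mnt/Subversion/SVN-Backups/REPO_DIR/REPONAME.dump.bz2'). Independently of the compressor, svnadmin dump also accepts --deltas, which writes changed files as deltas instead of full texts and can make the dump itself considerably smaller.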

You could test this by using svn export to export a single revision of the repository to a test directory. If you compress that export the same way your backup script does, the resulting file size should be about the same.
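To check the "many identical files" idea more directly, here is a minimal sketch that walks an exported directory (for example one created with svn export) and compares the total size of all files with the size of their distinct contents, using SHA-256 digests from Perl's core Digest::SHA module. The helper name duplicate_stats and the script name dupstats.pl are hypothetical. It looks at a single revision only, so a large gap between the two numbers is consistent with representation sharing saving a lot of space, not proof of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find qw(find);
use Digest::SHA;

# Hypothetical helper: returns (total bytes of all regular files,
# bytes counting each distinct file content only once).
sub duplicate_stats {
    my ($root) = @_;
    my (%seen, $total, $unique);
    $total = $unique = 0;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $File::Find::name;
            my $size   = -s _;   # reuse the stat from the -f test
            my $digest = Digest::SHA->new(256)
                                    ->addfile($File::Find::name, 'b')
                                    ->hexdigest;
            $total  += $size;
            $unique += $size unless $seen{$digest}++;   # first copy only
        },
    }, $root);
    return ($total, $unique);
}

# Usage: perl dupstats.pl /path/to/export
if (@ARGV) {
    my ($total, $unique) = duplicate_stats($ARGV[0]);
    printf "all files: %d bytes, distinct contents: %d bytes\n", $total, $unique;
}
```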
