How to identify and potentially remove big binary commits inside an SVN repository

fsfs svn

I am working with an SVN repository that is over 3 years old, contains over 6,100 commits and is over 1.5 GB in size. I want to reduce the size of the SVN repository (I'm not talking about the size of a full SVN export – I mean the full repository as it would exist on the server) before moving it to a new server.

The current repository contains the source code for all of our software projects but it also contains relatively large binary files of no significance such as:

  • Full installers for a number of 3rd party tools.
  • .jpg & .png files (which are unmodified exports of PSDs that live in the same folder).
  • Bin and Obj folders (which were then 'svn ignored' in the next commit).
  • ReSharper directories.

A number of these large files have been 'SVN deleted' since they were added, which makes it harder to identify the biggest offenders.

I want to either:

  • Create a new SVN repository that contains only the code for all of the software projects – it is really important that the copied files maintain their SVN history from the old repository.
  • Remove the large binary commits and files from the existing repository.

Is either of these possible?

Best Answer

Otherside is right about svnadmin dump, etc. Something like this will get you a rough pointer to revisions that added lots of data to your repo, and are candidates for svndumpfilter:

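# Run inside a working copy: print the size of each revision's diff as a rough size indicator.
for r in $(svn log -q | grep '^r[0-9]' | cut -d ' ' -f 1 | tr -d r); do
    echo "revision $r is $(svn diff -c "$r" | wc -c) bytes"
done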
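If you have shell access to the server, a complementary check is to look at the repository's own storage. This is a minimal sketch, assuming an unpacked FSFS repository where each revision is stored as a numerically named file under db/revs/ and the repository path contains no spaces. As far as I know, svn diff prints only a placeholder for files marked as binary, so the loop above may understate exactly the commits you are looking for, whereas the revision file size reflects what is actually stored. The function name largest_revs and the example path are made up for illustration.

```shell
# Hypothetical helper: list the N largest revision files of an FSFS repository.
# Assumes the unpacked layout db/revs/<shard>/<revision>; packed shards
# (files named "pack" and "manifest") are skipped by the name filter.
largest_revs() {
    repo=$1          # path to the repository on the server
    count=${2:-20}   # how many revisions to show, default 20
    find "$repo/db/revs" -type f -name '[0-9]*' -exec wc -c {} + |
        grep -v ' total$' |   # drop the summary lines wc prints for multiple files
        sort -rn |            # largest byte count first
        head -n "$count" |
        awk '{ n = split($2, parts, "/"); printf "r%s\t%s bytes\n", parts[n], $1 }'
}

# Example call (placeholder path):
# largest_revs /srv/svn/myrepo 10
```

Once you have the revision numbers, svn log -v -r N on each one shows which paths that commit touched.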

You could also try something like this to find revisions that added files with a particular extension (here, .jpg):

svn log -vq | grep -E '^r|\.jpg$' | grep -B 1 '\.jpg$'
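Once the offending paths are known, the dump, filter and load round trip mentioned at the top of this answer could look roughly like the sketch below. The function name filter_repo, the repository locations and the excluded path prefixes are all placeholders rather than anything taken from the question. Note that --drop-empty-revs together with --renumber-revs changes revision numbers in the new repository, and svndumpfilter can fail when a kept path was copied from an excluded one, so try it on a copy first.

```shell
# Sketch: copy an old repository into a fresh one, leaving out the given path prefixes.
# History of the remaining paths is preserved; revision numbers may change.
filter_repo() {
    old_repo=$1   # existing repository on disk
    new_repo=$2   # location of the new, filtered repository (must not exist yet)
    shift 2       # remaining arguments are path prefixes to exclude
    svnadmin create "$new_repo" || return 1
    svnadmin dump --quiet "$old_repo" |
        svndumpfilter exclude --drop-empty-revs --renumber-revs "$@" |
        svnadmin load --quiet "$new_repo"
}

# Example call (hypothetical paths):
# filter_repo /srv/svn/old /srv/svn/new /trunk/installers /trunk/ProjectA/bin
```

The old repository is left untouched, so you can compare the size of the new one before switching over.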