SVN – How to estimate the final size and processing time necessary to dump the entire repo


Assume an SVN setup that consists of a single repo with nested project folders, something like this:

\
- trunk
  - projectA
    - trunk
      - ...
    - branches
      - ...
  - projectB
    - trunk
      - ...
    - branches
      - ...
- ...

The whole repo is about 40 GB on disk, with 17,605 commits at last count. I now need to extract a single project from the repo to set up on a separate SVN server, which I understand is only possible by using svnadmin dump on the entire repository and then svndumpfilter to isolate the project I need. I fully expect the initial dump to take a very long time to complete. Is there a good formula for calculating how long it will take and how much disk space the final dump file will require? Also, I've heard that the dump operation uses 100% CPU while it runs. Is that true?

Alternatively, is there a better way to go about this given the size of the repo? (Other than just doing an export and losing the revision history.)

Best Answer

Is there a good formula for calculating exactly how long it will take and how much disk space will be required for the final dump file?
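There is no exact formula, since the dump size depends on how many changed files each revision touches and how well the data deltifies. A practical way to get a ballpark figure is to dump a small sample of revisions, measure it, and extrapolate linearly. A rough sketch (the revision range and file name are just examples):

time svnadmin dump /path/to/your/repo/ -r 0:1000 > sample.dump
ls -lh sample.dump

If the first 1,000 of your 17,605 revisions take T seconds and produce S bytes, a crude linear estimate for the whole dump is about 17.6 × T and 17.6 × S. Bear in mind that by default svnadmin dump writes out full file texts, so the dump file can end up considerably larger than the 40 GB on-disk repository; the --deltas option keeps it smaller at the cost of extra CPU time.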

To avoid writing the huge dump file to disk at all, you can pipe the svndumpfilter output straight to netcat and load it on the new server over the network.

On the old SVN server:

svnadmin dump /path/to/your/repo/ | svndumpfilter include single_project --drop-empty-revs --renumber-revs | nc -l 2302

(Here single_project is the path of the project inside the repository; with the layout above it would be something like trunk/projectA.)

and on the new SVN server:

svnadmin create single_project
nc IP_address 2302 | svnadmin load single_project
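One caveat: netcat variants differ in listener syntax. The commands above match BSD netcat; with traditional (GNU) netcat the listening port needs the -p flag:

nc -l -p 2302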

I've tested this with my own repo (4 GB, ~12,000 revisions); it takes ~12 minutes to complete.

PS: You can also use gzip to compress the data in transit and ionice to run the dump with low I/O priority.
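For example, a compressed, low-priority version of the pipeline might look like this (a sketch using the same hypothetical paths and port as above; ionice -c3 puts the dump in the idle I/O scheduling class):

On the old SVN server:

ionice -c3 svnadmin dump /path/to/your/repo/ | svndumpfilter include single_project --drop-empty-revs --renumber-revs | gzip | nc -l 2302

and on the new SVN server:

nc IP_address 2302 | gunzip | svnadmin load single_project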