Create a users file (e.g. users.txt) for mapping SVN users to Git:
user1 = First Last Name <email@address.com>
user2 = First Last Name <email@address.com>
...
You can use this one-liner to build a template from your existing SVN repository:
svn log -q | awk -F '|' '/^r/ {gsub(/ /, "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > users.txt
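To see what the one-liner produces without a live repository, the sketch below feeds the same awk program a hand-written sample of `svn log -q` output. The authors alice and bob are hypothetical, and the result goes to users-template.txt (a name chosen here so a real users.txt is not overwritten). Each placeholder line still needs a real name and e-mail address before the file is usable.

```shell
#!/bin/sh
# Hypothetical `svn log -q` output: one "rN | author | date" line per
# revision, separated by dashed lines (which the /^r/ pattern skips).
sample_log() {
  cat <<'EOF'
------------------------------------------------------------------------
r3 | alice | 2020-01-03 10:00:00 +0000 (Fri, 03 Jan 2020)
------------------------------------------------------------------------
r2 | bob | 2020-01-02 10:00:00 +0000 (Thu, 02 Jan 2020)
------------------------------------------------------------------------
r1 | alice | 2020-01-01 10:00:00 +0000 (Wed, 01 Jan 2020)
------------------------------------------------------------------------
EOF
}

# Same awk program as the one-liner: field 2 is the author; strip the
# padding spaces and print a placeholder "user = user <user>" mapping.
sample_log \
  | awk -F '|' '/^r/ {gsub(/ /, "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' \
  | sort -u > users-template.txt

cat users-template.txt
# Prints:
# alice = alice <alice>
# bob = bob <bob>
```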
git svn will stop if it finds an SVN user who is not in the file. After that, you can update the file and pick up where you left off.
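Because the clone stops at the first unmapped author, it can save a few restarts to list every unmapped author before cloning. This is a sketch, not a git svn feature: the function name `missing_authors` is hypothetical, it reads `svn log -q` output on standard input, and it assumes the SVN user name is the text before "=" on each users.txt line.

```shell
#!/bin/sh
# missing_authors USERS_FILE  (hypothetical helper, not part of git svn)
# Reads `svn log -q` output on stdin and prints, sorted, every SVN author
# that has no "user = Name <email>" line in USERS_FILE.
missing_authors() {
  awk -F '|' -v users="$1" '
    BEGIN {
      # Record the user name (text before "=") of every mapping line.
      while ((getline line < users) > 0) {
        split(line, kv, "=")
        name = kv[1]
        gsub(/[ \t]/, "", name)
        if (name != "") mapped[name] = 1
      }
    }
    /^r/ {
      # Field 2 of a revision line is the author, padded with spaces.
      author = $2
      gsub(/ /, "", author)
      if (!(author in mapped) && !(author in seen)) {
        seen[author] = 1
        print author
      }
    }
  ' | sort
}
```

Usage: `svn log -q svn://hostname/path | missing_authors users.txt`. An empty result means every author in the log has a mapping.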
Now pull the SVN data from the repository:
git svn clone --stdlayout --no-metadata --authors-file=users.txt svn://hostname/path dest_dir-tmp
This command creates a new Git repository in dest_dir-tmp and starts pulling the SVN repository. Note that the --stdlayout flag assumes you have the common "trunk/, branches/, tags/" SVN layout. If your layout differs, become familiar with the --tags, --branches and --trunk options (and with git svn help in general).
All common protocols are allowed: svn://, http://, https://. The URL should target the base repository, something like http://svn.mycompany.com/myrepo/repository. The URL string must not include /trunk, /tags or /branches.
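The URL rules above are easy to check before a long clone starts. The function below is a hypothetical pre-flight helper (`check_svn_url` is not part of git svn); it deliberately accepts only the three schemes listed above (git svn itself also understands others, such as file://) and rejects any URL with a trunk, tags or branches path segment.

```shell
#!/bin/sh
# check_svn_url URL  (hypothetical helper)
# Succeeds if URL uses svn://, http:// or https:// and points at the
# repository base rather than at a trunk/, tags/ or branches/ directory.
check_svn_url() {
  url=${1%/}   # ignore a single trailing slash
  case "$url" in
    svn://*|http://*|https://*) ;;
    *) echo "unsupported scheme: $1" >&2; return 1 ;;
  esac
  # Appending "/" lets */trunk/* match a final "/trunk" segment too,
  # without matching names like "/trunkfoo".
  case "$url/" in
    */trunk/*|*/tags/*|*/branches/*)
      echo "URL must not include /trunk, /tags or /branches: $1" >&2
      return 1 ;;
  esac
  return 0
}
```

Usage: `check_svn_url "$URL" && git svn clone --stdlayout --no-metadata --authors-file=users.txt "$URL" dest_dir-tmp`.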
Note that after you run git svn clone, the operation very often looks as if it is hanging or frozen; it is quite normal for it to sit for a long time after initializing the new repository. Eventually, you will see log messages indicating that the migration is under way.
Also note that if you omit the --no-metadata flag, Git will append information about the corresponding SVN revision to the commit message (i.e. git-svn-id: svn://svn.mycompany.com/myrepo/<branchname/trunk>@<RevisionNumber> <Repository UUID>).
If a user name is not found, update your users.txt file, then run:
cd dest_dir-tmp
git svn fetch
If you have a large project, you might have to repeat that last command several times until all of the Subversion commits have been fetched:
git svn fetch
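Rather than re-running the fetch by hand, you can wrap it in a small retry loop. This sketch assumes that a non-zero exit status from git svn fetch means "try again"; the `retry` function name and the attempt limit are hypothetical choices, not git svn features. If the failure is an unmapped author, no number of retries helps until users.txt is fixed, which is why the loop gives up after a fixed number of attempts.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it exits with status 0, or until it has failed
# MAX_ATTEMPTS times; returns 1 in the latter case.
retry() {
  max=$1
  shift
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying ($attempt/$max): $*" >&2
  done
}
```

Usage, from inside dest_dir-tmp: `retry 10 git svn fetch`.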
When completed, Git will check out the SVN trunk into a new branch. Any other branches are set up as remotes. You can view the other SVN branches with:
git branch -r
If you want to keep other remote branches in your repository, you need to create a local branch for each one manually. (Skip trunk/master.) If you don't, those branches won't get cloned in the final step.
git checkout -b local_branch remote_branch
# It's OK if local_branch and remote_branch are the same names
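With many SVN branches, creating each local branch by hand gets tedious. The loop below is a sketch that assumes git svn stored the branches under refs/remotes/ (Git 2.0 and later add an origin/ prefix by default; older versions add none) and that SVN tags appear there as tags/<name>. It skips trunk and tags, uses the remaining ref name as the local branch name, and creates the branches without checking them out. `local_branch_name` and `create_local_branches` are hypothetical names.

```shell
#!/bin/sh
# local_branch_name REMOTE_REF  (hypothetical helper)
# Maps a remote ref name such as "origin/feature-x" or "feature-x" to a
# local branch name; prints nothing for refs that should be skipped.
local_branch_name() {
  name=${1#origin/}            # drop the default git svn prefix, if any
  case "$name" in
    trunk|tags/*|*@*) ;;       # trunk becomes master; tags are handled
                               # separately; name@REV refs are leftovers
    *) printf '%s\n' "$name" ;;
  esac
}

# create_local_branches  (hypothetical helper; run inside dest_dir-tmp)
# Creates a local branch for every remaining remote ref.
create_local_branches() {
  git for-each-ref --format='%(refname)' refs/remotes/ |
  while read -r ref; do
    local_name=$(local_branch_name "${ref#refs/remotes/}")
    [ -n "$local_name" ] || continue
    git branch "$local_name" "$ref"
  done
}
```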
Tags are imported as branches. To turn them into Git tags, you have to create a local branch, make a tag and delete the branch. To do it with tag "v1":
git checkout -b tag_v1 remotes/tags/v1
git checkout master
git tag v1 tag_v1
git branch -D tag_v1
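The same conversion can be applied to every tag at once. Since git tag accepts any commit as its target, this sketch tags the imported ref directly instead of going through a temporary branch; like the git tag v1 tag_v1 command, it creates lightweight tags. It assumes the tags were imported as refs/remotes/tags/<name> or refs/remotes/origin/tags/<name>; `tag_name_for_ref` and `convert_svn_tags` are hypothetical names.

```shell
#!/bin/sh
# tag_name_for_ref REF  (hypothetical helper)
# Prints the Git tag name for an imported SVN tag ref such as
# "refs/remotes/origin/tags/v1" or "refs/remotes/tags/v1"; prints nothing
# for any other ref.
tag_name_for_ref() {
  name=${1#refs/remotes/}
  name=${name#origin/}
  case "$name" in
    tags/*@*) ;;                              # skip name@REV leftovers
    tags/*) printf '%s\n' "${name#tags/}" ;;
  esac
}

# convert_svn_tags  (hypothetical helper; run inside dest_dir-tmp)
# Creates a lightweight Git tag pointing at each imported SVN tag.
convert_svn_tags() {
  git for-each-ref --format='%(refname)' refs/remotes/ |
  while read -r ref; do
    tag=$(tag_name_for_ref "$ref")
    [ -n "$tag" ] || continue
    git tag "$tag" "$ref"
  done
}
```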
Clone your git-svn repository into a clean Git repository:
git clone dest_dir-tmp dest_dir
rm -rf dest_dir-tmp
cd dest_dir
The local branches that you created earlier from remote branches will appear only as remote branches in the newly cloned repository. (Skip trunk/master.) For each branch you want to keep:
git checkout -b local_branch origin/remote_branch
Finally, remove the remote from your clean Git repository that points to the now-deleted temporary repository:
git remote rm origin
If you follow my recommendations below (I have for years), you will be able to:
-- put each project anywhere in source control, as long as you preserve the structure from the project root directory on down
-- build each project anywhere on any machine, with minimum risk and minimum preparation
-- build each project completely stand-alone, as long as you have access to its binary dependencies (local "library" and "output" directories)
-- build and work with any combination of projects, since they are independent
-- build and work with multiple copies/versions of a single project, since they are independent
-- avoid cluttering your source control repository with generated files or libraries
I recommend (here's the beef):
Define each project to produce a single primary deliverable, such as a .DLL, .EXE, or .JAR (the default with Visual Studio).
Structure each project as a directory tree with a single root.
Create an automated build script for each project in its root directory that will build it from scratch, with NO dependencies on an IDE (but don't prevent it from being built in the IDE, if feasible).
Consider NAnt for .NET projects on Windows, or something similar based on your OS, target platform, etc.
Make every project build script reference its external (3rd-party) dependencies from a single local shared "library" directory, with every such binary FULLY identified by version: %DirLibraryRoot%\ComponentA-1.2.3.4.dll, %DirLibraryRoot%\ComponentB-5.6.7.8.dll.
Make every project build script publish the primary deliverable to a single local shared "output" directory: %DirOutputRoot%\ProjectA-9.10.11.12.dll, %DirOutputRoot%\ProjectB-13.14.15.16.exe.
Make every project build script reference its dependencies via configurable and fully-versioned absolute paths (see above) in the "library" and "output" directories, AND NOWHERE ELSE.
NEVER let a project directly reference another project or any of its contents--only allow references to the primary deliverables in the "output" directory (see above).
Make every project build script reference its required build tools by a configurable and fully-versioned absolute path: %DirToolRoot%\ToolA\1.2.3.4, %DirToolRoot%\ToolB\5.6.7.8.
Make every project build script reference source content by an absolute path rooted at the project root directory: ${project.base.dir}/src, ${project.base.dir}/tst (syntax varies by build tool).
ALWAYS require a project build script to reference EVERY file or directory via an absolute, configurable path (rooted at a directory specified by a configurable variable): ${project.base.dir}/some/dirs or ${env.Variable}/other/dir.
NEVER allow a project build script to reference ANYTHING with a relative path like .\some\dirs\here or ..\some\more\dirs; ALWAYS use absolute paths.
NEVER allow a project build script to reference ANYTHING using an absolute path that does not have a configurable root directory, like C:\some\dirs\here or \\server\share\more\stuff\there.
For each configurable root directory referenced by a project build script, define an environment variable that will be used for those references.
Attempt to minimize the number of environment variables you must create to configure each machine.
On each machine, create a shell script that defines the necessary environment variables, which is specific to THAT machine (and possibly specific to that user, if relevant).
Do NOT put the machine-specific configuration shell script into source control; instead, for each project, commit a copy of the script in the project root directory as a template.
REQUIRE each project build script to check each of its environment variables, and abort with a meaningful message if they are not defined.
REQUIRE each project build script to check each of its dependent build tool executables, external library files, and dependent project deliverable files, and abort with a meaningful message if those files do not exist.
RESIST the temptation to commit ANY generated files into source control--no project deliverables, no generated source, no generated docs, etc.
If you use an IDE, generate whatever project control files you can, and don't commit them to source control (this includes Visual Studio project files).
Establish a server with an official copy of all external libraries and tools, to be copied/installed on developer workstations and build machines. Back it up, along with your source control repository.
Establish a continuous integration server (build machine) with NO development tools whatsoever.
Consider a tool for managing your external libraries and deliverables, such as Ivy (used with Ant).
Do NOT use Maven--it will initially make you happy, and eventually make you cry.
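The two NEVER rules about relative paths and unrooted absolute paths can be partly enforced by scanning each build script for suspicious path literals. This checker is a heuristic sketch, not a parser: the function name `lint_paths` and its three patterns are assumptions, and it will report false positives that you would tune for your own build tool's syntax.

```shell
#!/bin/sh
# lint_paths BUILD_SCRIPT  (hypothetical helper)
# Prints suspicious lines (with line numbers) and returns 1 if the script
# seems to use relative paths (./x, ../x, .\x, ..\x) or absolute paths with
# no configurable root (C:\x, \\server\share, /opt/x); returns 0 otherwise.
lint_paths() {
  status=0
  # Relative paths: "./" or "../" (either slash) not glued to a word.
  if grep -nE '(^|[^[:alnum:]_.])\.\.?[/\\]' "$1"; then status=1; fi
  # Drive-letter paths such as C:\dir or C:/dir.
  if grep -nE '(^|[^[:alnum:]])[A-Za-z]:[/\\]' "$1"; then status=1; fi
  # UNC paths (\\server) or Unix absolute paths at the start of a value.
  if grep -nE '(^|[[:space:]"=])(\\\\|/)[[:alnum:]]' "$1"; then status=1; fi
  return "$status"
}
```

Paths built from a variable, such as ${project.base.dir}/src or ${env.DirLibraryRoot}/ComponentA-1.2.3.4.dll, pass, because the slash follows the closing brace rather than a quote, space or "=".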
Note that none of this is specific to Subversion, and most of it is generic to projects targeted to any OS, hardware, platform, language, etc. I did use a bit of OS- and tool-specific syntax, but only for illustration--I trust that you will translate to your OS or tool of choice.
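As one such translation, here is a minimal POSIX-shell sketch of the rule that every build script checks its environment variables and required files and aborts with a meaningful message. The helper names (`require_env`, `require_file`, `preflight`) are hypothetical, and the variables and files checked are only examples borrowed from the %DirLibraryRoot%, %DirOutputRoot% and %DirToolRoot% paths in the recommendations.

```shell
#!/bin/sh
# require_env NAME...  (hypothetical helper)
# Exits with a message if any named environment variable is unset or empty.
require_env() {
  for name in "$@"; do
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "build aborted: invalid variable name '$name'" >&2; exit 1 ;;
    esac
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "build aborted: environment variable $name is not defined" >&2
      exit 1
    fi
  done
}

# require_file PATH...  (hypothetical helper)
# Exits with a message if a required tool, library or deliverable is missing.
require_file() {
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "build aborted: required file $path does not exist" >&2
      exit 1
    fi
  done
}

# Example preflight for a hypothetical project; call it first in the build.
preflight() {
  require_env DirLibraryRoot DirOutputRoot DirToolRoot
  require_file \
    "$DirToolRoot/ToolA/1.2.3.4" \
    "$DirLibraryRoot/ComponentA-1.2.3.4.dll" \
    "$DirOutputRoot/ProjectB-13.14.15.16.exe"
}
```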
Additional note regarding Visual Studio solutions: don't put them in source control! With this approach, you don't need them at all, or you can generate them (just like the Visual Studio project files). However, I find it best to leave the solution files to individual developers to create/use as they see fit (but not checked in to source control). I keep a Rob.sln file on my workstation from which I reference my current project(s). Since my projects all stand alone, I can add/remove projects at will (that means no project-based dependency references).
Please don't use Subversion externals (or their equivalents in other tools); they are an anti-pattern and, therefore, unnecessary.
When you implement continuous integration, or even when you just want to automate the release process, create a script for it. Make a single shell script that takes the project name (as listed in the repository) and the tag name as parameters, creates a temporary directory within a configurable root directory, checks out the source for the given project name and tag name to that temporary directory (by constructing the appropriate URL in the case of Subversion), and performs a clean build that runs the tests and packages the deliverable. This shell script should work on any project and should be checked into source control as part of your "build tools" project. Your continuous integration server can use this script as its foundation for building projects, or it might even provide such a script (but you still might want your own).
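A minimal sketch of such a release script, assuming a Subversion layout of <root>/<project>/tags/<tag> and a build entry point named build.sh in each project root that runs the tests and packages the deliverable. SVN_ROOT_URL, BUILD_TEMP_ROOT, build.sh, `tag_url` and `release` are all hypothetical names to be replaced by your own conventions and build tool.

```shell
#!/bin/sh
# release.sh PROJECT TAG  (hypothetical script)
# Checks out PROJECT at TAG into a fresh temporary directory and runs a
# clean build there. Requires SVN_ROOT_URL and BUILD_TEMP_ROOT to be set.

# tag_url ROOT_URL PROJECT TAG: construct the Subversion URL of a tag.
tag_url() {
  printf '%s/%s/tags/%s\n' "${1%/}" "$2" "$3"
}

release() {
  project=$1
  tag=$2
  # Abort with a meaningful message if the configurable roots are missing.
  : "${SVN_ROOT_URL:?SVN_ROOT_URL is not defined}"
  : "${BUILD_TEMP_ROOT:?BUILD_TEMP_ROOT is not defined}"

  # Fresh, uniquely named working directory under the configurable root.
  workdir=$(mktemp -d "$BUILD_TEMP_ROOT/$project-$tag.XXXXXX") || return 1

  svn checkout --quiet "$(tag_url "$SVN_ROOT_URL" "$project" "$tag")" "$workdir" || return 1

  # Clean build of the checked-out tag (hypothetical build.sh entry point).
  (cd "$workdir" && sh ./build.sh) || return 1
  echo "built $project $tag in $workdir"
}

# Only run when given both arguments, so the functions can also be sourced.
if [ "$#" -eq 2 ]; then
  release "$1" "$2" || exit 1
fi
```

Usage: `SVN_ROOT_URL=https://svn.example.com/repos BUILD_TEMP_ROOT=/var/tmp/builds sh release.sh ProjectA 1.2.0` (example values only).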
@VonC: You do NOT want to work at all times with "ant.jar" rather than "ant-a.b.c.d.jar" after you get burned when your build script breaks because you unknowingly ran it with an incompatible version of Ant. This is particularly common between Ant 1.6.5 and 1.7.0. Generalizing, you ALWAYS want to know what specific version of EVERY component is being used, including your platform (Java A.B.C.D) and your build tool (Ant E.F.G.H). Otherwise, you will eventually encounter a bug and your first BIG problem will be tracking down what versions of your various components are involved. It is simply better to solve that problem up front.
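One way to act on this is to have the build script verify the version of the tool it was handed, in addition to referencing it by a versioned path. The sketch assumes the tool's version banner contains "version X.Y.Z" (Ant's `ant -version` output does, though the surrounding text varies between releases); `version_from_banner` and `require_version` are hypothetical helpers.

```shell
#!/bin/sh
# version_from_banner TEXT  (hypothetical helper)
# Extracts the "version X.Y.Z" number from a tool's version banner.
version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/.*[Vv]ersion \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# require_version TOOL EXPECTED BANNER  (hypothetical helper)
# Exits with a message unless BANNER reports exactly version EXPECTED.
require_version() {
  actual=$(version_from_banner "$3")
  if [ "$actual" != "$2" ]; then
    echo "build aborted: $1 $2 required, found ${actual:-unknown version}" >&2
    exit 1
  fi
}
```

Usage, assuming a hypothetical install layout of %DirToolRoot%\Ant\1.7.0: `require_version ant 1.7.0 "$("$DirToolRoot/Ant/1.7.0/bin/ant" -version)"`.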
The single vs. multiple issue comes down to personal or organizational preference.
Management of multiple vs. single mainly comes down to access control and maintenance.
Access control for a single repository can be contained in a single file; multiple repositories may require multiple files. Maintenance has similar issues: one big backup, or a lot of little backups.
I manage my own. There's one repository, multiple projects, each with its own tags, trunk and branches. If one gets too big or I need to physically isolate a customer's code for their comfort, I can quickly and easily create a new repository.
I recently consulted with a relatively large firm on migrating multiple source code control systems to Subversion. They have ~50 projects, ranging from very small to enterprise applications and their corporate website. Their plan? Start with a single repository, migrate to multiple if necessary. The migration is almost complete and they're still on a single repository, no complaints or issues reported due to it being a single repository.
This isn't a binary, black & white issue.
Do what works for you. Were I in your position, I'd combine projects into a single repository as fast as I could type the commands, because the cost would be a major consideration in my (very, very small) company.
JFTR:
Revision numbers in Subversion really have no meaning outside the repository. If you need meaningful names for a revision, create a TAG.
Commit messages are easily filtered by path in the repository, so reading only those related to a particular project is a trivial exercise.
Edit: See Blade's response for details on using a single authorization/authentication configuration for SVN.