Hmm, not sure I agree with Nick re tag being similar to a branch. A tag is just a marker.
Trunk would be the main body of development, originating from the start of the project until the present.
Branch will be a copy of code derived from a certain point in the trunk that is used for applying major changes to the code while preserving the integrity of the code in the trunk. If the major changes work according to plan, they are usually merged back into the trunk.
Tag will be a point in time on the trunk or a branch that you wish to preserve. The two main reasons for preservation would be that either this is a major release of the software, whether alpha, beta, RC or RTM, or this is the most stable point of the software before major revisions on the trunk were applied.
In open source projects, major branches that are not accepted into the trunk by the project stakeholders can become the bases for forks -- i.e., totally separate projects that share a common origin with other source code.
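To make the terms concrete: in Subversion a branch and a tag are both created with svn copy, and only the destination directory differs. A minimal sketch, assuming a hypothetical repository URL and the conventional trunk/branches/tags layout (the function names are made up for illustration):

```shell
# Hypothetical repository root; replace with your own.
REPO=${REPO:-https://svn.example.com/myrepo}

# Create a branch: a cheap server-side copy of trunk under branches/.
make_branch() {
    svn copy "$REPO/trunk" "$REPO/branches/$1" -m "Create branch $1 from trunk"
}

# Create a tag: the same kind of copy, but under tags/ and meant to stay frozen.
make_tag() {
    svn copy "$REPO/trunk" "$REPO/tags/$1" -m "Tag trunk as $1"
}

# Example calls (not run here):
#   make_branch feature-x
#   make_tag 1.0-rc1
```

The difference between the two is purely convention: nothing in the copy itself makes a tag read-only.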
The branch and tag subtrees are distinguished from the trunk in the following ways:
Subversion allows sysadmins to create hook scripts which are triggered for execution when certain events occur; for instance, committing a change to the repository. It is very common for a typical Subversion repository implementation to treat any path containing "/tag/" as write-protected after creation; the net result is that tags, once created, are immutable (at least to "ordinary" users). This is done via the hook scripts, which enforce the immutability by preventing further changes if tag is a parent node of the changed object.
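A rough sketch of what such a hook check could look like. It assumes tags live under a top-level tags/ directory, that the hook feeds it the output of svnlook changed, and that creating a tag shows up as a single added directory; the function name is hypothetical.

```shell
# Reads `svnlook changed` output on stdin and fails if an existing tag is touched.
# Assumption: each line is "<status columns> <path>", with the path starting at column 5.
tags_untouched() {
    awk '
        { path = substr($0, 5) }
        path ~ "^tags/" {
            # The only change allowed under tags/ is adding a new tag directory.
            if ($1 == "A" && path ~ "^tags/[^/]+/$") next
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# In a hooks/pre-commit script, Subversion passes the repository path and
# transaction name as $1 and $2:
#   svnlook changed -t "$2" "$1" | tags_untouched || {
#       echo "Tags are read-only once created." >&2
#       exit 1
#   }
```

A real hook would probably also let administrators through, which is what the "ordinary users" caveat above hints at.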
Subversion also has added features, since version 1.5, relating to "branch merge tracking" so that changes committed to a branch can be merged back into the trunk with support for incremental, "smart" merging.
Create a users file (e.g. users.txt) for mapping SVN users to Git:
user1 = First Last Name <email@address.com>
user2 = First Last Name <email@address.com>
...
You can use this one-liner to build a template from your existing SVN repository:
svn log -q | awk -F '|' '/^r/ {gsub(/ /, "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > users.txt
git svn will stop if it encounters an SVN user who is not in the file. But after that, you can update the file and pick up where you left off.
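To avoid being stopped halfway, you could check for missing users up front. A small sketch, assuming the "name = ..." format shown above; missing_authors is a made-up helper name, not part of git or svn:

```shell
# Prints every author name read from stdin that has no entry in the users file.
missing_authors() {
    awk -v users="$1" '
        BEGIN {
            # Load the left-hand side of every "name = Full Name <email>" line.
            while ((getline line < users) > 0) {
                split(line, kv, " = ")
                known[kv[1]] = 1
            }
        }
        NF && !($0 in known)
    '
}

# Possible use against the repository log (not run here):
#   svn log -q | awk -F '|' '/^r/ {gsub(/ /, "", $2); print $2}' | sort -u |
#       missing_authors users.txt
```

An empty result means every committer in the log already has a mapping.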
Now pull the SVN data from the repository:
git svn clone --stdlayout --no-metadata --authors-file=users.txt svn://hostname/path dest_dir-tmp
This command will create a new Git repository in dest_dir-tmp and start pulling the SVN repository. Note that the "--stdlayout" flag implies you have the common "trunk/, branches/, tags/" SVN layout. If your layout differs, become familiar with the --tags, --branches and --trunk options (in general, see git svn help).
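For a non-standard layout, the clone might look something like this. The directory names main, dev-branches and releases are invented for the example, and the wrapper function is only there to keep the command in one place:

```shell
# Sketch: clone an SVN repository whose layout is NOT trunk/branches/tags.
# Hypothetical layout: main line in "main/", branches in "dev-branches/", tags in "releases/".
clone_custom_layout() {
    url=$1
    dest=$2
    git svn clone \
        --trunk=main \
        --branches=dev-branches \
        --tags=releases \
        --no-metadata --authors-file=users.txt \
        "$url" "$dest"
}

# Example call (not run here):
#   clone_custom_layout svn://hostname/path dest_dir-tmp
```

The three layout options replace --stdlayout; the paths are relative to the repository URL.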
All common protocols are allowed: svn://, http://, https://. The URL should target the base repository, something like http://svn.mycompany.com/myrepo/repository. The URL string must not include /trunk, /tags or /branches.
Note that after executing this command it very often looks like the operation is "hanging/frozen", and it's quite normal for it to sit for a long time after initializing the new repository. Eventually you will see log messages which indicate that it's migrating.
Also note that if you omit the --no-metadata flag, Git will append information about the corresponding SVN revision to the commit message (i.e. git-svn-id: svn://svn.mycompany.com/myrepo/<branchname/trunk>@<RevisionNumber> <Repository UUID>).
If a user name is not found, update your users.txt file, then:
cd dest_dir-tmp
git svn fetch
If you have a large project, you might have to repeat that last command several times until all of the Subversion commits have been fetched:
git svn fetch
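If you'd rather not re-type it, a bounded retry loop is one option. This is only a sketch; fetch_with_retries is a made-up name, and retrying won't help if the fetch stopped because of a missing user (fix users.txt first in that case):

```shell
# Re-run "git svn fetch" until it succeeds or the attempt limit is reached.
fetch_with_retries() {
    max_attempts=${1:-5}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if git svn fetch; then
            return 0
        fi
        echo "git svn fetch stopped (attempt $attempt of $max_attempts)" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example call from inside dest_dir-tmp (not run here):
#   fetch_with_retries 10
```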
When completed, Git will check out the SVN trunk into a new branch. Any other branches are set up as remotes. You can view the other SVN branches with:
git branch -r
If you want to keep other remote branches in your repository, you want to create a local branch for each one manually. (Skip trunk/master.) If you don't do this, the branches won't get cloned in the final step.
git checkout -b local_branch remote_branch
# It's OK if local_branch and remote_branch are the same names
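With many branches, doing this by hand gets tedious. Here is a sketch of a loop that does the same thing, assuming the remote refs sit directly under refs/remotes/ (newer git-svn versions may add a prefix such as origin/, in which case adjust the paths). It uses git branch rather than git checkout -b so that your working tree is left alone; the function name is hypothetical.

```shell
# Create a local branch for every SVN remote branch, skipping trunk and tags.
create_local_branches() {
    git for-each-ref --format='%(refname)' refs/remotes |
    while IFS= read -r ref; do
        name=${ref#refs/remotes/}
        case $name in
            trunk|tags/*) continue ;;  # trunk is already checked out; tags are not branches
        esac
        git branch "$name" "$ref"
    done
}

# Example call from inside dest_dir-tmp (not run here):
#   create_local_branches
```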
Tags are imported as branches. You have to create a local branch, make a tag and delete the branch to have them as tags in Git. To do it with tag "v1":
git checkout -b tag_v1 remotes/tags/v1
git checkout master
git tag v1 tag_v1
git branch -D tag_v1
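For more than a handful of tags, a loop is less painful. This sketch takes a shortcut compared to the steps above: git tag accepts any commit reference, so it tags the remote ref directly instead of going through a temporary branch. It assumes the imported tags sit under refs/remotes/tags/ as in the example; the function name is made up.

```shell
# Turn every imported SVN tag (a remote branch under tags/) into a real Git tag.
convert_svn_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/tags |
    while IFS= read -r ref; do
        # Strip the ref prefix so "refs/remotes/tags/v1" becomes tag "v1".
        git tag "${ref#refs/remotes/tags/}" "$ref"
    done
}

# Example call from inside dest_dir-tmp (not run here):
#   convert_svn_tags
```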
Clone your git-svn repository into a clean Git repository:
git clone dest_dir-tmp dest_dir
rm -rf dest_dir-tmp
cd dest_dir
The local branches that you created earlier from remote branches will only have been copied as remote branches into the newly cloned repository. (Skip trunk/master.) For each branch you want to keep:
git checkout -b local_branch origin/remote_branch
Finally, remove the remote from your clean Git repository that points to the now-deleted temporary repository:
git remote rm origin
Ok, let me just check the scenario - I guess you're trying to isolate the developers from the trunk entirely so that no-one commits directly to trunk. I assume your platform automatically creates these branches and the developers can at some point say that they're finished with the branch and have it merged back in. So the code follows the cycle:
I'm actually all for this, if it's the way you want to work, great.
I can't think of any reason why it would fail outright, but you must check the following things:
Point 1 is the most important. This means that when re-integration occurs, it must effectively be a no-op. If any sort of merging happens in your 2.2 step, I'd say you must throw the commit away and make the developer re-base from trunk again. Even if svn says the merge is successful and without conflicts, you just can't trust it to have done the right thing enough to fire and forget. If you're going to automate, guarantee that what the developer committed is what gets merged, not some auto-generated hybrid. You can check if the merge was 'clean' just by looking at the output of the merge - if any of the files were 'merged' then there's a problem. If they were just updated, then you're ok.
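As a rough illustration of that output check: svn marks merged files with G and conflicted ones with C in its status columns. The sketch below only looks at the first two columns (text and property status), so it would not catch everything, tree conflicts for instance; merge_was_clean and BRANCH_URL are made-up names.

```shell
# Reads "svn merge" output on stdin; fails if any path was merGed (G) or Conflicted (C).
merge_was_clean() {
    awk '
        # Status letters sit in columns 1-2, followed by a space.
        /^(G|C|[ A-Z][GC]) / { dirty = 1 }
        END { exit (dirty ? 1 : 0) }
    '
}

# Possible use in the platform (not run here; adjust the merge command to your svn version):
#   svn merge --reintegrate "$BRANCH_URL" . | merge_was_clean ||
#       echo "Not a clean re-integration: ask the developer to re-base" >&2
```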
Point 3 is interesting but is always a problem, even in normal working. In this scenario, developer A would say they're finished, re-base their working copy, then spend a while checking that everything works ok. In the meantime, developer B sneaks in an update to trunk. Developer A decides that everything is ok and re-integrates. However, changes made by developer B mean that the code goes screwy even though B didn't touch any files modified by dev A. Since dev A was the last to commit, they get blamed.
The assumption is that if dev A had included dev B's changes, then he would have spotted the problem. Your platform has the opportunity to spot this situation (for example, if svn mergeinfo says there are trunk revisions that can be merged into the branch, then it's not up to date). However, I'd also caution against creating an eternal merge cycle where it's not necessary. Perhaps give the developers a warning that a commit has been made since they re-based, but allow them to go ahead anyway.
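Such an up-to-date check could be sketched like this, using svn mergeinfo --show-revs eligible to list trunk revisions not yet merged into the branch. The function name and arguments are hypothetical, and the result is only good for a warning, not a guarantee:

```shell
# Succeeds if no trunk revisions are still eligible for merging into the branch.
# $1: trunk URL, $2: branch working copy (or URL).
branch_is_current() {
    trunk_url=$1
    branch_wc=$2
    pending=$(svn mergeinfo --show-revs eligible "$trunk_url" "$branch_wc") || return 2
    [ -z "$pending" ]
}

# Possible use (not run here):
#   branch_is_current "$TRUNK_URL" . ||
#       echo "Warning: trunk has changed since you re-based" >&2
```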
One last note: I mentioned using svn mergeinfo above. You can't rely on this to assume that a re-integrate merge will be ok. If you do, there'll be a race condition between making the check and committing the merge - someone could have got in in the meantime. You still need to check the output of the actual merge command to see what really happened.
Also be aware of the situation if multiple devs try to re-integrate at the same time. If you check out a fresh working copy each time, you can have as many as you like, but the commits may well fail with the old "file out of date" type error. In these situations, again, the re-integration will fail and you'll have to get the developer to re-base their branches.
All in all I like it. There's a fair bit of complication in there, but at the end of the day, that's the point of this kind of system - it deals with the complex stuff so the devs don't have to. All they have to do is keep re-basing their branches until the system lets them re-integrate successfully.