Because I'm currently struggling to learn IBM Rational ClearCase, I'd like to hear your professional opinion.
I'm particularly interested in advantages/disadvantages compared to other version-control-systems like Subversion or Git.
clearcaseversion control
Because I'm currently struggling to learn IBM Rational ClearCase, I'd like to hear your professional opinion.
I'm particularly interested in advantages/disadvantages compared to other version-control-systems like Subversion or Git.
I've been using Mercurial both at work and in my own personal projects, and I am really happy with it. The advantages I see are:
Of course, like any new system, there was some pain during the transition. You have to think about version control differently than you did when you were using SVN, but overall I think it's very much worth it.
A git rebase workflow does not protect you from people who are bad at conflict resolution or people who are used to a SVN workflow, like suggested in Avoiding Git Disasters: A Gory Story. It only makes conflict resolution more tedious for them and makes it harder to recover from bad conflict resolution. Instead, use diff3 so that it's not so difficult in the first place.
I am very pro-rebase for cleaning up history. However if I ever hit a conflict, I immediately abort the rebase and do a merge instead! It really kills me that people are recommending a rebase workflow as a better alternative to a merge workflow for conflict resolution (which is exactly what this question was about).
If it goes "all to hell" during a merge, it will go "all to hell" during a rebase, and potentially a lot more hell too! Here's why:
When you rebase instead of merge, you will have to perform conflict resolution up to as many times as you have commits to rebase, for the same conflict!
I branch off of master to refactor a complicated method in a branch. My refactoring work is comprised of 15 commits total as I work to refactor it and get code reviews. Part of my refactoring involves fixing the mixed tabs and spaces that were present in master before. This is necessary, but unfortunately it will conflict with any change made afterward to this method in master. Sure enough, while I'm working on this method, someone makes a simple, legitimate change to the same method in the master branch that should be merged in with my changes.
When it's time to merge my branch back with master, I have two options:
git merge: I get a conflict. I see the change they made to master and merge it in with (the final product of) my branch. Done.
git rebase: I get a conflict with my first commit. I resolve the conflict and continue the rebase. I get a conflict with my second commit. I resolve the conflict and continue the rebase. I get a conflict with my third commit. I resolve the conflict and continue the rebase. I get a conflict with my fourth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifth commit. I resolve the conflict and continue the rebase. I get a conflict with my sixth commit. I resolve the conflict and continue the rebase. I get a conflict with my seventh commit. I resolve the conflict and continue the rebase. I get a conflict with my eighth commit. I resolve the conflict and continue the rebase. I get a conflict with my ninth commit. I resolve the conflict and continue the rebase. I get a conflict with my tenth commit. I resolve the conflict and continue the rebase. I get a conflict with my eleventh commit. I resolve the conflict and continue the rebase. I get a conflict with my twelfth commit. I resolve the conflict and continue the rebase. I get a conflict with my thirteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fourteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifteenth commit. I resolve the conflict and continue the rebase.
You have got to be kidding me if this is your preferred workflow. All it takes is a whitespace fix that conflicts with one change made on master, and every commit will conflict and must be resolved. And this is a simple scenario with only a whitespace conflict. Heaven forbid you have a real conflict involving major code changes across files and have to resolve that multiple times.
With all the extra conflict resolution you need to do, it just increases the possibility that you will make a mistake. But mistakes are fine in git since you can undo, right? Except of course...
I think we can all agree that conflict resolution can be difficult, and also that some people are very bad at it. It can be very prone to mistakes, which why it's so great that git makes it easy to undo!
When you merge a branch, git creates a merge commit that can be discarded or amended if the conflict resolution goes poorly. Even if you have already pushed the bad merge commit to the public/authoritative repo, you can use git revert
to undo the changes introduced by the merge and redo the merge correctly in a new merge commit.
When you rebase a branch, in the likely event that conflict resolution is done wrong, you're screwed. Every commit now contains the bad merge, and you can't just redo the rebase*. At best, you have to go back and amend each of the affected commits. Not fun.
After a rebase, it's impossible to determine what was originally part of the commits and what was introduced as a result of bad conflict resolution.
*It can be possible to undo a rebase if you can dig the old refs out of git's internal logs, or if you create a third branch that points to the last commit before rebasing.
Take this conflict for example:
<<<<<<< HEAD
TextMessage.send(:include_timestamp => true)
=======
EmailMessage.send(:include_timestamp => false)
>>>>>>> feature-branch
Looking at the conflict, it's impossible to tell what each branch changed or what its intent was. This is the biggest reason in my opinion why conflict resolution is confusing and hard.
diff3 to the rescue!
git config --global merge.conflictstyle diff3
When you use the diff3, each new conflict will have a 3rd section, the merged common ancestor.
<<<<<<< HEAD
TextMessage.send(:include_timestamp => true)
||||||| merged common ancestor
EmailMessage.send(:include_timestamp => true)
=======
EmailMessage.send(:include_timestamp => false)
>>>>>>> feature-branch
First examine the merged common ancestor. Then compare each side to determine each branch's intent. You can see that HEAD changed EmailMessage to TextMessage. Its intent is to change the class used to TextMessage, passing the same parameters. You can also see that feature-branch's intent is to pass false instead of true for the :include_timestamp option. To merge these changes, combine the intent of both:
TextMessage.send(:include_timestamp => false)
In general:
Finally, some conflicts are terrible to understand even with diff3. This happens especially when diff finds lines in common that are not semantically common (eg. both branches happened to have a blank line at the same place!). For example, one branch changes the indentation of the body of a class or reorders similar methods. In these cases, a better resolution strategy can be to examine the change from either side of the merge and manually apply the diff to the other file.
Let's look at how we might resolve a conflict in a scenario where merging origin/feature1
where lib/message.rb
conflicts.
Decide whether our currently checked out branch (HEAD
, or --ours
) or the branch we're merging (origin/feature1
, or --theirs
) is a simpler change to apply. Using diff with triple dot (git diff a...b
) shows the changes that happened on b
since its last divergence from a
, or in other words, compare the common ancestor of a and b with b.
git diff HEAD...origin/feature1 -- lib/message.rb # show the change in feature1
git diff origin/feature1...HEAD -- lib/message.rb # show the change in our branch
Check out the more complicated version of the file. This will remove all conflict markers and use the side you choose.
git checkout --ours -- lib/message.rb # if our branch's change is more complicated
git checkout --theirs -- lib/message.rb # if origin/feature1's change is more complicated
With the complicated change checked out, pull up the diff of the simpler change (see step 1). Apply each change from this diff to the conflicting file.
Best Answer
You can find a good comparison between ClearCase and Git in my SO answer:
"What are the basic ClearCase concepts every developer should know?", illustrating some major differences (and some shortcomings of ClearCase)
File-centric operations
The most single important shortcoming of ClearCase is its old "file-centric" approach (as opposed to "repository-centric" like in SVN or Git or Perforce...)
That means each checkout or check-in is done file per file. The atomicity of operation is at file levels.
Combine that with a very verbose protocol and a network with potentially several nodes between the developer workstation and the VOB server, and you can end up with a fairly slow and inefficient file server (which ClearCase is at its core).
File-per-file operations means: slow recursive operations (like recursive checkout or recursive "add to source control", even by
clearfsimport
).A fast LAN is mandatory to mitigate the side-effects of that chatty protocol.
Centralized VCS
The other aspect to take into account is its centralized aspect (even though it can be "distributed" with its multi-site replicated VOB feature)
If the network does not allow access to the VOBs, the developers can:
Expensive Distributed VCS option
You can have some distributed VCS feature by replicating a Vob.
But:
Old and not user-friendly GUI
UCM inconsistencies and in coherencies
As mentioned in "How to Leverage ClearCase’s features", dynamic views are great (a way to see data through the network without having to copy them to the disk), but the main feature remain UCM: it can be a real asset if you have big project with complex workflow.
Some shortcomings on that front:
Limited policies with Base ClearCase
Using ClearCase without using UCM means having to define a policy to:
cleartool find
" requests...No application rights
ClearCase right management is entirely built on system rights.
That means you need to register your user to the correct system group, which is not always easy to do when you have to enter a ticket to your IT service in order for them to make the proper registration.
Add to that an heterogeneous environment (users on Windows, and server on Unix), and you need to register your user on Unix as well as Windows! (with the same login/group name). Unless you put some sort of LDAP correspondence between the two world (like Centrify)
No advanced API
cleartool
" is the ClearCase Command Line Interface), meaning that any script (in Perl or other language) consists in parsing the output of thosecleartool
commands)View Storages not easily centralized/backed up
The View storages are the equivalent of the ".svn" of SubVersion, exept there is only one "view storage" per view instead of many .svn in all the directories of a SubVersion workspace. That is good.
What is bad is that each operations within a view (a simple "
ls
", checkout, checking, ...) will trigger a network request to the view_server process that manages your view server.2 options:
The first mode means: you have to backup yourself your work in progress (private files or checked-out files)
The second mode means: your workstation can be unavailable, you can just log on another a get back your views (execpt for the private files of a snapshot view)
Side discussion about dynamic views:
To add to the "dynamic view" aspect, it has one advantage (it's dynamic) and one shortcoming (it's dynamic).
Dynamic views are great for setting a simple environment to quickly share a small development between a small team: for a small development effort, a dynamic view can help 2 or 3 developers to constantly stay in touch one with another, seeing instantly when one's commit breaks something in the other views.
For more complex development effort, the artificial "isolation" provided by snapshot view is preferable (you see changes only when you refresh - or "update" - your snapshot view)
For real divergent development effort or course, a branch is still required to achieve true code isolation (merges will be required at some point, which ClearCase handles very well, albeit slowly, file-by-file)
The point is, you can use both, for the right reasons.
Note: by small team I do not mean "small project". ClearCase is best used for large project, but if you want to use dynamic views, you need to setup up "task branches in order to isolate a small development effort per branch: that way a "small team" (a subset of your large team) can work efficiently, sharing quickly its work between its members.
If you use dynamic views on a "main" branch where everyone is doing anything, then any check-in would "kill you" as it could introduced some "build breaks" unrelated with your current development effort.
That would then be a poor usage of dynamic views, and that would forget its other usages:
Developing directly in a dynamic view is not always the best option since all (non-checked-out) files are read over the network.
That means the dll or jar or exe needed by your IDE would be accessed over the network, which can slow down considerably the compilation process.
Possible solutions: