Infinite Branch Merges Policy – Alternatives in Version Control

development-process, git, version control

My office is trying to figure out how we handle branch splits and merges, and we've run into a big problem.

Our issue is with long-term sidebranches: the kind where a few of us work on a sidebranch that splits from master, we develop for a few months, and when we reach a milestone we sync the two up.

Now, IMHO, the natural way to handle this is to squash the sidebranch into a single commit. master keeps progressing forward, as it should; we're not retroactively dumping months of parallel development into master's history. And if anybody needs better resolution for the sidebranch's history, well, of course it's all still there. It's just not in master, it's in the sidebranch.

Here's the problem: I work exclusively with the command line, but the rest of my team uses GUIs. And I've discovered the GUIs don't have a reasonable option to display history from other branches. So if you reach a squash commit saying "this development squashed from branch XYZ", it's a huge pain to go see what's in XYZ.

On SourceTree, as far as I'm able to find, it's a huge headache: if you're on master and you want to see the history from master+devFeature, you either need to check master+devFeature out (touching every single file that's different), or else scroll through a log displaying ALL your repository's branches in parallel until you find the right place. And good luck figuring out where you are there.

My teammates, quite rightly, do not want to have development history so inaccessible. So they want these big, long development-sidebranches merged in, always with a merge commit. They don't want any history that isn't immediately accessible from the master branch.

I hate that idea; it means an endless, unnavigable tangle of parallel development history. But I'm not seeing what alternative we have. And I'm pretty baffled; this seems to block off most everything I know about good branch management, and it's going to be a constant frustration to me if I can't find a solution.

Do we have any option here besides constantly merging sidebranches into master with merge-commits? Or, is there a reason that constantly using merge-commits is not as bad as I fear?

Best Answer

Even though I use Git on the command line, I have to agree with your colleagues. It is not sensible to squash large changes into a single commit. You are losing history that way, not just making it less visible.

The point of source control is to track the history of all changes: what changed, when, and why? To that end, every commit contains pointers to its parent commits, a snapshot of the source tree (from which diffs are computed), and metadata like a commit message. Each commit describes the state of the source code and, through its parents, the complete history of all changes that led up to that state. The garbage collector may delete commits that are not reachable from any branch or tag.
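To make that concrete, here is a small sketch that builds a throwaway repository and prints a raw commit object. The temp directory, file name, and identity are made up for the example; `git cat-file -p` is the only part that matters.

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -am "Second commit"

# Print the raw commit object: a tree (the snapshot), the parent pointer,
# author/committer metadata, and the commit message.
commit_object=$(git cat-file -p HEAD)
echo "$commit_object"
```

The output should start with a `tree` line and a `parent` line, followed by author, committer, and the message. The parent line is what chains each commit to its history.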

Actions like rebasing, cherry-picking, or squashing delete or rewrite history. In particular, the resulting commits no longer reference the original commits. Consider this:

  • You squash some commits and note in the commit message that the squashed history is available in original commit abcd123.
  • You delete[1] all branches or tags that include abcd123 since they are merged.
  • You let the garbage collector run.

[1]: Some Git servers allow branches to be protected against accidental deletion, but I doubt you want to keep all your feature branches for eternity.

Now you can no longer look up that commit – it just doesn't exist.
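The three steps above can be reproduced in a scratch repository. This is only a sketch under assumptions: the branch name `feature` and the files are invented, and I expire the reflog and prune immediately so the effect shows up right away (by default Git keeps unreachable commits around for a grace period first).

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# Two commits on a hypothetical side branch.
git checkout -q -b feature
echo "a" > a.txt && git add a.txt && git commit -q -m "Feature step 1"
echo "b" > b.txt && git add b.txt && git commit -q -m "Feature step 2"
feature_tip=$(git rev-parse HEAD)

# Squash the branch into master; the old tip survives only as text in the message.
git checkout -q master
git merge -q --squash feature
git commit -q -m "Squashed from feature (was $feature_tip)"

# Delete the branch, drop reflog entries, and collect garbage immediately.
git branch -q -D feature
git reflog expire --expire=now --all
git gc -q --prune=now

# Check whether the original commit object still exists.
if git cat-file -e "$feature_tip" 2>/dev/null; then
  tip_status="still present"
else
  tip_status="gone"
fi
echo "Original feature tip is $tip_status"
```

The hash in the squash commit's message now points at nothing in this repository. Another clone might still have the commit, but nothing guarantees that.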

Referencing a branch name in a commit message is even worse, since branch names are local to a repo. What is master+devFeature in your local checkout might be doodlediduh in mine. Branches are just moving labels that point to some commit object.
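A quick sketch of the "moving label" point, again in a scratch repository. The branch names are the ones from this discussion plus one invented name (`renamed-label`); nothing here is specific to your setup.

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# Two local names for the same commit; neither name is stored in the commit itself.
git branch master+devFeature
git branch doodlediduh master+devFeature
first=$(git rev-parse master+devFeature)
second=$(git rev-parse doodlediduh)
echo "$first"
echo "$second"

# Renaming a branch changes only the label; the commit hash stays the same.
git branch -m doodlediduh renamed-label
third=$(git rev-parse renamed-label)
echo "$third"
```

All three lines print the same hash, which is why a commit hash (or better, a real parent pointer) is a more durable reference than a branch name.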

Of all history rewriting techniques, rebasing is the most benign because it duplicates the complete commits with all their history, and just replaces a parent commit.
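As a rough illustration of what "duplicates the commit and replaces a parent" means, this sketch rebases a one-commit branch in a scratch repository. Branch and file names are invented for the example.

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# One commit on a hypothetical side branch.
git checkout -q -b feature
echo "f" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
old_tip=$(git rev-parse HEAD)

# Meanwhile, master moves on.
git checkout -q master
echo "m" > master.txt && git add master.txt && git commit -q -m "Master work"

# Rebase replays the feature commit on top of master's new tip.
git checkout -q feature
git rebase -q master
new_tip=$(git rev-parse HEAD)
new_parent=$(git rev-parse "$new_tip^")
subject=$(git log -1 --format=%s "$new_tip")

echo "before rebase: $old_tip"
echo "after rebase:  $new_tip"
echo "subject:       $subject"
```

The message and the change are carried over, but the parent differs, so the rebased commit is a new object with a new hash. The original commit is no longer referenced by the branch.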

That the master history includes the complete history of all branches that were merged into it is a good thing, because that represents reality.[2] If there was parallel development, that should be visible in the log.

[2]: For this reason, I also prefer explicit merge commits over the linearized but ultimately fake history resulting from rebasing.
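For reference, an explicit merge commit can be forced with `git merge --no-ff`, even when Git could fast-forward. A minimal sketch in a scratch repository, with invented branch and file names:

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
echo "f" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
feature_tip=$(git rev-parse HEAD)

# --no-ff forces a merge commit even though a fast-forward would be possible.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

# A merge commit lists two parents: the previous master tip and the feature tip.
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
second_parent=$(git rev-parse "HEAD^2")
echo "Merge commit has $parent_count parents"
```

Because the feature tip is a real parent of the merge, it stays reachable from master and cannot be garbage-collected, even after the branch label is deleted.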

On the command line, git log tries hard to simplify the displayed history and keep all displayed commits relevant. You can tweak history simplification to suit your needs. You might be tempted to write your own git log tool that walks the commit graph, but it is generally impossible to answer “was this commit originally committed on this or that branch?”. The first parent of a merge commit is the previous HEAD, i.e. the commit in the branch that you are merging into. But that assumes that you didn't do a reverse merge from master into the feature branch and then fast-forward master to the merge.
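One simplification that may help with the "unnavigable tangle" worry is `--first-parent`, which follows only the first parent of each merge. The sketch below assumes merges are always made into master with a merge commit (the caveat above about reverse merges applies); branch and file names are invented.

```shell
set -e
# Throwaway repository in a temp directory (hypothetical identity and files).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# Two commits on a hypothetical side branch, merged back with a merge commit.
git checkout -q -b feature
echo "a" > a.txt && git add a.txt && git commit -q -m "Feature step 1"
echo "b" > b.txt && git add b.txt && git commit -q -m "Feature step 2"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Mainline view: only commits along the first-parent chain of master.
git log --oneline --first-parent master
first_parent_count=$(git rev-list --count --first-parent master)

# Full view: every commit reachable from master, including the branch's commits.
full_count=$(git rev-list --count master)
echo "first-parent: $first_parent_count, full: $full_count"
```

With `--first-parent`, master reads like a list of merges (much like a squashed history), while the detailed branch commits remain one plain `git log` away.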

The best solution to long-term branches I've encountered is to avoid branches that are merged only after a couple of months. Merging is easiest when the changes are recent and small. Ideally, you'll merge at least once per week. Continuous integration (as in Extreme Programming, not as in “let's set up a Jenkins server”) even suggests multiple merges per day, i.e. not maintaining separate feature branches but sharing a development branch as a team. Merging before a feature is QA'd requires that the feature is hidden behind a feature flag.
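In case "feature flag" is unfamiliar: the idea is that unfinished code is merged but stays switched off by default. A deliberately tiny sketch, where the flag name `FEATURE_NEW_EXPORT` and both functions are hypothetical:

```shell
# Hypothetical flag, read from the environment; anything other than "on" means off.
feature_enabled() {
  [ "${FEATURE_NEW_EXPORT:-off}" = "on" ]
}

# Hypothetical entry point: the merged-but-unfinished path runs only when the flag is on.
export_report() {
  if feature_enabled; then
    echo "new export path"
  else
    echo "old export path"
  fi
}

default_result=$(export_report)
flagged_result=$(FEATURE_NEW_EXPORT=on export_report)
echo "default: $default_result"
echo "flagged: $flagged_result"
```

Real projects usually read flags from configuration rather than an environment variable, but the shape is the same: the new code ships in master early and QA turns it on when ready.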

In return, frequent integration makes it possible to spot potential problems much earlier, and helps to keep a consistent architecture: far-reaching changes are possible because these changes are quickly included in all branches. If a change breaks some code, it will only break a couple of days' work, not a couple of months'.

History rewriting can make sense for truly huge projects with several million lines of code and hundreds or thousands of active developers. It is questionable why such a large project would have to be a single git repo instead of being divided into separate libraries, but at that scale it is more convenient if the central repo only contains “releases” of the individual components. E.g. the Linux kernel employs squashing to keep the main history manageable. Some open source projects require patches to be sent via email instead of using a git-level merge.