Git – Why does git perform fast-forward merges by default

fast-forwardgitgit-branchgit-merge

Coming from mercurial, I use branches to organize features.
Naturally, I want to see this work-flow in my history as well.

I started my new project using git and finished my first feature. When merging the feature, I realized git uses fast-forward, i.e. it applies my changes directly to the master branch if possible and forgets about my branch.

So to think into the future: I'm the only one working on this project. If I use git's default approach (fast-forward merging), my history would result in one giant master branch.
Nobody knows I used a separate branch for every feature, because in the end I'll have only that giant master branch. Won't that look unprofessional?

By this reasoning, I don't want fast-forward merging and can't see why it is the default. What's so good about it?

Best Answer

Fast-forward merging makes sense for short-lived branches, but in a more complex history, non-fast-forward merging may make the history easier to understand, and make it easier to revert a group of commits.

Warning: Non-fast-forwarding has potential side effects as well. Please review https://sandofsky.com/blog/git-workflow.html, avoid the 'no-ff' with its "checkpoint commits" that break bisect or blame, and carefully consider whether it should be your default approach for master.

alt text
^{(From nvie.com, Vincent Driessen, post "A successful Git branching model")}

Incorporating a finished feature on develop

Finished features may be merged into the develop branch to add them to the upcoming release:

$ git checkout develop
Switched to branch 'develop'
$ git merge --no-ff myfeature
Updating ea1b82a..05e9557
(Summary of changes)
$ git branch -d myfeature
Deleted branch myfeature (was 05e9557).
$ git push origin develop

The --no-ff flag causes the merge to always create a new commit object, even if the merge could be performed with a fast-forward. This avoids losing information about the historical existence of a feature branch and groups together all commits that together added the feature.

Jakub Narębski also mentions the config merge.ff:

By default, Git does not create an extra merge commit when merging a commit that is a descendant of the current commit. Instead, the tip of the current branch is fast-forwarded.
When set to false, this variable tells Git to create an extra merge commit in such a case (equivalent to giving the --no-ff option from the command line).
When set to 'only', only such fast-forward merges are allowed (equivalent to giving the --ff-only option from the command line).

The fast-forward is the default because:

short-lived branches are very easy to create and use in Git
short-lived branches often isolate many commits that can be reorganized freely within that branch
those commits are actually part of the main branch: once reorganized, the main branch is fast-forwarded to include them.

But if you anticipate an iterative workflow on one topic/feature branch (i.e., I merge, then I go back to this feature branch and add some more commits), then it is useful to include only the merge in the main branch, rather than all the intermediate commits of the feature branch.

In this case, you can end up setting this kind of config file:

[branch "master"]
# This is the list of cmdline options that should be added to git-merge 
# when I merge commits into the master branch.

# The option --no-commit instructs git not to commit the merge
# by default. This allows me to do some final adjustment to the commit log
# message before it gets commited. I often use this to add extra info to
# the merge message or rewrite my local branch names in the commit message
# to branch names that are more understandable to the casual reader of the git log.

# Option --no-ff instructs git to always record a merge commit, even if
# the branch being merged into can be fast-forwarded. This is often the
# case when you create a short-lived topic branch which tracks master, do
# some changes on the topic branch and then merge the changes into the
# master which remained unchanged while you were doing your work on the
# topic branch. In this case the master branch can be fast-forwarded (that
# is the tip of the master branch can be updated to point to the tip of
# the topic branch) and this is what git does by default. With --no-ff
# option set, git creates a real merge commit which records the fact that
# another branch was merged. I find this easier to understand and read in
# the log.

mergeoptions = --no-commit --no-ff

The OP adds in the comments:

I see some sense in fast-forward for [short-lived] branches, but making it the default action means that git assumes you... often have [short-lived] branches. Reasonable?

Jefromi answers:

I think the lifetime of branches varies greatly from user to user. Among experienced users, though, there's probably a tendency to have far more short-lived branches.

To me, a short-lived branch is one that I create in order to make a certain operation easier (rebasing, likely, or quick patching and testing), and then immediately delete once I'm done.
That means it likely should be absorbed into the topic branch it forked from, and the topic branch will be merged as one branch. No one needs to know what I did internally in order to create the series of commits implementing that given feature.

More generally, I add:

it really depends on your development workflow:

if it is linear, one branch makes sense.

If you need to isolate features and work on them for a long period of time and repeatedly merge them, several branches make sense.

See "When should you branch?"

Actually, when you consider the Mercurial branch model, it is at its core one branch per repository (even though you can create anonymous heads, bookmarks and even named branches)
See "Git and Mercurial - Compare and Contrast".

Mercurial, by default, uses anonymous lightweight codelines, which in its terminology are called "heads".
Git uses lightweight named branches, with injective mapping to map names of branches in remote repository to names of remote-tracking branches.
Git "forces" you to name branches (well, with the exception of a single unnamed branch, which is a situation called a "detached HEAD"), but I think this works better with branch-heavy workflows such as topic branch workflow, meaning multiple branches in a single repository paradigm.

Recursive

Recursive is the default for non-fast-forward merges. We're all familiar with that one.

Octopus

I've used octopus when I've had several trees that needed to be merged. You see this in larger projects where many branches have had independent development and it's all ready to come together into a single head.

An octopus branch merges multiple heads in one commit as long as it can do it cleanly.

For illustration, imagine you have a project that has a master, and then three branches to merge in (call them a, b, and c).

A series of recursive merges would look like this (note that the first merge was a fast-forward, as I didn't force recursion):

However, a single octopus merge would look like this:

commit ae632e99ba0ccd0e9e06d09e8647659220d043b9
Merge: f51262e... c9ce629... aa0f25d...

Ours

Ours == I want to pull in another head, but throw away all of the changes that head introduces.

This keeps the history of a branch without any of the effects of the branch.

(Read: It is not even looked at the changes between those branches. The branches are just merged and nothing is done to the files. If you want to merge in the other branch and every time there is the question "our file version or their version" you can use git merge -X ours)

Subtree

Subtree is useful when you want to merge in another project into a subdirectory of your current project. Useful when you have a library you don't want to include as a submodule.

Git workflow and rebase vs merge questions

TL;DR

A git rebase workflow does not protect you from people who are bad at conflict resolution or people who are used to a SVN workflow, like suggested in Avoiding Git Disasters: A Gory Story. It only makes conflict resolution more tedious for them and makes it harder to recover from bad conflict resolution. Instead, use diff3 so that it's not so difficult in the first place.

Rebase workflow is not better for conflict resolution!

I am very pro-rebase for cleaning up history. However if I ever hit a conflict, I immediately abort the rebase and do a merge instead! It really kills me that people are recommending a rebase workflow as a better alternative to a merge workflow for conflict resolution (which is exactly what this question was about).

If it goes "all to hell" during a merge, it will go "all to hell" during a rebase, and potentially a lot more hell too! Here's why:

Reason #1: Resolve conflicts once, instead of once for each commit

When you rebase instead of merge, you will have to perform conflict resolution up to as many times as you have commits to rebase, for the same conflict!

Real scenario

I branch off of master to refactor a complicated method in a branch. My refactoring work is comprised of 15 commits total as I work to refactor it and get code reviews. Part of my refactoring involves fixing the mixed tabs and spaces that were present in master before. This is necessary, but unfortunately it will conflict with any change made afterward to this method in master. Sure enough, while I'm working on this method, someone makes a simple, legitimate change to the same method in the master branch that should be merged in with my changes.

When it's time to merge my branch back with master, I have two options:

git merge: I get a conflict. I see the change they made to master and merge it in with (the final product of) my branch. Done.

git rebase: I get a conflict with my first commit. I resolve the conflict and continue the rebase. I get a conflict with my second commit. I resolve the conflict and continue the rebase. I get a conflict with my third commit. I resolve the conflict and continue the rebase. I get a conflict with my fourth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifth commit. I resolve the conflict and continue the rebase. I get a conflict with my sixth commit. I resolve the conflict and continue the rebase. I get a conflict with my seventh commit. I resolve the conflict and continue the rebase. I get a conflict with my eighth commit. I resolve the conflict and continue the rebase. I get a conflict with my ninth commit. I resolve the conflict and continue the rebase. I get a conflict with my tenth commit. I resolve the conflict and continue the rebase. I get a conflict with my eleventh commit. I resolve the conflict and continue the rebase. I get a conflict with my twelfth commit. I resolve the conflict and continue the rebase. I get a conflict with my thirteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fourteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifteenth commit. I resolve the conflict and continue the rebase.

You have got to be kidding me if this is your preferred workflow. All it takes is a whitespace fix that conflicts with one change made on master, and every commit will conflict and must be resolved. And this is a simple scenario with only a whitespace conflict. Heaven forbid you have a real conflict involving major code changes across files and have to resolve that multiple times.

With all the extra conflict resolution you need to do, it just increases the possibility that you will make a mistake. But mistakes are fine in git since you can undo, right? Except of course...

Reason #2: With rebase, there is no undo!

I think we can all agree that conflict resolution can be difficult, and also that some people are very bad at it. It can be very prone to mistakes, which why it's so great that git makes it easy to undo!

When you merge a branch, git creates a merge commit that can be discarded or amended if the conflict resolution goes poorly. Even if you have already pushed the bad merge commit to the public/authoritative repo, you can use git revert to undo the changes introduced by the merge and redo the merge correctly in a new merge commit.

When you rebase a branch, in the likely event that conflict resolution is done wrong, you're screwed. Every commit now contains the bad merge, and you can't just redo the rebase*. At best, you have to go back and amend each of the affected commits. Not fun.

After a rebase, it's impossible to determine what was originally part of the commits and what was introduced as a result of bad conflict resolution.

_{*It can be possible to undo a rebase if you can dig the old refs out of git's internal logs, or if you create a third branch that points to the last commit before rebasing.}

Take the hell out of conflict resolution: use diff3

Take this conflict for example:

<<<<<<< HEAD
TextMessage.send(:include_timestamp => true)
=======
EmailMessage.send(:include_timestamp => false)
>>>>>>> feature-branch

Looking at the conflict, it's impossible to tell what each branch changed or what its intent was. This is the biggest reason in my opinion why conflict resolution is confusing and hard.

diff3 to the rescue!

git config --global merge.conflictstyle diff3

When you use the diff3, each new conflict will have a 3rd section, the merged common ancestor.

<<<<<<< HEAD
TextMessage.send(:include_timestamp => true)
||||||| merged common ancestor
EmailMessage.send(:include_timestamp => true)
=======
EmailMessage.send(:include_timestamp => false)
>>>>>>> feature-branch

First examine the merged common ancestor. Then compare each side to determine each branch's intent. You can see that HEAD changed EmailMessage to TextMessage. Its intent is to change the class used to TextMessage, passing the same parameters. You can also see that feature-branch's intent is to pass false instead of true for the :include_timestamp option. To merge these changes, combine the intent of both:

TextMessage.send(:include_timestamp => false)

In general:

Compare the common ancestor with each branch, and determine which branch has the simplest change
Apply that simple change to the other branch's version of the code, so that it contains both the simpler and the more complex change
Remove all the sections of conflict code other than the one that you just merged the changes together into

Alternate: Resolve by manually applying the branch's changes

Finally, some conflicts are terrible to understand even with diff3. This happens especially when diff finds lines in common that are not semantically common (eg. both branches happened to have a blank line at the same place!). For example, one branch changes the indentation of the body of a class or reorders similar methods. In these cases, a better resolution strategy can be to examine the change from either side of the merge and manually apply the diff to the other file.

Let's look at how we might resolve a conflict in a scenario where merging origin/feature1 where lib/message.rb conflicts.

Decide whether our currently checked out branch (HEAD, or --ours) or the branch we're merging (origin/feature1, or --theirs) is a simpler change to apply. Using diff with triple dot (git diff a...b) shows the changes that happened on b since its last divergence from a, or in other words, compare the common ancestor of a and b with b.
```
git diff HEAD...origin/feature1 -- lib/message.rb # show the change in feature1
git diff origin/feature1...HEAD -- lib/message.rb # show the change in our branch
```

Check out the more complicated version of the file. This will remove all conflict markers and use the side you choose.

git checkout --ours -- lib/message.rb   # if our branch's change is more complicated
git checkout --theirs -- lib/message.rb # if origin/feature1's change is more complicated

With the complicated change checked out, pull up the diff of the simpler change (see step 1). Apply each change from this diff to the conflicting file.