Git – Why does git need to remove duplicate objects when merging branches

git version-control

Git displayed "removing duplicate objects" after I merged one branch into another, and I noticed that my .git/objects folder had shrunk significantly (from 40.7 MB to 33.5 MB).

Isn't a branch just supposed to be a pointer? So why does git need to delete some objects?

What actually happened there?

Is it simply git's natural behavior?

Best Answer

Some commands, git merge among them, run git gc --auto when they finish, which is why you can see this after a merge. During garbage collection, git takes the loose objects from your .git/objects directory and packs them into a packfile in .git/objects/pack. During this phase, it also compresses the files and takes advantage of the similarity between files to reduce their size. Typically the packfile grows by much less than the total size of the objects that have been moved into it, because it can apply inter-file optimisations.

Once these objects are in the packfile, they exist in your local repository twice: once as a loose object file, and once in the packfile. The "removing duplicate objects" phase (the work done by git prune-packed) then removes these duplicates from your .git/objects directory, and so the size of the directory decreases.
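To watch this happen, the sketch below builds a throwaway repository, counts the loose objects, runs git gc, and counts again. The temporary directory, the demo identity and the `field` helper are illustrative assumptions, not part of the original answer.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, safe to delete afterwards
cd "$repo"
git init -q
# Wrapper that supplies a demo identity so commits work without global config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Five revisions of one file: each commit adds a blob, a tree and a commit object.
for i in 1 2 3 4 5; do
  printf 'shared content\nrevision %s\n' "$i" > data.txt
  g add data.txt
  g commit -q -m "revision $i"
done

# field NAME prints one value from `git count-objects -v` (e.g. count, in-pack).
field() { git count-objects -v | awk -v k="$1:" '$1 == k { print $2 }'; }

loose_before=$(field count)
git gc -q                    # packs the loose objects, then prunes the packed duplicates
loose_after=$(field count)
packed_after=$(field in-pack)

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "objects in packfile:     $packed_after"
```

In a real repository, running git count-objects -v before and after a merge gives the same kind of before/after picture.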

Related Solutions

Version Control – Creating a Version Control Strategy for SVN

If you want a unified build process, then be sure to put branches/tags/trunk at the root, like this:

branches/
tags/
trunk/
  dev/
    ...

If you don't need a unified build process, then you can put branches/tags/trunks within each project if you want. However, it might be difficult to migrate to a unified build after having put them within each project. A unified build has advantages, such as eliminating the need to publish shared components among projects -- they're all part of the build.

Personally, I like a unified build process. Furthermore, I don't think you should have a "dev" project. You should have projects directly under trunk, and then branch trunk into a dev branch. Use tags for releases. For example, I would do it like this:

branches/
  dev/
    Site1/
    Site2/
    WebService/
    SharedCode/
tags/
  release1/
    Site1/
    Site2/
    WebService/
    SharedCode/
trunk/
  Site1/
  Site2/
  WebService/
  SharedCode/
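One way to set up that layout is sketched below as a dry run: the script only prints the svn commands so they can be reviewed first. `REPO_URL` is a hypothetical repository URL and the commit messages are placeholders.

```shell
# Hypothetical repository URL; replace with your own before running anything.
REPO_URL="https://svn.example.com/repo"
projects="Site1 Site2 WebService SharedCode"

# Build the plan as text: one `svn mkdir` for the skeleton, then copies of trunk.
plan="svn mkdir --parents -m 'Create layout' $REPO_URL/branches $REPO_URL/tags"
for p in $projects; do
  plan="$plan $REPO_URL/trunk/$p"
done
plan="$plan
svn copy -m 'Create dev branch' $REPO_URL/trunk $REPO_URL/branches/dev
svn copy -m 'Tag release1' $REPO_URL/trunk $REPO_URL/tags/release1"

# Print the commands instead of executing them.
printf '%s\n' "$plan"
```

Because branches and tags are copies of the whole trunk, every project is branched and tagged together, which is what keeps the build unified.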

Version Control – How to Merge Bug Fixes from Trunk in Old Branches

The Git Flow assumes you only have a single supported release, with the master branch always pointing to the latest release. Since you support multiple releases simultaneously, you cannot copy that workflow 1:1. Nvie's Git Flow is a very good example of a branching strategy, but you must adapt it to your needs. Most importantly, you will have multiple active release branches.

When you identify a bug, you will have to do some testing to determine all affected versions. It is not sufficient to write the fix first, then merge it back into the release branches as a hotfix. Usually, you'll end up with some continuous range of affected versions. Very old versions might not contain the bug, newer versions might have gotten that bug fixed accidentally. You will need to verify the bug on each version so that you can verify it is actually gone after the fix. If you can express the bug as an automated testcase, it's pretty straightforward to find the problematic commit via git bisect, or to run the test for each release:

# Run the automated test case against each supported release branch.
for release in 3.8 4.1 4.2
do
  git checkout --quiet "$release" || exit 1
  if ./testcase >"release-$release.log"
  then echo "$release ok"
  else echo "$release AFFECTED"
  fi
done

Now, you used to write the fix on trunk/master. This is problematic because the buggy part may have changed between versions, so a patch will not usually apply to an older version. In particular, code in master might depend on any features available in master, which might not have been present in older versions. It therefore makes a lot of sense that a Git commit references its whole history, not just the change set. When merging back, it will pull in all the history it depends on.

Using cherry-pick or rebase ignores this history, and records a new commit with the same changeset, but a different history. As pointed out, this will not work if your codebase has diverged.
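A small demonstration of that difference, using a throwaway repository with hypothetical branch and file names: the cherry-picked commit carries the same change but gets a new ID, and the original fix is not part of the release branch's history.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g checkout -q -b release
echo base > base.txt
g add base.txt
g commit -q -m "base"

# develop moves ahead of release, then receives the fix.
g checkout -q -b develop
echo new > feature.txt
g add feature.txt
g commit -q -m "feature"
echo fixed > fix.txt
g add fix.txt
g commit -q -m "fix"
fix=$(git rev-parse HEAD)

# Copy only the fix's changeset onto release.
g checkout -q release
g cherry-pick "$fix" >/dev/null
picked=$(git rev-parse HEAD)

echo "original fix:  $fix"
echo "cherry-picked: $picked"
if git merge-base --is-ancestor "$fix" release; then
  echo "fix is in release history"
else
  echo "fix is NOT in release history (only a copy of its changes is)"
fi
```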

The “correct” solution is to write the fix as a hotfix on the oldest affected release. Then you merge the oldest release into the second-oldest release. Usually, a newer release contains all commits of an older release, so this is OK. If things have changed, you now have a chance to resolve the merge conflict manually. Then you continue merging each release into the next-newer release until you are done. This maintains proper history and avoids a lot of unnecessary work, while only requiring effort that would have to be expended anyway. In particular, this incremental merging gets you closer to the current development state in small steps.

As a diagram:

| o normal commit |
| x hotfix        |
| ⇘ merging       |

3.8 --o-----------------x
       \                 ⇘
4.1     o--o--o-----------x'
               \           ⇘
4.2             o--o--o-----x''
                       \     ⇘
develop                 o--o--x'''--o--o
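The diagram can be replayed in a throwaway repository. The sketch below uses the branch names from the diagram, while the file names, the demo identity and the `commit` helper are illustrative assumptions. It creates each release on top of the previous one, commits the hotfix on 3.8, and merges forward one branch at a time.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# commit FILE MESSAGE: append a line to FILE and commit it.
commit() {
  echo "$2" >> "$1"
  g add "$1"
  g commit -q -m "$2"
}

# Each newer branch starts from the previous one and adds its own work.
g checkout -q -b 3.8
commit base.txt "base work"
prev=3.8
for branch in 4.1 4.2 develop; do
  g checkout -q -b "$branch" "$prev"
  commit "work-$branch.txt" "work on $branch"
  prev=$branch
done

# Write the hotfix on the oldest affected release.
g checkout -q 3.8
commit fix.txt "hotfix"
fix=$(git rev-parse HEAD)

# Merge each branch into the next-newer one, ending with develop.
prev=3.8
for branch in 4.1 4.2 develop; do
  g checkout -q "$branch"
  g merge -q --no-edit "$prev"
  prev=$branch
done

if git merge-base --is-ancestor "$fix" develop; then
  echo "hotfix $fix is part of develop's history"
fi
```

Here the merges are conflict-free because every branch touches different files; in a real codebase each merge step is where you would resolve conflicts for that release.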
