Git submodule vs Git clone


I am working on an open source project on GitHub.

It has a subdirectory, /Vendor, that contains copies of several external libraries. The original maintainer of the project updated this directory with a newer copy of an external library once in a while.

One developer sent me a pull request that replaces this copy with a git submodule.

I am trying to decide whether it's a good idea or not.

Git submodule Pros:

  • Submodules were specifically designed for scenarios like this one
  • It removes the possibility of an accidental commit to Vendor that would be overwritten by the next update

Git submodule Cons:

  • It looks like git submodules push complexity from the maintainer to whoever clones or pulls the project (extra steps are required after cloning before you can start working with the project: "git submodule init", "git submodule update")
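To make that cost concrete, here is a minimal, self-contained sketch of the extra steps. It builds throwaway local repositories in a temporary directory (`lib`, `project` and the `Vendor/lib` path are hypothetical stand-ins), records the library as a submodule, and then clones the project twice: once plainly, which leaves `Vendor/lib` empty until `git submodule init` and `git submodule update` run, and once with `git clone --recurse-submodules`, which does all three in one command. The `-c protocol.file.allow=always` flags are only there because the demo clones from local paths, which recent Git releases refuse for submodules by default.

```shell
#!/bin/sh
# Sketch: what a submodule under Vendor/ asks of someone who clones.
# All repositories are hypothetical and live in a temporary directory.
set -eu

work=$(mktemp -d)

# Throwaway identity so the demo commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. Stand-in for an external library.
git init -q "$work/lib"
echo 'int lib_version(void);' > "$work/lib/lib.h"
git -C "$work/lib" add lib.h
git -C "$work/lib" commit -q -m "lib v1"

# 2. Main project with lib recorded as a submodule at Vendor/lib
#    (roughly the kind of change the pull request proposes).
git init -q "$work/project"
echo readme > "$work/project/README"
git -C "$work/project" add README
git -C "$work/project" commit -q -m "initial"
# protocol.file.allow=always: needed only because the submodule URL is a local path.
git -C "$work/project" -c protocol.file.allow=always submodule add "$work/lib" Vendor/lib
git -C "$work/project" commit -q -m "Add lib as a submodule in Vendor/lib"

# 3. A plain clone checks out an empty Vendor/lib directory.
git clone -q "$work/project" "$work/plain"
echo "plain clone, entries in Vendor/lib: $(ls -A "$work/plain/Vendor/lib" | wc -l)"

# The two extra steps from the question populate it.
git -C "$work/plain" submodule init
git -C "$work/plain" -c protocol.file.allow=always submodule update
echo "after init + update: $(ls "$work/plain/Vendor/lib")"

# 4. Alternatively, a single command does clone + init + update.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/project" "$work/recursive"
echo "recursive clone: $(ls "$work/recursive/Vendor/lib")"
```

With a real remote, step 4 is the usual advice to give contributors; `--recursive` is an older spelling of the same clone option.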

What's your opinion on this?

One more thing: this project is a reasonably small library with very limited external dependencies. I think any build tool would be overkill for it for now.

Best Answer

An alternative to a submodule is to use git subtree. This gives the benefits of a git submodule without pushing the complexity to the end user. The third-party repository is merged into the main project tree, but metadata is stored in a way that lets you:

  • extract the third-party repository later, if any interesting changes have been made to it
  • merge in new updates from the third-party repository (note: merge, not overwrite)
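The two operations in the list above map onto `git subtree` subcommands. Below is a minimal sketch under stated assumptions: the `git-subtree` command must be installed, and the repositories (`lib`, `project`), the `Vendor/lib` prefix and the `lib-export` branch name are hypothetical, created in a temporary directory. It vendors the library with `git subtree add`, merges a newer upstream version with `git subtree pull`, extracts the vendored history with `git subtree split`, and finally shows that a plain `git clone` already contains the files.

```shell
#!/bin/sh
# Sketch: vendoring a library with git subtree instead of a submodule.
# Repositories are hypothetical, created in a temporary directory.
# Requires the git-subtree command (it lives in Git's contrib/ directory).
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the third-party library.
git init -q "$work/lib"
echo v1 > "$work/lib/VERSION"
git -C "$work/lib" add VERSION
git -C "$work/lib" commit -q -m "lib v1"
branch=$(git -C "$work/lib" symbolic-ref --short HEAD)  # master or main

# The main project needs at least one commit before `git subtree add`.
git init -q "$work/project"
echo readme > "$work/project/README"
git -C "$work/project" add README
git -C "$work/project" commit -q -m "initial"

# Merge lib into Vendor/lib; --squash stores one commit instead of lib's history.
git -C "$work/project" subtree add --prefix=Vendor/lib "$work/lib" "$branch" --squash

# Later, upstream releases v2: merge it in (a merge, not an overwrite).
echo v2 > "$work/lib/VERSION"
git -C "$work/lib" commit -q -a -m "lib v2"
git -C "$work/project" subtree pull --prefix=Vendor/lib "$work/lib" "$branch" \
    --squash -m "Update Vendor/lib to lib v2"

# Extract the Vendor/lib history into its own branch, e.g. to offer changes upstream.
git -C "$work/project" subtree split --prefix=Vendor/lib -b lib-export

# Consumers need nothing special: a plain clone already contains the files.
git clone -q "$work/project" "$work/clone"
echo "Vendor/lib/VERSION in a plain clone: $(cat "$work/clone/Vendor/lib/VERSION")"
```

Using `--squash` is a judgment call: it keeps the project history small at the cost of not carrying the library's individual commits, and `add` and `pull` should agree on it. To send extracted changes back, `git subtree push --prefix=Vendor/lib <repository> <branch>` performs the split and the push in one step.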

For Git users who are not sophisticated enough to understand submodules, the subtree approach makes getting a clone of your project no more difficult than any other clone. A short blurb from the documentation:

Subtrees allow subprojects to be included within a subdirectory of the main project, optionally including the subproject's entire history.

For example, you could include the source code for a library as a subdirectory of your application.

Subtrees are not to be confused with submodules, which are meant for the same task. Unlike submodules, subtrees do not need any special constructions (like .gitmodules files or gitlinks) be present in your repository, and do not force end-users of your repository to do anything special or to understand how subtrees work. A subtree is just a subdirectory that can be committed to, branched, and merged along with your project in any way you want.

I had set up a project at work using submodules, and the troubles with keeping submodules up to date in everybody's clones was too much work. I recently changed to using subtrees everywhere and those problems disappeared.

Note that git-subtree lives in the contrib/ directory of the Git source tree, so depending on how Git was packaged for your system, it may need to be installed separately.
