Git – Organizing Git repositories with common nested sub-modules

cmake git submodules

I'm a big fan of Git sub-modules. I like being able to track a dependency along with its version, so that I can roll back to a previous version of a project and have the corresponding version of the dependency to build safely and cleanly. Moreover, it's easier to release our libraries as open source projects because the history of each library is separate from that of the applications that depend on it (and which are not going to be open sourced).

I'm setting up a workflow for multiple projects at work, and I was wondering how it would work out if we took this approach to a bit of an extreme instead of having a single monolithic project. I quickly realized there is a potential can of worms in really using sub-modules.

Suppose a pair of applications, studio and player, and dependent libraries core, graph and network, with the following dependencies:

  • core is standalone
  • graph depends on core (sub-module at ./libs/core)
  • network depends on core (sub-module at ./libs/core)
  • studio depends on graph and network (sub-modules at ./libs/graph and ./libs/network)
  • player depends on graph and network (sub-modules at ./libs/graph and ./libs/network)

Suppose that we're using CMake and that each of these projects has unit tests and all the works. Each project (including studio and player) must compile standalone so that code metrics, unit tests, etc. can be run on it in isolation.

The thing is, after a recursive sub-module fetch (e.g. git submodule update --init --recursive), you get the following directory structure:

studio/
studio/libs/                    (sub-module depth: 1)
studio/libs/graph/
studio/libs/graph/libs/         (sub-module depth: 2)
studio/libs/graph/libs/core/
studio/libs/network/
studio/libs/network/libs/       (sub-module depth: 2)
studio/libs/network/libs/core/

Notice that core is cloned twice in the studio project. Aside from this wasting disk space, I have a build system problem because I'm building core twice and I potentially get two different versions of core.

Question

How do I organize sub-modules so that I get the versioned dependency and standalone build without getting multiple copies of common nested sub-modules?

Possible solution

If the library dependency is somewhat of a suggestion (i.e. in a "known to work with version X" or "only version X is officially supported" fashion) and potential dependent applications or libraries are responsible for building with whatever version they like, then I could imagine the following scenario:

  • Have the build system for graph and network tell them where to find core (e.g. via a compiler include path). Define two build targets, "standalone" and "dependency", where "standalone" is based on "dependency" and adds the include path to point to the local core sub-module.
  • Introduce an extra dependency: studio on core. Then, studio builds core, sets the include path to its own copy of the core sub-module, then builds graph and network in "dependency" mode.
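A minimal sketch of how graph's build could support both modes, under some assumptions: core's own CMakeLists.txt defines a library target named core that exports its include directories, and the file names (src/graph.cpp, include/) are hypothetical. Instead of two named build targets, this variant checks whether a core target already exists, which gives the same "standalone" versus "dependency" behaviour:

```cmake
# graph/CMakeLists.txt (sketch; src/graph.cpp and include/ are hypothetical)
cmake_minimum_required(VERSION 3.13)
project(graph CXX)

# "Standalone" mode: no parent project has defined a `core` target yet,
# so build the copy pinned by graph's own sub-module at ./libs/core.
# "Dependency" mode: a parent (e.g. studio) already added `core`,
# so reuse that target and never look inside ./libs/core.
if(NOT TARGET core)
  add_subdirectory(libs/core)
endif()

add_library(graph src/graph.cpp)
target_include_directories(graph PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/include")

# Assumes `core` exports its include path via PUBLIC include directories,
# so no manual compiler include path is needed here.
target_link_libraries(graph PUBLIC core)
```

network would carry the same guard. Without it, adding both graph and network from studio would try to define core twice, and CMake rejects a second target with the same name.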

The resulting folder structure looks like:

studio/
studio/libs/                    (sub-module depth: 1)
studio/libs/core/
studio/libs/graph/
studio/libs/graph/libs/         (empty folder, sub-modules not fetched)
studio/libs/network/
studio/libs/network/libs/       (empty folder, sub-modules not fetched)

However, this requires some build system magic (I'm pretty confident this can be done with CMake) and a bit of manual work for version updates (updating graph might also require updating core and network so that all projects end up with a compatible version of core).
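On the studio side, the build system part might look roughly like the following. This is only a sketch, assuming that the sub-modules were initialized without --recursive (so the nested libs folders stay empty), that graph and network add their own libs/core only when no core target exists yet (for example behind an if(NOT TARGET core) check), and that src/main.cpp is a hypothetical source file:

```cmake
# studio/CMakeLists.txt (sketch; src/main.cpp is hypothetical)
cmake_minimum_required(VERSION 3.13)
project(studio CXX)

# Add studio's own copy of core first, so that the `core` target exists
# before graph and network are configured. Their nested libs/core folders
# are empty and are assumed to be skipped when `core` is already defined.
add_subdirectory(libs/core)
add_subdirectory(libs/graph)
add_subdirectory(libs/network)

add_executable(studio src/main.cpp)
target_link_libraries(studio PRIVATE graph network)
```

The order of the add_subdirectory calls is what selects the version of core: whichever project defines the target first wins, which is why studio has to pin a version of core that both graph and network can work with.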

Any thoughts on this?

Best Answer

I'm very late to this party, but your question still doesn't seem to have a complete answer, and it's a pretty prominent hit on Google.

I have the exact same problem with C++/CMake/Git/Submodules and I have a similar problem with MATLAB/Git/Submodules, which gets some extra weirdness because MATLAB isn't compiled. I came across this video recently, which seems to propose a "solution". I don't like the solution, because it essentially means throwing away submodules, but it does eliminate the problem. It is just as @errordeveloper recommends. Each project has no submodules. To build a project, create a super-project to build it, and include it as a sibling to its dependencies.

So your project for developing graph might look like:

buildgraph/graph
buildgraph/core

and then your project for studio could be:

buildstudio/studio
buildstudio/graph
buildstudio/network
buildstudio/core

The super-projects are just a main CMakeLists.txt and a bunch of submodules. But none of the projects have any submodules themselves.
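As a rough sketch, the main CMakeLists.txt of the buildstudio super-project could be as small as this. It assumes each project's own CMakeLists.txt defines a target named after the project and links to its dependencies by target name only, without any add_subdirectory call for them:

```cmake
# buildstudio/CMakeLists.txt (sketch)
cmake_minimum_required(VERSION 3.13)
project(buildstudio CXX)

# Enable testing at the top level so that tests registered by the
# sub-projects with add_test() are picked up by ctest in the build root.
enable_testing()

# Each directory is a sub-module of the super-project. Dependencies come
# first so their targets exist before the projects that link to them.
add_subdirectory(core)
add_subdirectory(graph)
add_subdirectory(network)
add_subdirectory(studio)
```

The buildgraph super-project would follow the same pattern with only core and graph. The versions of all dependencies are then pinned in one place, the super-project's sub-module references.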

The only cost I see to this approach is the proliferation of trivial "super-projects" that are dedicated solely to building your real projects. And if someone gets hold of one of your projects, there is no easy way to tell what its dependencies are without finding the super-project as well. That might make it look really ugly on GitHub, for example.
