Git – Steps to Convert Multi-Repo to Mono-Repo


What are the best steps to convert multi-repos to a mono-repo?

This is what I have so far:

  1. For each repo, check out the most recent branch (integration branch, usually)
  2. For each repo, copy the repo folder to the new git repo (the mono repo)
  3. For each folder, delete the old .git folder
  4. Stage all the files, commit, and push the new mono-repo

My only question at the moment – will the existing .gitignore files work properly for the subfolders in the new mono-repo?

Best Answer

Don't copy the files; merge the repositories instead. Git doesn't draw a sharp line between “different repository” and “different branch”: a repository is essentially a collection of commits, with branches and tags pointing into it, so the histories of several repositories can be combined in one. I'll assume that you want to merge the master branch of all repos.

General approach (but see discussion of git-subtree below):

  1. Think about the layout of your monorepo. I'll assume that, to start with, each current repository becomes a sub-folder of the monorepo, which avoids conflicts between their files.

  2. For each current repository, move the repository contents into a subfolder and commit the change. You can use the git mv command to do this easily.

    E.g. if your component is called libfoo and you currently have this repository layout:

    Makefile
    README.txt
    src/
      ...
    include/
      ...
    

    Then we might move it into a libfoo/ folder:

    libfoo/
      Makefile
      README.txt
      src/
        ...
      include/
        ...
    
  3. Create a new repository for your monorepo, make an initial commit (an empty one is enough), and add each existing repo as a “remote”. Despite its name, a remote can be a path to a directory on the same file system. Then run git fetch --all to load every remote's history into the monorepo's git database. Afterwards, you can list all branches with git branch --all. This will look like:

    * master
      remotes/libfoo/master
      remotes/libbar/master
      ...
    
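  4. For each remote, merge its master branch, e.g. git merge --allow-unrelated-histories libfoo/master. The flag is needed because since Git 2.9, git merge refuses by default to merge histories that share no common commit. There should be no conflicts because everything is in a separate directory.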

  5. Now you're done, and you have a monorepo without loss of history. You can remove the remotes (git remote remove libfoo, and so on).
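The steps above can be sketched as a self-contained POSIX shell script. It is only an illustration under stated assumptions: libfoo and libbar are hypothetical stand-in repos created in a throwaway temp directory, the demo identity is set through environment variables so commits work without a global git config, and git init -b requires Git 2.28 or later.

```shell
set -eu

# Demo-only identity (assumption): lets commits succeed without global config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-ins for the existing repos, with files at their top level.
for name in libfoo libbar; do
    git init -q -b master "$work/$name"
    echo "all:" > "$work/$name/Makefile"
    echo "$name" > "$work/$name/README.txt"
    git -C "$work/$name" add .
    git -C "$work/$name" commit -q -m "Initial $name"
done

# Step 2: in each repo, git mv every tracked top-level entry into <name>/ and commit.
for name in libfoo libbar; do
    mkdir "$work/$name/$name"
    git -C "$work/$name" ls-tree --name-only HEAD | while read -r entry; do
        git -C "$work/$name" mv "$entry" "$name/"
    done
    git -C "$work/$name" commit -q -m "Move $name into $name/"
done

# Step 3: new monorepo with an initial commit; local paths work as remotes.
mono="$work/mono"
git init -q -b master "$mono"
cd "$mono"
git commit -q --allow-empty -m "Initial monorepo commit"
for name in libfoo libbar; do
    git remote add "$name" "$work/$name"
done
git fetch -q --all
git branch --all

# Step 4: the histories share no commits, so the merge needs this flag.
for name in libfoo libbar; do
    git merge -q --allow-unrelated-histories -m "Merge $name" "$name/master"
done

# Step 5: the remotes are no longer needed; the merged history stays.
for name in libfoo libbar; do
    git remote remove "$name"
done
git ls-files
```

Running git log in the monorepo afterwards should still show the original "Initial libfoo" and "Move libfoo into libfoo/" commits, which is the point of merging instead of copying.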

But be careful: only the branch on which you moved the files into a sub-folder will merge cleanly. Any other branches of the original repositories still have their files at the top level, so merging them as well would cause conflicts or put files in the wrong place. If you need those branches, consider rebasing them onto the commit that moves the repository contents into the sub-folder, before merging everything into the monorepo.
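A minimal sketch of that rebase, assuming a hypothetical libfoo repo with a single feature branch that edits README.txt. It relies on Git's rename detection to carry the edit from README.txt to libfoo/README.txt, so treat it as an illustration and review the rebased branch before merging it.

```shell
set -eu

# Demo-only identity (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

repo="$work/libfoo"
git init -q -b master "$repo"
cd "$repo"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial libfoo"

# A second branch that still sees README.txt at the top level.
git checkout -q -b feature
echo "feature note" >> README.txt
git commit -q -am "Document feature"

# On master, move the contents into libfoo/ (step 2 of the general approach).
git checkout -q master
mkdir libfoo
git mv README.txt libfoo/
git commit -q -m "Move libfoo into libfoo/"

# Replay feature on top of the move; rename detection applies the edit to libfoo/README.txt.
git rebase -q master feature
cat libfoo/README.txt
```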

In practice, you can use git subtree to automate most of these steps. The subtree command allows you to merge a specific branch into a specific directory.

  1. Initialize the monorepo and make at least one commit
  2. For each existing repo, add it as a subtree, e.g.:

    git subtree add -P libfoo/ ../path/to/libfoorepo master
    

    The -P/--prefix option names the directory under which the repository contents are added. In place of the path to a repo, any repository URL can be used. By default this imports the complete history; alternatively, pass --squash to collapse it into a single commit.
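Here is a sketch of the subtree route for two hypothetical repos, libfoo and libbar, created in a temp directory. It assumes git subtree is installed (it ships in Git's contrib area, and some distributions package it separately) and Git 2.28+ for git init -b.

```shell
set -eu

# Demo-only identity (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical existing repos.
for name in libfoo libbar; do
    git init -q -b master "$work/$name"
    echo "$name" > "$work/$name/README.txt"
    git -C "$work/$name" add README.txt
    git -C "$work/$name" commit -q -m "Initial $name"
done

mono="$work/mono"
git init -q -b master "$mono"
cd "$mono"
# Step 1: git subtree add needs an existing commit to build on.
git commit -q --allow-empty -m "Initial monorepo commit"

# Step 2: -P is the target directory; the source can be a local path or a URL.
for name in libfoo libbar; do
    git subtree add -P "$name" "$work/$name" master
done
git ls-files
```

Because --squash is not used here, the original commits remain reachable from master, and each add commit records a git-subtree-dir line that later git subtree pull calls rely on.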

Git-subtree is an extremely powerful tool for manipulating monorepos. You can also extract a directory into a separate repository (git subtree split, or git subtree push to split and push in one step) or merge updates from the original repo (git subtree pull). It can also carry other branches of the original repositories over into the monorepo.
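The extraction direction can be sketched with git subtree split, which rewrites the history of one directory onto a new branch whose root is that directory's contents. The repo layout, the branch name libfoo-only and the bare repo standing in for a real remote are all hypothetical.

```shell
set -eu

# Demo-only identity (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A small monorepo with two components and two commits touching libfoo/.
mono="$work/mono"
git init -q -b master "$mono"
cd "$mono"
mkdir libfoo libbar
echo "foo" > libfoo/README.txt
echo "bar" > libbar/README.txt
git add .
git commit -q -m "Add libfoo and libbar"
echo "more foo" >> libfoo/README.txt
git commit -q -am "Update libfoo"

# Rewrite libfoo/'s history onto a new branch with libfoo's files at the root.
git subtree split -P libfoo -b libfoo-only

# Publish it as its own repository; a local bare repo stands in for a real URL.
git init -q --bare "$work/libfoo-extracted.git"
git push -q "$work/libfoo-extracted.git" libfoo-only:master
```

git subtree push -P libfoo <repository> <branch> combines the split and push steps when you don't need the intermediate branch.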

To port a feature branch of libfoo into the monorepo:

  • Create a new branch in the monorepo: git checkout -b feature
  • Pull the changes from libfoo's feature branch into the libfoo/ directory of the monorepo: git subtree pull -P libfoo/ ../path/to/libfoorepo feature
  • Optional: rebase the branch onto master to simplify the history graph.
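A sketch of porting a feature branch, assuming a hypothetical libfoo repo whose feature branch adds feature.txt, and assuming git subtree is installed. The subtree is first added from master, as in the earlier subtree steps, and then the feature branch is pulled into libfoo/ on a new monorepo branch.

```shell
set -eu

# Demo-only identity (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical original repo with a master and a feature branch.
lib="$work/libfoo"
git init -q -b master "$lib"
echo "libfoo" > "$lib/README.txt"
git -C "$lib" add README.txt
git -C "$lib" commit -q -m "Initial libfoo"
git -C "$lib" checkout -q -b feature
echo "new feature" > "$lib/feature.txt"
git -C "$lib" add feature.txt
git -C "$lib" commit -q -m "Add feature"
git -C "$lib" checkout -q master

# Monorepo with libfoo's master already added as a subtree.
mono="$work/mono"
git init -q -b master "$mono"
cd "$mono"
git commit -q --allow-empty -m "Initial monorepo commit"
git subtree add -P libfoo "$lib" master

# Port the feature: new branch, then merge libfoo's feature into libfoo/.
git checkout -q -b feature
git subtree pull -P libfoo "$lib" feature
git ls-files
```

If the subtree had been added with --squash, the pull would need --squash as well.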

But consider whether a monorepo is really appropriate for your use case. It may still be desirable to keep the repos available independently. The main alternative is git submodules, where one repository is mounted as a sub-directory of another. However, the experience is not seamless: the parent repository records only which commit of the submodule to use, not the submodule's branch history, and if you edit code inside a submodule you have to commit that work in the submodule separately and then update the pointer in the parent. Git submodules are most useful for “vendoring” external dependencies that are pinned to a specific version, not for combined development.
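For comparison, a minimal submodule sketch with hypothetical repos libfoo and app. It shows that the parent stores only a gitlink (mode 160000) pinned to one libfoo commit. The -c protocol.file.allow=always override is an assumption needed only because the demo uses a local path: since Git 2.38.1, submodule clones over the file transport are blocked by default.

```shell
set -eu

# Demo-only identity (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

lib="$work/libfoo"
git init -q -b master "$lib"
echo "libfoo" > "$lib/README.txt"
git -C "$lib" add README.txt
git -C "$lib" commit -q -m "Initial libfoo"

app="$work/app"
git init -q -b master "$app"
cd "$app"
# Local-path submodules need the file transport explicitly allowed (demo only).
git -c protocol.file.allow=always submodule add "$lib" libfoo
git commit -q -m "Add libfoo as a submodule"

# The parent tree holds a single gitlink entry, not libfoo's files or history.
git ls-tree HEAD libfoo
```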

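Whatever approach you use, .gitignore files will continue to work, because the patterns in a .gitignore file are matched relative to the directory that contains it: a pattern without a slash, such as *.o, matches at any depth below that directory, and a pattern with a leading slash, such as /config.local, is anchored to that directory. A repository can contain multiple .gitignore files, so each former repo's .gitignore simply moves into its sub-folder and keeps applying only there.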
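One way to check this is git check-ignore, which exits 0 for an ignored path and 1 otherwise. The sketch below assumes a hypothetical libfoo/.gitignore carried over from the old repo; it should report libfoo/main.o, libfoo/src/util.o and libfoo/config.local as ignored, and libbar/main.o and libfoo/src/config.local as not ignored.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q -b master "$work/mono"
cd "$work/mono"
mkdir -p libfoo libbar
# Hypothetical .gitignore that used to sit at the root of the libfoo repo.
printf '%s\n' '*.o' '/config.local' > libfoo/.gitignore

for path in libfoo/main.o libfoo/src/util.o libbar/main.o \
            libfoo/config.local libfoo/src/config.local; do
    if git check-ignore -q "$path"; then
        echo "ignored:     $path"
    else
        echo "not ignored: $path"
    fi
done
```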
