Linux – Update Git super repository automatically when a submodule gets updated

git gitolite hook linux

In our company we have a huge code base (>100000 files), so we keep it in several git repositories. The result is a forest of repositories with one super repository on top that contains only submodule references.

The idea is for the super repository to act purely as convenience glue and to be updated automatically whenever a developer updates any submodule.

I have experimented with the post-receive hook and ended up with the following implementation
(it uses git plumbing so that it can modify the bare repository directly):

#!/bin/bash -e

UPDATED_BRANCHES="^(master|develop)$"
UPDATED_REPOS="^submodules/.+$"

# determine what branch gets modified
read REV_OLD REV_NEW FULL_REF
BRANCH=${FULL_REF##refs/heads/}

if [[ "${BRANCH}" =~ ${UPDATED_BRANCHES} ]] && [[ "${GL_REPO}" =~ ${UPDATED_REPOS} ]];
then
    # determine the name of the branch in the super repository
    SUPERBRANCH=$FULL_REF
    SUBMODULE_NAME=${GL_REPO##submodules/}
    # clean the submodule repo related environment
    unset $(git rev-parse --local-env-vars)
    # move to the super repository
    cd "$SUPERREPO_DIR"

    echo "Automatically updating the '$SUBMODULE_NAME' reference in the super repository..."
    # modify the index - replace the submodule's gitlink (mode 160000) hash;
    # the pattern is anchored so that only the exact submodule path matches
    git ls-tree "$SUPERBRANCH" | \
        sed "s/^\([0-7]*\) commit \([0-9a-f]*\)\t$SUBMODULE_NAME\$/\1 commit $REV_NEW\t$SUBMODULE_NAME/" | \
        git update-index --index-info

    # write the tree containing the modified index
    TREE_NEW=$(git write-tree)
    COMMIT_OLD=$(git show-ref --hash $SUPERBRANCH)

    # write the tree to a new commit and use the current commit as its parent
    COMMIT_NEW=$(echo "Auto-update submodule: $SUBMODULE_NAME" | git commit-tree $TREE_NEW -p $COMMIT_OLD)

    # update the branch reference
    git update-ref $SUPERBRANCH $COMMIT_NEW
    # shall we also update the HEAD?
    # git symbolic-ref HEAD $SUPERBRANCH
fi

Now the questions are:

  • Is it a good idea at all to use a git hook to modify a repository other than the one that triggered the event?
  • Is the hook implementation OK?
    (It seems to work on my machine, but I have no prior experience with git plumbing, so I may have omitted something.)
  • I guess there is a possibility of race conditions in case of two (or more) submodules being updated simultaneously. Is it possible to prevent that somehow (e.g. a lock file)?
    (we are using gitolite as the access layer).
  • Would it be better to use a clone of the super repository for the modification and then push (as opposed to modifying the bare super repository directly)?

Thanks in advance.

Best Answer

There are benefits to the implementation you've done, although you have omitted some possible edge cases, such as checking for unstaged changes in other branches (you might want to add or stash them first). The alternative is to use a continuous integration system like Jenkins to handle the updates:

https://wiki.jenkins-ci.org/display/JENKINS/Meet+Jenkins

This has several benefits over the git hooks system. It can be centrally controlled (the more complexity we added, the more trouble we had getting the git hooks to work on the different operating systems our engineers used). There is also more functionality available (lots of user-contributed modules). Our repo scripts now contact Jenkins for repo status and can update accordingly.
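
On the race condition raised in the question: if you stay with the hook, one way to serialize concurrent runs is to hold an exclusive lock around the whole read-modify-write of the super repository. The following is only a minimal sketch, not something tested against a gitolite setup. It assumes flock(1) from util-linux is installed on the server; with_lock, update_super_repo and the lock file name are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# with_lock is a hypothetical helper name; the lock file path is up to you.
# Assumes flock(1) from util-linux is available on the server.
with_lock() {
    local lock_file=$1
    shift
    (
        # Wait up to 30 seconds for any other hook instance to release the lock.
        flock -w 30 9 || { echo "could not lock $lock_file" >&2; exit 1; }
        # Run the wrapped command; the lock is released when fd 9 closes.
        "$@"
    ) 9>"$lock_file"
}

# Example use inside the hook (update_super_repo is a hypothetical function
# wrapping the ls-tree / update-index / commit-tree / update-ref steps):
#   with_lock "$SUPERREPO_DIR/auto-update.lock" update_super_repo
```

Independently of the lock, passing the old commit as a third argument to git update-ref (git update-ref <ref> <new> <old>) makes the ref update fail if the branch has moved in the meantime, which gives a second line of defence.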