Mercurial Repository – Structure for Corporate Comms and Configuration Management

Tags: configuration-management, continuous-integration, mercurial, organization, svn

I am yet another Subversion user struggling to re-educate myself in the Tao of distributed version control.

When using Subversion, I was a big fan of the project-minor approach, and, with most of my former employers, we would structure our repository's branches, tags and trunk as follows:

branches-+
         +-personal-+
         |          +-alice-+
         |          |       +-shinyNewFeature
         |          |       +-AUTOMATED-+
         |          |                   +-shinyNewFeature
         |          +-bob-+
         |                +-AUTOMATED-+
         |                            +-bespokeCustomerProject
         +-project-+
                   +-shinyNewFeature
                   +-fixStinkyBug
tags-+
     +-m20110401_releaseCandidate_0_1
     +-m20110505_release_0_1
     +-m20110602_milestone
trunk

Within the actual source tree itself, we would use (something like) the following structure:

  (src)-+
        +-developmentAutomation-+
        |                       +-testAutomation
        |                       +-deploymentAutomation
        |                       +-docGeneration
        |                       +-staticAnalysis
        |                       +-systemTest
        |                       +-performanceMeasurement
        |                       +-configurationManagement
        |                       +-utilities
        +-libraries-+
        |           +-log-+
        |           |     +-build
        |           |     +-doc
        |           |     +-test
        |           +-statistics-+
        |           |            +-build
        |           |            +-doc
        |           |            +-test
        |           +-charting-+
        |           |          +-build
        |           |          +-doc
        |           |          +-test
        |           +-distributedComputing-+
        |           |                      +-build
        |           |                      +-doc
        |           |                      +-test
        |           +-widgets-+
        |                     +-build
        |                     +-doc
        |                     +-test
        +-productLines-+
        |              +-flagshipProduct-+
        |              |                 +-coolFeature
        |              |                 +-anotherCoolFeature
        |              |                 +-build
        |              |                 +-doc
        |              |                 +-test
        |              +-coolNewProduct
        +-project-+
                  +-bigImportantCustomer-+
                  |                      +-bespokeProjectOne
                  |                      +-bespokeProjectTwo
                  +-anotherImportantCustomer-+
                                             +-anotherBespokeProject

The idea was (and still is) to use the structure of the repository to help structure communication between the engineering team, the customer-facing part of the business, and various other stakeholders and domain experts.

To wit: Source documents that sit in one of the "project" directories get used (and earn money) only once. Documents that sit in one of the "productLines" directories earn money as many times as a product from that particular line gets sold. Documents that sit in one of the "libraries" directories earn money as many times as any of the products that use them get sold.

It makes the notion of amortization of costs explicit, and helps build support for source document reuse across the business.

It also means that there is a common structure over which our build automation tools can operate. Our build scripts walk the source tree looking for "build" folders, within which they find configuration files specifying how each component is to be built; a similar process happens for documentation generation and testing.
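The tree walk described above might look roughly like the following. This is a minimal sketch, not the original build scripts: the function name `find_build_dirs` and the choice to stop descending once a "build" folder is found are assumptions made for illustration.

```python
import os

def find_build_dirs(src_root, marker="build"):
    """Yield every directory named `marker` below src_root (hypothetical helper)."""
    for dirpath, dirnames, _filenames in os.walk(src_root):
        dirnames.sort()  # visit siblings in a deterministic order
        if marker in dirnames:
            yield os.path.join(dirpath, marker)
            # Assumption: a build folder holds configuration only,
            # so there is no need to descend into it.
            dirnames.remove(marker)
```

The same walk could be reused for documentation and testing by passing `marker="doc"` or `marker="test"`.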

Significantly, the products on which I work typically take a LONG time to run performance measurement and characterization tests: from 20 to 200 hours, generating anywhere from several GB to several TB of processed test results and intermediate data (which must be stored and tied to a particular system configuration so that performance improvement over time can be measured). This makes configuration management an important consideration, and also imposes some requirement for centralisation, because the computational resources needed to run these tests are limited (typically a small cluster of 64-128 cores).

As one final note: the continuous integration system knows that it needs to trigger a build, static analysis, smoke test and unit test run each time trunk is modified, each time any "tag" branch is modified, and each time any "AUTOMATED" branch is modified. This way, individual developers can use the CI system with their personal branches, an important capability, IMHO.
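Expressed as code, the trigger rule could be a simple predicate over the changed path. This is only a sketch under the layout shown earlier (trunk, tags/<name>, and an AUTOMATED folder somewhere under branches); the function name `triggers_ci` is hypothetical and not part of any CI product.

```python
def triggers_ci(path):
    """Return True if a change under this repository path should start a CI run."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return False
    if parts[0] == "trunk":
        return True
    if parts[0] == "tags":
        return len(parts) > 1  # something inside a specific tag changed
    if parts[0] == "branches" and "AUTOMATED" in parts:
        # Require a branch name below the AUTOMATED folder.
        return parts.index("AUTOMATED") < len(parts) - 1
    return False
```

With this rule, a change under branches/personal/alice/AUTOMATED/shinyNewFeature starts a run, while the same change under branches/personal/alice/shinyNewFeature does not.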

Now, here is my question: How can I replicate all of the above (and improve upon it, if possible) with Mercurial?

Edit:

My current line of thinking is to use a central Subversion Repository, to define the overall structure, but to allow the use of hg as a client so developers can have repos available locally.

Best Answer

Spoike's answer is excellent, but there are a few things I think it would be worth adding which are too large for comments.

Branch organisation

With Mercurial you can happily ignore the whole of your first organisational chart. As Spoike says, each repository has its own set of tags and branches (named and anonymous) and can be organised according to business need.

If bespokeProjectTwo needs a special version of the charting library, then you would branch charting, add the new facilities and use it in bespokeProjectTwo. The new facilities (and their bugs) would not be used by other projects which would reference the standard charting library. If the main charting library had bugs fixed, you could merge those changes into the branch. If other projects also needed these facilities, you could either get those projects to use the special branch, or merge the branch up into the main-line and close the branch.

Also, there is nothing stopping you from having a policy that structures branch names to provide specific facilities, like your AUTOMATED branches.

Directory organisation

There is no reason why you can't keep your source directory exactly as it is with Mercurial. The only difference is that whereas with Subversion you have a single monolithic (src) repository, with Mercurial you are better off splitting it into repositories which are logically grouped. From your source tree structure, I would probably extract each of the parenthesised directories below as an individual repository:

src-+
    +-(developmentAutomation)
    +-libraries-+
    |           +-(log)
    |           +-(statistics)
    |           +-(charting)
    |           +-(distributedComputing)
    |           +-(widgets)
    +-productLines-+
    |              +-(flagshipProduct)
    |              +-(coolNewProduct)
    +-project-+
              +-bigImportantCustomer-+
              |                      +-(bespokeProjectOne)
              |                      +-(bespokeProjectTwo)
              +-anotherImportantCustomer-+
                                         +-(anotherBespokeProject)

This allows any product or bespoke project to use any combination of libraries, at any revision. Have a look at Mercurial subrepositories for an easy way to manage which libraries are used for any given version of a product or project.
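A subrepository is declared in a `.hgsub` file at the root of the parent repository, one `path = source` line per nested repository; Mercurial itself records the pinned revision of each one in `.hgsubstate` when the parent is committed. The sketch below only renders such a file for a product. The helper `render_hgsub` and the `hg.example.com` URLs are hypothetical.

```python
def render_hgsub(subrepos):
    """Render .hgsub text from a {working-directory path: source URL} mapping."""
    return "".join(
        f"{path} = {source}\n" for path, source in sorted(subrepos.items())
    )

# Hypothetical library selection for flagshipProduct; the URLs are placeholders.
flagship_libs = {
    "libraries/log": "https://hg.example.com/libraries/log",
    "libraries/charting": "https://hg.example.com/libraries/charting",
}
print(render_hgsub(flagship_libs), end="")
```

Each product or bespoke project would carry its own `.hgsub`, so two projects can pin different revisions of the same library.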

Workflow

An alternative to Spoike's suggested workflow (developer pulls from blessed repo, works locally, issues a pull request and finally the integrator pulls those changes & merges them) would be to use the continuous integration system as an intermediary.

As before, the developer pulls from the blessed repo and works locally, but when done, they pull from the blessed repo again and perform the merge themselves before pushing to an unblessed repo. Any changes to the unblessed repo are then reviewed (either manually or automatically) and moved to the blessed repo only if they are approved.

This means that the integrator only has to accept or reject a change, not do the merge. (In my experience it is almost always better for the developer who wrote the code to perform the merge than for someone else to do it.)

As suggested in the Mercurial book, hooks can be used to automate this procedure:

When someone pushes a changeset to the server that everyone pulls from, the server will test the changeset before it accepts it as permanent, and reject it if it fails to pass the test suite. If people only pull changes from this filtering server, it will serve to ensure that all changes that people pull have been automatically vetted.
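One way to wire this up is an external `pretxnchangegroup` hook on the filtering server: Mercurial runs the hook before the pushed changesets become permanent, and a non-zero exit status rolls them back. The script below is a sketch only. The name `vet_incoming`, the script path in the comment and the test command are assumptions, and it leaves out the step of checking out the incoming revision somewhere the tests can run against it.

```python
import subprocess
import sys

# Assumed hgrc entry on the filtering server (the path is a placeholder):
#   [hooks]
#   pretxnchangegroup.vet = python /path/to/vet_incoming.py

# Hypothetical test command; substitute whatever runs the project's suite.
TEST_COMMAND = [sys.executable, "-m", "unittest", "discover"]

def vet_incoming(command=TEST_COMMAND):
    """Run the test suite and return its exit status (0 accepts the push)."""
    # Mercurial exposes the first incoming changeset to external hooks as
    # the HG_NODE environment variable; this sketch does not use it.
    return subprocess.call(command)

# As a hook script, the entry point would be: sys.exit(vet_incoming())
```

Because the hook only reports pass or fail, the integrator's accept-or-reject decision can be fully automated or kept as a manual step after the automated check.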

Other issues

The problem of large test datasets can also be solved by putting that test data into a Mercurial subrepository. This will prevent the code repository from getting bloated with test data, while still keeping the test data under revision control.

Related Solutions

Best practice with branching source code and application lifecycle

I think your approach to branching and merging is OK, but if the main problem is that the code base is quite unstable, that's what you need to focus on and minimise.

The primary thing to ensure is that the code base has good separation of concerns. Dependencies between various components need to be isolated and reduced. This should solve the majority of your problems. Following practices such as the single responsibility principle will also help.

If a major architectural change needs to occur, it should take place in its own branch, and then be merged back into main once fully tested and 'stable' (within reason). This may be painful and challenging but it also should be rare. If you have good testing practices in place then risk is minimised.

It may also help to change to a distributed version control system. This should give you a trunk that is stable, with different features merged in from different branches when they are ready. You will still have pain merging if the code is too interdependent, but you will have more control.

Looking at this from another perspective, also consider increased communication amongst your team. Run regular agile-style standup meetings. Consider where team members sit and how that can help. If a complex merge needs to take place, it may not be such a bad thing - use a pair programming approach that will give understanding to both parties.

Version Control – Creating a Version Control Strategy for SVN

If you want a unified build process, then be sure to put branches/tags/trunk at the root, like this:

branches/
tags/
trunk/
  dev/
    ...

If you don't need a unified build process, then you can put branches/tags/trunk within each project if you want. However, it might be difficult to migrate to a unified build after having put them within each project. A unified build has advantages, such as eliminating the need to publish shared components among projects, because they're all part of the build.

Personally, I like a unified build process. Furthermore, I don't think you should have a "dev" project. You should have projects directly under trunk, and then branch trunk into a dev branch. Use tags for releases. For example, I would do it like this:

branches/
  dev/
    Site1/
    Site2/
    WebService/
    SharedCode/
tags/
  release1/
    Site1/
    Site2/
    WebService/
    SharedCode/
trunk/
  Site1/
  Site2/
  WebService/
  SharedCode/