Mercurial Repository – Structure for Corporate Comms and Configuration Management


I am yet another Subversion user struggling to re-educate myself in the Tao of distributed version control.

When using Subversion, I was a big fan of the project-minor approach, and at most of my former employers we structured our repository's branches, tags and trunk as follows:

branches-+
         +-personal-+
         |          +-alice-+
         |          |       +-shinyNewFeature
         |          |       +-AUTOMATED-+
         |          |                   +-shinyNewFeature
         |          +-bob-+
         |                +-AUTOMATED-+
         |                            +-bespokeCustomerProject
         +-project-+
                   +-shinyNewFeature
                   +-fixStinkyBug
tags-+
     +-m20110401_releaseCandidate_0_1
     +-m20110505_release_0_1
     +-m20110602_milestone
trunk

Within the actual source tree itself, we would use (something like) the following structure:

  (src)-+
        +-developmentAutomation-+
        |                       +-testAutomation
        |                       +-deploymentAutomation
        |                       +-docGeneration
        |                       +-staticAnalysis
        |                       +-systemTest
        |                       +-performanceMeasurement
        |                       +-configurationManagement
        |                       +-utilities
        +-libraries-+
        |           +-log-+
        |           |     +-build
        |           |     +-doc
        |           |     +-test
        |           +-statistics-+
        |           |            +-build
        |           |            +-doc
        |           |            +-test
        |           +-charting-+
        |           |          +-build
        |           |          +-doc
        |           |          +-test
        |           +-distributedComputing-+
        |           |                      +-build
        |           |                      +-doc
        |           |                      +-test
        |           +-widgets-+
        |                     +-build
        |                     +-doc
        |                     +-test
        +-productLines-+
        |              +-flagshipProduct-+
        |              |                 +-coolFeature
        |              |                 +-anotherCoolFeature
        |              |                 +-build
        |              |                 +-doc
        |              |                 +-test
        |              +-coolNewProduct
        +-project-+
                  +-bigImportantCustomer-+
                  |                      +-bespokeProjectOne
                  |                      +-bespokeProjectTwo
                  +-anotherImportantCustomer-+
                                             +-anotherBespokeProject

The idea was (and still is) to use the structure of the repository to help structure communication between the engineering team, the customer-facing part of the business, and various other stakeholders and domain experts.

To wit: Source documents that sit in one of the "project" directories get used (and earn money) only once. Documents that sit in one of the "productLines" directories earn money as many times as a product from that particular line gets sold. Documents that sit in one of the "libraries" directories earn money as many times as any of the products that use them get sold.

It makes the notion of amortization of costs explicit, and helps build support for source document reuse across the business.

It also means that there is a common structure over which our build automation tools can operate. (Our build scripts walk the source tree looking for "build" folders within which they find configuration files specifying how each component is to be built; a similar process happens for documentation generation and testing).
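A minimal sketch of that tree walk, assuming only that components are marked by a folder literally named "build" (the function name `find_build_dirs` is made up for illustration, and reading the configuration files inside each folder is left out):

```python
import os


def find_build_dirs(src_root, marker="build"):
    """Yield the path of every folder named `marker` beneath src_root."""
    for dirpath, dirnames, _filenames in os.walk(src_root):
        if marker in dirnames:
            yield os.path.join(dirpath, marker)
            # Pruning the list in place stops os.walk from descending
            # into the build folder itself.
            dirnames.remove(marker)
```

The same walk with `marker="doc"` or `marker="test"` would cover the documentation and testing passes.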

Significantly, the products on which I work typically take a LONG time to run performance measurement and characterization tests: from 20 to 200 hours, generating anywhere from several GB to several TB of processed test results and intermediate data (which must be stored and tied to a particular system configuration so that performance improvement over time can be measured). This makes configuration management an important consideration, and also imposes some requirement for centralisation, as the computational resources needed to run these tests are typically limited (a small cluster of 64-128 cores).

As one final note: the continuous integration system knows that it needs to trigger a build, static analysis, smoke test and unit test run each time trunk is modified, each time any "tag" branch is modified, and each time any "AUTOMATED" branch is modified. This way, individual developers can use the CI system with their personal branches, an important capability, IMHO.
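The trigger rule can be written down as a small predicate over the modified repository path. This is only a sketch of the rule as stated, assuming the trunk/tags/branches layout shown earlier; the function name `should_trigger_ci` is hypothetical and not part of any CI product:

```python
def should_trigger_ci(path):
    """Return True if a change to this repository path should start a CI run."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return False
    # Any change under trunk or tags triggers a run.
    if parts[0] in ("trunk", "tags"):
        return True
    # Under branches, only paths that pass through an AUTOMATED folder do.
    return parts[0] == "branches" and "AUTOMATED" in parts[1:]
```

With this rule, `branches/personal/alice/AUTOMATED/shinyNewFeature` triggers a run while `branches/personal/alice/shinyNewFeature` does not.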

Now, here is my question: How can I replicate all of the above (and improve upon it, if possible) with Mercurial?

–edit:

My current line of thinking is to use a central Subversion repository to define the overall structure, but to allow the use of hg as a client so that developers can have repos available locally.

Best Answer

Spoike's answer is excellent, but there are a few things I think it would be worth adding which are too large for comments.

Branch organisation

With Mercurial you can happily ignore the whole of your first organisational chart. As Spoike says, each repository has its own set of tags, branches (named and anonymous) and can be organised according to business need.

If bespokeProjectTwo needs a special version of the charting library, then you would branch charting, add the new facilities and use it in bespokeProjectTwo. The new facilities (and their bugs) would not be used by other projects which would reference the standard charting library. If the main charting library had bugs fixed, you could merge those changes into the branch. If other projects also needed these facilities, you could either get those projects to use the special branch, or merge the branch up into the main-line and close the branch.

Also, there is nothing stopping you from having a policy to structure branch names to provide specific facilities like your AUTOMATED branches.
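As one possible shape for such a policy, the sketch below assumes a made-up naming convention of `AUTOMATED/<owner>/<topic>` for branches the CI system should watch, mirroring the AUTOMATED folders in the question. The convention, the separator and the names `BranchInfo` and `parse_branch_name` are all assumptions for illustration, not anything Mercurial itself defines:

```python
from collections import namedtuple

BranchInfo = namedtuple("BranchInfo", "automated owner topic")


def parse_branch_name(name, sep="/"):
    """Split a branch name that follows the assumed convention into its parts."""
    parts = name.split(sep)
    automated = parts[0] == "AUTOMATED"
    if automated:
        parts = parts[1:]
    if len(parts) == 2:
        owner, topic = parts
    elif len(parts) == 1:
        # Shared branches carry no owner, only a topic.
        owner, topic = None, parts[0]
    else:
        raise ValueError("branch name does not follow the convention: %r" % name)
    return BranchInfo(automated, owner, topic)
```

A CI job could then build only those branches for which `parse_branch_name(name).automated` is true.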

Directory organisation

There is no reason why you can't keep your source directory exactly as it is with Mercurial. The only difference is that whereas with Subversion you have a single monolithic (src) repository, with Mercurial you are better off splitting into repositories which are logically grouped. From your source tree structure, I would probably extract out each of the following as individual repositories:

src-+
    +-(developmentAutomation)
    +-libraries-+
    |           +-(log)
    |           +-(statistics)
    |           +-(charting)
    |           +-(distributedComputing)
    |           +-(widgets)
    +-productLines-+
    |              +-(flagshipProduct)
    |              +-(coolNewProduct)
    +-project-+
              +-bigImportantCustomer-+
              |                      +-(bespokeProjectOne)
              |                      +-(bespokeProjectTwo)
              +-anotherImportantCustomer-+
                                         +-(anotherBespokeProject)

This allows any product or bespoke project to use any combination of libraries, at any revision. Have a look at Mercurial sub-repositories for an easy way to manage which libraries are used for any given version of a product or project.
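Sub-repositories are declared in a `.hgsub` file at the root of the parent repository, one `local/path = source` line per library; when the parent is committed, Mercurial records the exact revision of each sub-repository in `.hgsubstate`. As a rough sketch, a product's `.hgsub` could be generated from a mapping like this (the helper `render_hgsub` and the `hg.example.com` URLs are placeholders, not real locations):

```python
def render_hgsub(subrepos):
    """Render .hgsub text from a {local_path: source} mapping."""
    return "".join(
        "%s = %s\n" % (path, source)
        for path, source in sorted(subrepos.items())
    )


# Hypothetical library selection for flagshipProduct.
flagship_libraries = {
    "libraries/log": "https://hg.example.com/libraries/log",
    "libraries/charting": "https://hg.example.com/libraries/charting",
}
hgsub_text = render_hgsub(flagship_libraries)
```

Writing `hgsub_text` to `.hgsub` in the flagshipProduct repository would list the two libraries it uses; which revision of each is used is then pinned by the parent repository's own history.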

Workflow

An alternative to Spoike's suggested workflow (developer pulls from blessed repo, works locally, issues a pull request and finally the integrator pulls those changes & merges them) would be to use the continuous integration system as an intermediary.

As before, the developer pulls from blessed repo and works locally, but when done, they pull from the blessed repo again and merge themselves before pushing to an unblessed repo. Any changes to the unblessed repo are then reviewed (either manually or automatically) and moved to the blessed repo only if they are approved.

This means that the integrator only has to accept or reject a change, not do the merge. In my experience it is almost always better for the developer who wrote the code to perform the merge than for someone else to do it.

As suggested in the Mercurial book, hooks can be used to automate this procedure:

When someone pushes a changeset to the server that everyone pulls from, the server will test the changeset before it accepts it as permanent, and reject it if it fails to pass the test suite. If people only pull changes from this filtering server, it will serve to ensure that all changes that people pull have been automatically vetted.
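One way to wire this up is a `pretxnchangegroup` hook on the filtering server: Mercurial runs it after the incoming changesets have been added but before the transaction is made permanent, and a non-zero exit status from an external hook rolls the push back. The sketch below shows only the accept/reject decision. The function name `vet_incoming`, the script path in the comment and the test command are assumptions, and how the test suite gets at the incoming changesets is left out:

```python
import subprocess

# Hypothetical hgrc entry on the filtering server (the path is a placeholder):
#   [hooks]
#   pretxnchangegroup.vet = python /path/to/vet_incoming.py
# Mercurial passes the first incoming changeset ID to the hook in the
# HG_NODE environment variable.


def vet_incoming(test_command, run=subprocess.call):
    """Return an exit status: 0 accepts the push, 1 rejects it."""
    try:
        status = run(test_command)
    except OSError:
        # The test command could not be started; reject rather than
        # let unvetted changes through.
        return 1
    return 0 if status == 0 else 1
```

A hook script would end with something like `sys.exit(vet_incoming([...]))`, with the site's own test command filled in.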

Other issues

The problem of large test datasets can also be solved by putting that test data into a Mercurial sub-repository. This will prevent the code repository from getting bloated with test data, while still keeping the test data under revision control.
