Spoike's answer is excellent, but there are a few things I think it would be worth adding which are too large for comments.
Branch organisation
With Mercurial you can happily ignore the whole of your first organisational chart. As Spoike says, each repository has its own set of tags and branches (named and anonymous) and can be organised according to business need.
If `bespokeProjectTwo` needs a special version of the charting library, then you would branch `charting`, add the new facilities and use it in `bespokeProjectTwo`. The new facilities (and their bugs) would not be used by other projects, which would reference the standard `charting` library. If the main `charting` library had bugs fixed, you could merge those changes into the branch. If other projects also needed these facilities, you could either get those projects to use the special branch, or merge the branch up into the main line and close the branch.
Also, there is nothing stopping you having a policy to structure branch names to provide specific facilities like your AUTOMATION branches.
Directory organisation
There is no reason why you can't keep your source directory exactly as it is with Mercurial. The only difference is that whereas with Subversion you have a single monolithic (src) repository, with Mercurial you are better off splitting it into logically grouped repositories. From your source tree structure, I would probably extract each of the following as an individual repository:
```
src-+
    +-(developmentAutomation)
    +-libraries-+
    |           +-(log)
    |           +-(statistics)
    |           +-(charting)
    |           +-(distributedComputing)
    |           +-(widgets)
    +-productLines-+
    |              +-(flagshipProduct)
    |              +-(coolNewProduct)
    +-project-+
              +-bigImportantCustomer-+
              |                      +-(bespokeProjectOne)
              |                      +-(bespokeProjectTwo)
              +-anotherImportantCustomer-+
                                         +-(anotherBespokeProject)
```
This allows any product or bespoke project to use any combination of libraries, at any revision. Have a look at Mercurial subrepositories for an easy way to manage which libraries are used for any given version of a product or project.
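For example, `flagshipProduct` could pin its library dependencies in a `.hgsub` file at its root (the paths and URLs here are illustrative):

```
libraries/charting = https://hg.example.com/libraries/charting
libraries/widgets  = https://hg.example.com/libraries/widgets
```

On every commit of the parent repository, Mercurial records the exact changeset of each subrepository in `.hgsubstate`, so checking out an old revision of `flagshipProduct` brings back the matching library revisions.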
Workflow
An alternative to Spoike's suggested workflow (developer pulls from the blessed repo, works locally, issues a pull request, and finally the integrator pulls those changes and merges them) would be to use the continuous integration system as an intermediary.
As before, the developer pulls from blessed repo and works locally, but when done, they pull from the blessed repo again and merge themselves before pushing to an unblessed repo. Any changes to the unblessed repo are then reviewed (either manually or automatically) and moved to the blessed repo only if they are approved.
This means that the integrator only has to accept or reject a change, not do the merge. In my experience it is almost always better for the developer who wrote the code to perform the merge than for someone else to do it.
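From the developer's side, that workflow is just a pull/merge/push cycle; a sketch with hypothetical repository URLs:

```shell
# Catch up with the blessed repository and resolve any conflicts locally:
hg pull https://hg.example.com/blessed
hg merge                         # the developer performs the merge themselves
hg commit -m "Merge with blessed tip"

# Publish to the staging (unblessed) repository for review/CI:
hg push https://hg.example.com/unblessed
```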
As suggested in the Mercurial book, hooks can be used to automate this procedure:

> When someone pushes a changeset to the server that everyone pulls from, the server will test the changeset before it accepts it as permanent, and reject it if it fails to pass the test suite. If people only pull changes from this filtering server, it will serve to ensure that all changes that people pull have been automatically vetted.
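A minimal sketch of such a gate, using Mercurial's `pretxnchangegroup` hook in the filtering server's `.hg/hgrc` (the test-suite command is a placeholder):

```
[hooks]
# Runs before the pushed changesets become permanent; a non-zero
# exit status makes the server roll back and reject the push.
pretxnchangegroup.testsuite = /path/to/run-test-suite
```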
Other issues
The problem of large test datasets can also be solved by putting that test data into a Mercurial subrepository. This will prevent the code repository from getting bloated with test data, while still keeping the test data under revision control.
I can't be the only one to think of the Xzibit nested items meme, right? Anyway...
One of the remaining cool things that Subversion does is called "externals." It's a way to point at a specific branch or directory in another svn repository. You can even pin it down to a specific version of a specific directory. Externals are really darn nifty, and would solve this problem in an instant, as changes made in an externals directory are automatically pushed back to the source when doing a commit.
Externals is also something missing in git. Git has submodules, but they don't work in the same way, in that they're tied to a specific commit. This effectively means that there's no native solution to the problem of having "nested" repositories that can be read and written to at the same time and remain perfectly in sync, much less nested repositories using different backends.
If you don't want to do the submodule revision pinning dance, there's another workaround.
Git has decent svn emulation in the `git-svn` tool. You're probably already using it. The SO question "How do I keep an svn:external up to date using git-svn?" offers us a useful option by abusing that tool.
The accepted answer was simply to use `git-svn` to check out the Subversion repository outside of the tree controlled by Git, then use a symlink to point to it from inside the tree. There's a bit more manual work involved in this one, as you need to remember to commit that specific repository every time you make a change in it. However, it's simple, it's straightforward, and it is known to work.
Another option entirely would be looking at Mercurial's subrepositories, which can host both Git and Subversion repositories. I'm not sure if you really want to go three levels deep.
Best Answer
Short answer: Generally, you don't need a history of binary artifacts and changes to those artifacts, you just need specific versions.
Longer answer: Every time you commit a small change to a binary file, the version control system typically has no way to create a delta -- a diff between the two versions -- so it stores a whole new copy.
In a CVCS, like SVN, that's not such a big pain, because you only have one central copy of your repository -- your local copy is only one version. (Although, even then, your repository can become very large, making checkins slower.) But what happens if you later switch to a DVCS, where every copy of a repository has the full history of every file? The size of changes becomes very relevant there.
And what does it give you in return for the pain? The only thing it offers is being able to go back to a previous version of your repository and know that you have the correct binaries for that version.
But do you need the whole binary in your repository to do that? Or can you get away with simply having a text file, telling the build process which versions to pull from another repository elsewhere?
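As a sketch of that idea (the file name, format, and URL layout are all invented for illustration), the build can resolve binaries from a small pinned-versions file instead of carrying them in the repository:

```shell
# A hypothetical "binaries.manifest" pinning artifact versions; the build
# script reads it and fetches each artifact from an artifact repository.
cat > binaries.manifest <<'EOF'
charting=2.3.1
statistics=1.0.4
EOF

# Resolve the pinned version of one artifact:
version=$(grep '^charting=' binaries.manifest | cut -d= -f2)
echo "would fetch https://artifacts.example.com/charting/${version}/charting-${version}.jar"
```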
The latter is generally what artifact repositories offer.
In addition, some of the more professional ones, such as Nexus, will also give you information about licensing for third-party artifacts, so that you don't risk falling afoul of some subtle clause in what you believe to be a FOSS library.