Spoike's answer is excellent, but there are a few things I think it would be worth adding which are too large for comments.
Branch organisation
With Mercurial you can happily ignore the whole of your first organisational chart. As Spoike says, each repository has its own set of tags and branches (named and anonymous) and can be organised according to business need.
If `bespokeProjectTwo` needs a special version of the charting library, then you would branch `charting`, add the new facilities and use it in `bespokeProjectTwo`. The new facilities (and their bugs) would not be used by other projects, which would reference the standard `charting` library. If the main `charting` library had bugs fixed, you could merge those changes into the branch. If other projects also needed these facilities, you could either get those projects to use the special branch, or merge the branch up into the main line and close the branch.
Also, there is nothing stopping you having a policy to structure branch names to provide specific facilities like your AUTOMATION branches.
Directory organisation
There is no reason why you can't keep your source directory exactly as it is with Mercurial. The only difference is that whereas with Subversion you have a single monolithic (src) repository, with Mercurial you are better off splitting it into logically grouped repositories. From your source tree structure, I would probably extract each of the following as an individual repository:
```
src-+
    +-(developmentAutomation)
    +-libraries-+
    |           +-(log)
    |           +-(statistics)
    |           +-(charting)
    |           +-(distributedComputing)
    |           +-(widgets)
    +-productLines-+
    |              +-(flagshipProduct)
    |              +-(coolNewProduct)
    +-project-+
              +-bigImportantCustomer-+
              |                      +-(bespokeProjectOne)
              |                      +-(bespokeProjectTwo)
              +-anotherImportantCustomer-+
                                         +-(anotherBespokeProject)
```
This allows any product or bespoke project to use any combination of libraries, at any revision. Have a look at Mercurial subrepositories for an easy way to manage which libraries are used for any given version of a product or project.
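For example, `flagshipProduct` could pin its library dependencies in a `.hgsub` file at its root (the paths and URLs here are illustrative):

```
libraries/charting = https://hg.example.com/libraries/charting
libraries/widgets  = https://hg.example.com/libraries/widgets
```

On every commit of the parent repository, Mercurial records the exact changeset of each subrepository in `.hgsubstate`, so checking out an old revision of `flagshipProduct` brings back the matching library revisions.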
Workflow
An alternative to Spoike's suggested workflow (developer pulls from the blessed repo, works locally, issues a pull request, and finally the integrator pulls those changes and merges them) would be to use the continuous integration system as an intermediary.
As before, the developer pulls from blessed repo and works locally, but when done, they pull from the blessed repo again and merge themselves before pushing to an unblessed repo. Any changes to the unblessed repo are then reviewed (either manually or automatically) and moved to the blessed repo only if they are approved.
This means that the integrator only has to accept or reject a change, not do the merge. In my experience it is almost always better for the developer who wrote the code to perform the merge than for someone else to do it.
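From the developer's side, that workflow is just a pull/merge/push cycle; a sketch with hypothetical repository URLs:

```shell
# Catch up with the blessed repository and resolve any conflicts locally:
hg pull https://hg.example.com/blessed
hg merge                         # the developer performs the merge themselves
hg commit -m "Merge with blessed tip"

# Publish to the staging (unblessed) repository for review/CI:
hg push https://hg.example.com/unblessed
```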
As suggested in the Mercurial book, hooks can be used to automate this procedure:

> When someone pushes a changeset to the server that everyone pulls from, the server will test the changeset before it accepts it as permanent, and reject it if it fails to pass the test suite. If people only pull changes from this filtering server, it will serve to ensure that all changes that people pull have been automatically vetted.
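A minimal sketch of such a gate, using Mercurial's `pretxnchangegroup` hook in the filtering server's `.hg/hgrc` (the test-suite command is a placeholder):

```
[hooks]
# Runs before the pushed changesets become permanent; a non-zero
# exit status makes the server roll back and reject the push.
pretxnchangegroup.testsuite = /path/to/run-test-suite
```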
Other issues
The problem of large test datasets can also be solved by putting that test data into a Mercurial subrepository. This will prevent the code repository from getting bloated with test data, while still keeping the test data under revision control.
I can't be the only one to think of the Xzibit nested items meme, right? Anyway...
One of the remaining cool things that Subversion does is called "externals." It's a way to point at a specific branch or directory in another svn repository. You can even pin it down to a specific version of a specific directory. Externals are really darn nifty, and would solve this problem in an instant, as changes made in an externals directory are automatically pushed back to the source when doing a commit.
Externals is also something missing in git. Git has submodules, but they don't work in the same way, in that they're tied to a specific commit. This effectively means that there's no native solution to the problem of having "nested" repositories that can be read and written to at the same time and remain perfectly in sync, much less nested repositories using different backends.
If you don't want to do the submodule revision pinning dance, there's another workaround.
Git has decent svn emulation in the `git-svn` tool. You're probably already using it. The SO question "How do I keep an svn:external up to date using git-svn?" offers us a useful option by abusing that tool.
The accepted answer was simply to use `git-svn` to check out the Subversion repository outside of the tree controlled by Git, then use a symlink to point to it from inside the tree. There's a bit more manual work involved in this one, as you need to remember to commit that specific repository every time you make a change in it. However, it's simple, it's straightforward, and it is known to work.
Another option entirely would be looking at Mercurial's subrepositories, which can host both Git and Subversion repositories. I'm not sure if you really want to go three levels deep.
Best Answer
Short answer: Generally, you don't need a history of binary artifacts and changes to those artifacts, you just need specific versions.
Longer answer: Every time you commit a small change to a binary file, the version control system typically has no way to create a delta -- a diff between the two versions -- so it stores a whole new copy.
In a CVCS, like SVN, that's not such a big pain, because you only have one central copy of your repository -- your local copy is only one version. (Although, even then, your repository can become very large, making checkins slower.) But what happens if you later switch to a DVCS, where every copy of a repository has the full history of every file? The size of changes becomes very relevant there.
And what does it give you in return for the pain? The only thing it offers is being able to go back to a previous version of your repository and know that you have the correct binaries for that version.
But do you need the whole binary in your repository to do that? Or can you get away with simply having a text file, telling the build process which versions to pull from another repository elsewhere?
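As a sketch of that idea (the file name, format, and URL layout are all invented for illustration), the build can resolve binaries from a small pinned-versions file instead of carrying them in the repository:

```shell
# A hypothetical "binaries.manifest" pinning artifact versions; the build
# script reads it and fetches each artifact from an artifact repository.
cat > binaries.manifest <<'EOF'
charting=2.3.1
statistics=1.0.4
EOF

# Resolve the pinned version of one artifact:
version=$(grep '^charting=' binaries.manifest | cut -d= -f2)
echo "would fetch https://artifacts.example.com/charting/${version}/charting-${version}.jar"
```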
The latter is generally what artifact repositories offer.
In addition, some of the more professional ones, such as Nexus, will also give you information about licensing for third-party artifacts, so that you don't risk falling afoul of some subtle clause in what you believe to be a FOSS library.