Handling multiple changesets in source control systems

perforce, version control

I have a fairly infrequent problem with source control. In the example here the problem occurred with Perforce, but I suspect the same problem will occur with many SCMs, especially distributed SCMs.

Perforce supports changelists (or changesets if you prefer). Changelists support two common usages:

  1. When you commit a changelist, the commit is atomic so that all the files are committed or none are. This is the headline feature that most people talk about when referring to changelists.

  2. Perforce supports multiple changelists. Basically, when you check out a file you tell it which changelist it belongs to. So, if you are working on the fancy new email feature, which is going to take months of work and make millions of dollars, and somebody from tech support comes to you with a bug that must be fixed yesterday, you don't have to start with a new branch of the whole project. You can just check out the buggy file into a new changelist, fix the problem, check in the new changelist and get back to the real work of the new email feature, as though nothing had happened.

For the most part everything works great. However, when you are implementing the email feature you are making zillions of changes all over the place, especially in main.h, and it just so happens that when you go to work on the bug fix you discover that the tiny change you have to make is also in main.h. The changelist for the new feature already has main.h checked out, so you can't easily put it in the changelist for the bug fix.

Now what do you do? You have several choices:

  1. Create a new clientspec. A clientspec in Perforce is a list of files/directories in the depot and a local destination where everything is to be copied. So you can create a second copy of the project without any of the changes for the email feature.

  2. Do a fudge. Back up your modified copy of main.h and revert this file. You are then free to check out main.h into the bugfix changelist. You fix the bug, check in the bugfix changelist, then check out main.h into the email feature changelist. Finally you merge in all your changes from the backup you made at the start.

  3. You determine that all the changes you have made to main.h have no side effects or dependencies, so you just move main.h into the bugfix changelist, make the change and check it in. You then check it out again into the email feature changelist. Obviously there are two problems with this approach: firstly, there may in fact be side effects that you hadn't considered, and secondly, you have corrupted your version history, because the bugfix commit now also contains half-finished feature changes.
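To make the order of operations in Option 2 explicit, here is a minimal sketch in Python that only builds the command sequence and does not run anything. The changelist numbers, the `.feature-backup` suffix and the use of `cp` for the backup are assumptions for illustration; `p4 revert`, `p4 edit -c` and `p4 submit -c` are the standard Perforce commands for the steps described.

```python
from typing import List


def option_two_plan(path: str, bugfix_cl: int, feature_cl: int) -> List[List[str]]:
    """Return the ordered commands for the back-up-and-revert workaround.

    Nothing is executed here; the list is only a plan to read or print.
    The changelist numbers and the ".feature-backup" suffix are hypothetical.
    """
    backup = path + ".feature-backup"
    return [
        ["cp", path, backup],                         # keep the feature edits safe
        ["p4", "revert", path],                       # discard local edits, release the file
        ["p4", "edit", "-c", str(bugfix_cl), path],   # open it in the bugfix changelist
        # ... make the bug fix by hand here ...
        ["p4", "submit", "-c", str(bugfix_cl)],       # check in the bug fix on its own
        ["p4", "edit", "-c", str(feature_cl), path],  # open it again for the feature
        # ... merge the edits from the backup into `path` by hand ...
    ]


if __name__ == "__main__":
    for command in option_two_plan("main.h", bugfix_cl=1002, feature_cl=1001):
        print(" ".join(command))
```

The backup has to come first, because reverting the file throws away the local feature edits. The final merge is still a manual step, which is the weak point of this option.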

Option 1 is probably the cleanest, but not always practical. A project I was working on had millions of lines of code and a really complicated build process. It would take a day to set up a new environment, so it was not really practical for a 5-minute bug fix.

Option 3 is a bad option, but it is the quickest, so it can be very seductive.

That leaves Option 2, which is the one I would generally use.

Does anybody have a better solution?

My apologies for the lengthy question, but I have discovered on StackOverflow that fully thought-out questions elicit better answers.

Best Answer

This exact problem has been called the "Tangled Working Copy Problem". Ryan Tomayko has a blog entry titled "The Thing About Git" that talks about this problem in detail and how Git addresses it.

This is one of the best things about Git. I use git add -p at least daily, to help commit individual chunks of code that make sense independently of one another. The fact that two logically different changes are in the same source file has become irrelevant.
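To illustrate the idea behind committing individual chunks, here is a toy model in Python. It is not how Git is implemented; it only assumes that the differences between the committed file and the working file can be split into hunks, and that you pick which hunks go into the next commit. The function name and the sample main.h contents are made up for the example.

```python
import difflib
from typing import List, Set


def apply_selected_hunks(base: List[str], working: List[str], keep: Set[int]) -> List[str]:
    """Build a version of `base` that contains only the chosen hunks.

    Hunks are the non-equal regions between `base` and `working`,
    numbered from 0 in file order.
    """
    matcher = difflib.SequenceMatcher(a=base, b=working, autojunk=False)
    result: List[str] = []
    hunk = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(base[i1:i2])
            continue
        if hunk in keep:
            result.extend(working[j1:j2])  # take the edited lines
        else:
            result.extend(base[i1:i2])     # leave this region as committed
        hunk += 1
    return result


if __name__ == "__main__":
    committed = ["#define MAX_USERS 10", "int start(void);", "int stop(void);"]
    edited = [
        "#define MAX_USERS 100",           # hunk 0: the small bug fix
        "int start(void);",
        "int stop(void);",
        "int send_email(const char *to);",  # hunk 1: unfinished feature work
    ]
    for line in apply_selected_hunks(committed, edited, keep={0}):
        print(line)
```

Selecting only hunk 0 gives a main.h with the bug fix and none of the feature work, while the working copy keeps both changes. That is the separation Options 2 and 3 above try to achieve by hand.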