Git Commit Structure – How to Handle Commits When Refactoring for Unit Tests

Architecturedesigndesign-patternsgitversion control

I'm trying to get a review for my lists of pros/cons about how to structure commits that came out of a discussion at my work.

Here's the scenario:

  • I have to add feature X to a legacy code base
  • The current code base has something I can't mock making unit testing feature X impossible
  • I can refactor to make unit testing possible, but it results in a very large code change touching many other non-test classes that have little in common with feature X

My company has the following strictly enforced rules:

  • Each and every commit must be stand alone (compiles, passes test, etc.) We have automation that makes it impossible to merge until these have proven to pass.
  • Only fast-forward merges are allowed (no branches, no merge commits, our origin repository only has a a single master branch and it is a perfectly straight line)

So the question is how to structure the commits for these 3 things. (refactoring, feature X, and test for feature X) My colleague referred me to this other article but it doesn't seem to tackle the refactoring part. (I agree without the refactoring source and test should be in one commit)
The article talks about "breaking git bisect" and "making sure every commit compiles/passes" but our strict rules already cover that.
The main other argument they give is "logically related code kept together" which seems a bit to philosophical for me.

I see 3 ways to proceed. I'm hoping that you can either a) add to it b) comment on why one of the existing pro/cons is not important and should be removed from the list.

method 1 (one commit): includes feature X, test for feature X, and refactoring

pros:

  • "Logically related code kept together" (Not sure this is actually a "reason". I would probably argue all 3 methods do this, but some may argue otherwise. However, no one can argue against it here).
  • If you cherry-pick / revert without merge conflict, it will probably always compile & pass tests
  • There is never code not covered by test

cons:

  • Harder to code review. (Why is all this refactoring is done here despite not being related to feature X?)
  • You cannot cherry-pick without the refactoring. (You have to bring along the refactoring, increasing chance of merge conflict and time spent)

method 2 (two commits): one includes feature X, then two includes refactoring and test for feature X

pros:

  • Easier to code review both. (Refactoring done only for the sake of testing is kept with the test it is associated with)
  • You can cherry-pick just the feature. (e.g. for experiments or adding feature to old releases)
  • If you decide to revert the feature, you can keep the (hopefully) better structured code that came from the refactoring (However, revert will not be "pure". See cons below)

cons:

  • There will be a commit without test coverage (even though it's added immediately after, philosophically bad?)
  • Having a commit without test coverage makes automated coverage enforcement hard/impossible for every commit (e.g. you need y% coverage to merge)
  • If you cherry-pick only the test, it will fail.
  • Adds load to people wanting to do revert. (They needed to either know to revert both commits or remove the test as part of the feature revert making the revert not "pure")

method 3 (two commits): one includes refactoring, two includes feature X and test for feature X

pros:

  • Easier to code review the second commit. (Refactoring done only for the sake of testing is kept out of feature commit)
  • If you cherry-pick / revert either without merge conflict, it should compile & pass tests
  • There is never code not covered by test (both philosophically good and also easier for automated coverage enforcement)

cons:

  • Harder to code review the first commit. (If the only value of the refactoring is for test, and the test are in a future commit, you need to go back and forth between the two to understand why it was done and if it could have been done better.)
    • Arguably the worst of the 3 for "logically related code kept together" (but probably not that important???)

So based on all this, I'm leaning towards 3. Having the automated test coverage is a big win (and it what started me down this rabbit hole in the first place). But maybe one of you has pros/cons I missed? Or maybe there's a 4th options?

Best Answer

When working on existing code, it's common that you need to refactor the code before you can implement your feature.

This is the mantra from Kent Beck: "Make the change easy (warning: this may be hard), then make the easy change"

To do so, I usually recommend to do frequent little commits. Take baby steps. Refactor progressively:

multiple many commits when working on Legacy Code

Each refactoring doesn't change the way the code works, but how it's implemented. It's not "hard to review" since both implementation are equally valid. But the new implementation will make it easier for the change to be made.

Finally, write the test and make it pass. It should be relatively short and to the point. That also makes the commit easier to read.

Therefore I'd go for the 3rd option too. Maybe I'd even have multiple refactoring commits. Or I'd squash them into one before pushing that for review, so there's only one. Or maybe I'd do a first PR that's only refactoring, then a second that's only the feature. It really depends on how much refactoring is needed (keep your PRs short) and your team conventions!

If the only value of the refactoring is for test, and the test are in a future commit, you need to go back and forth between the two to understand why it was done and if it could have been done better

To solve this problem, you need to get your team comfortable in this approach: refactor first, then implement the feature.

I'd suggest you to discuss it with your colleagues and try that out. I'd also recommend you try to practice "over-committing" to get you in the habit of doing smaller commits. It's a useful skill to have when code is tricky, so it's a great exercise to do when code is not!

In any case, I think you've healthy discussion with your colleagues. No doubt you'll find what works for your team!

Related Topic