This paper demonstrates that TDD adds 15-35% development time in return for a 40-90% reduction in defect density on otherwise like-for-like projects.
The article refers full paper (pdf) - Nachiappan Nagappan, E. Michael Maximilien, Thirumalesh Bhat, and Laurie Williams. “Realizing quality improvement through test driven development: results and experiences of four industrial teams“. ESE 2008.
Abstract Test-driven development (TDD) is a software development practice that has been
used sporadically for decades.With this practice, a software engineer cycles minute-by-minute
between writing failing unit tests and writing implementation code to pass those tests. Testdriven
development has recently re-emerged as a critical enabling practice of agile software
development methodologies. However, little empirical evidence supports or refutes the utility
of this practice in an industrial context. Case studies were conducted with three development
teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies
indicate that the pre-release defect density of the four products decreased between 40% and
90% relative to similar projects that did not use the TDD practice. Subjectively, the teams
experienced a 15–35% increase in initial development time after adopting TDD.
Full paper also briefly summarizes the relevant empirical studies on TDD and their high level results (section 3 Related Works), including George and Williams 2003, Müller and Hagner (2002), Erdogmus et al. (2005) , Müller and Tichy (2001), Janzen and Seiedian (2006).
There are already lots of good answers to this question, and I've commented and upvoted several of them. Still, I'd like to add some thoughts.
Flexibility isn't for novices
The OP clearly states that he's not experienced with TDD, and I think a good answer must take that into account. In the terminology of the Dreyfus model of skill acquisition, he's probably a Novice. There's nothing wrong with being a novice - we are all novices when we start learning something new. However, what the Dreyfus model explains is that novices are characterized by
- rigid adherence to taught rules or plans
- no exercise of discretionary judgement
That's not a description of a personality deficiency, so there's no reason to be ashamed of that - it's a stage we all need to go through in order to learn something new.
This is also true for TDD.
While I agree with many of the other answers here that TDD doesn't have to be dogmatic, and that it can sometimes be more beneficial to work in an alternative way, that doesn't help anyone just starting out. How can you exercise discretionary judgement when you have no experience?
If a novice accepts the advice that sometimes it's OK not to do TDD, how can he or she determine when it's OK to skip doing TDD?
With no experience or guidance, the only thing a novice can do is to skip out of TDD every time it becomes too difficult. That's human nature, but not a good way to learn.
Listen to the tests
Skipping out of TDD any time it becomes hard is to miss out of one of the most important benefits of TDD. Tests provide early feedback about the API of the SUT. If the test is hard to write, it's an important sign that the SUT is hard to use.
This is the reason why one of the most important messages of GOOS is: listen to your tests!
In the case of this question, my first reaction when seeing the proposed API of the Yahtzee game, and the discussion about combinatorics that can be found on this page, was that this is important feedback about the API.
Does the API have to represent dice rolls as an ordered sequence of integers? To me, that smell of Primitive Obsession. That's why I was happy to see the answer from tallseth suggesting the introduction of a Roll
class. I think that's an excellent suggestion.
However, I think that some of the comments to that answer get it wrong. What TDD then suggests is that once you get the idea that a Roll
class would be a good idea, you suspend work on the original SUT and start working on TDD'ing the Roll
class.
While I agree that TDD is more aimed at the 'happy path' than it's aimed at comprehensive testing, it still helps to break the system down into manageable units. A Roll
class sounds like something you could TDD to completion much more easily.
Then, once the Roll
class is sufficiently evolved, would you go back to the original SUT and flesh it out in terms of Roll
inputs.
The suggestion of a Test Helper doesn't necessarily imply randomness - it's just a way to make the test more readable.
Another way to approach and model input in terms of Roll
instances would be to introduce a Test Data Builder.
Red/Green/Refactor is a three-stage process
While I agree with the general sentiment that (if you are sufficiently experienced in TDD), you don't need to stick to TDD rigorously, I think it's pretty poor advice in the case of a Yahtzee exercise. Although I don't know the details of the Yahtzee rules, I see no convincing argument here that you can't stick rigorously with the Red/Green/Refactor process and still arrive at a proper result.
What most people here seem to forget is the third stage of the Red/Green/Refactor process. First you write the test. Then you write the simplest implementation that passes all tests. Then you refactor.
It's here, in this third state, that you can bring all your professional skills to bear. This is where you are allowed to reflect on the code.
However, I think it's a cop-out to state that you should only "Write the simplest thing possible that isn't completely braindead and obviously incorrect that works". If you (think you) know enough about the implementation on beforehand, then everything short of the complete solution is going to be obviously incorrect. As far as advice goes, then, this is pretty useless to a novice.
What really should happen is that if you can make all tests pass with an obviously incorrect implementation, that's feedback that you should write another test.
It's surprising how often doing that leads you towards an entirely different implementation than the one you had in mind first. Sometimes, the alternative that grows like that may turn out to be better than your original plan.
Rigour is a learning tool
It makes a lot of sense to stick with rigorous processes like Red/Green/Refactor as long as one is learning. It forces the learner to gain experience with TDD not just when it's easy, but also when it's hard.
Only when you have mastered all the hard parts are you in a position to make an informed decision on when to deviate from the 'true' path. That's when you start forming your own path.
Best Answer
First, the best approach may be to avoid running into this situation at first hand, and refactor earlier. Often people forget that refactoring should be done rigorously in each TDD cycle, and design flaws should be removed as soon as they become apparent. Given there are so many tests as you said, I am pretty sure the design flaw could have been spotted much earlier, at a point in time when there existed a lot fewer tests.
But what can you do if you have already painted yourself into that corner? The problem here is usually the amount of test code which uses the public-facing API of the Subject-Under-Test, which is what you want to change. So as an approach to resolve that problem,
try to reduce the number of direct test calls to the public API by refactoring the tests themselves first and make them more DRY.
build an anti-corruption layer, or a facade, or a proxy between your tests and the SUT, so you can change the API of the SUT without having to change too many parts of your tests. That will allow you to keep the tests as they are for now. Later, when you have some time for cleaning up, you may decide to migrate the tests to the new API one-by-one.
The latter approach is also known as strangler pattern and can often be used to gradually swap out legacy components by components with a new design, not only for tests.
As an example, if you have 50 tests calling the same public method of a class (and maybe only one or two places in your production code calling the same method), then the tests seem to hinder you changing the signature of that method. But if most of the direct calls inside the test code are rerouted through a single helper method, maybe one which also does parts of the arrange-act-assert work, it will become a lot simpler to change the method signature of the SUT.