There are already lots of good answers to this question, and I've commented and upvoted several of them. Still, I'd like to add some thoughts.
Flexibility isn't for novices
The OP clearly states that he's not experienced with TDD, and I think a good answer must take that into account. In the terminology of the Dreyfus model of skill acquisition, he's probably a Novice. There's nothing wrong with being a novice - we are all novices when we start learning something new. However, what the Dreyfus model explains is that novices are characterized by
- rigid adherence to taught rules or plans
- no exercise of discretionary judgement
That's not a description of a personality deficiency, so there's no reason to be ashamed of that - it's a stage we all need to go through in order to learn something new.
This is also true for TDD.
While I agree with many of the other answers here that TDD doesn't have to be dogmatic, and that it can sometimes be more beneficial to work in an alternative way, that doesn't help anyone just starting out. How can you exercise discretionary judgement when you have no experience?
If a novice accepts the advice that sometimes it's OK not to do TDD, how can he or she determine when it's OK to skip doing TDD?
With no experience or guidance, the only thing a novice can do is to skip out of TDD every time it becomes too difficult. That's human nature, but not a good way to learn.
Listen to the tests
Skipping out of TDD any time it becomes hard is to miss out of one of the most important benefits of TDD. Tests provide early feedback about the API of the SUT. If the test is hard to write, it's an important sign that the SUT is hard to use.
This is the reason why one of the most important messages of GOOS is: listen to your tests!
In the case of this question, my first reaction when seeing the proposed API of the Yahtzee game, and the discussion about combinatorics that can be found on this page, was that this is important feedback about the API.
Does the API have to represent dice rolls as an ordered sequence of integers? To me, that smell of Primitive Obsession. That's why I was happy to see the answer from tallseth suggesting the introduction of a Roll
class. I think that's an excellent suggestion.
However, I think that some of the comments to that answer get it wrong. What TDD then suggests is that once you get the idea that a Roll
class would be a good idea, you suspend work on the original SUT and start working on TDD'ing the Roll
class.
While I agree that TDD is more aimed at the 'happy path' than it's aimed at comprehensive testing, it still helps to break the system down into manageable units. A Roll
class sounds like something you could TDD to completion much more easily.
Then, once the Roll
class is sufficiently evolved, would you go back to the original SUT and flesh it out in terms of Roll
inputs.
The suggestion of a Test Helper doesn't necessarily imply randomness - it's just a way to make the test more readable.
Another way to approach and model input in terms of Roll
instances would be to introduce a Test Data Builder.
Red/Green/Refactor is a three-stage process
While I agree with the general sentiment that (if you are sufficiently experienced in TDD), you don't need to stick to TDD rigorously, I think it's pretty poor advice in the case of a Yahtzee exercise. Although I don't know the details of the Yahtzee rules, I see no convincing argument here that you can't stick rigorously with the Red/Green/Refactor process and still arrive at a proper result.
What most people here seem to forget is the third stage of the Red/Green/Refactor process. First you write the test. Then you write the simplest implementation that passes all tests. Then you refactor.
It's here, in this third state, that you can bring all your professional skills to bear. This is where you are allowed to reflect on the code.
However, I think it's a cop-out to state that you should only "Write the simplest thing possible that isn't completely braindead and obviously incorrect that works". If you (think you) know enough about the implementation on beforehand, then everything short of the complete solution is going to be obviously incorrect. As far as advice goes, then, this is pretty useless to a novice.
What really should happen is that if you can make all tests pass with an obviously incorrect implementation, that's feedback that you should write another test.
It's surprising how often doing that leads you towards an entirely different implementation than the one you had in mind first. Sometimes, the alternative that grows like that may turn out to be better than your original plan.
Rigour is a learning tool
It makes a lot of sense to stick with rigorous processes like Red/Green/Refactor as long as one is learning. It forces the learner to gain experience with TDD not just when it's easy, but also when it's hard.
Only when you have mastered all the hard parts are you in a position to make an informed decision on when to deviate from the 'true' path. That's when you start forming your own path.
It's comparing oranges and apples.
Integration tests, acceptance tests, unit tests, behaviour tests - they are all tests and they will all help you improve your code but they are also quite different.
I'm going to go over each of the different tests in my opinion and hopefully explain why you need a blend of all of them:
Integration tests:
Simply, test that different component parts of your system integrate correctly - for example - maybe you simulate a web service request and check that the result comes back. I would generally use real (ish) static data and mocked dependencies to ensure that it can be consistently verified.
Acceptance tests:
An acceptance test should directly correlate to a business use case. It can be huge ("trades are submitted correctly") or tiny ("filter successfully filters a list") - it doesn't matter; what matters is that it should be explicitly tied to a specific user requirement. I like to focus on these for test-driven development because it means we have a good reference manual of tests to user stories for dev and qa to verify.
Unit tests:
For small discrete units of functionality that may or may not make up an individual user story by itself - for example, a user story which says that we retrieve all customers when we access a specific web page can be an acceptance test (simulate hitting the web page and checking the response) but may also contain several unit tests (verify that security permissions are checked, verify that the database connection queries correctly, verify that any code limiting the number of results is executed correctly) - these are all "unit tests" that aren't a complete acceptance test.
Behaviour tests:
Define what the flow should be of an application in the case of a specific input. For example, "when connection cannot be established, verify that the system retries the connection." Again, this is unlikely to be a full acceptance test but it still allows you to verify something useful.
These are all in my opinion through much experience of writing tests; I don't like to focus on the textbook approaches - rather, focus on what gives your tests value.
Best Answer
The old tests are just fine for verifying that
calculateTax
still works as it should. However, you don't need many test cases for this, only 3 (or maybe some more, if you want to test error handling too, using unexpected values ofname
).Each of the individual cases (at the moment implemented in
doSomething
et al.) must have their own set of tests too, which test the inner details and special cases related to each implementation. In the new setup these tests could / should be converted into direct tests on the respective Strategy class.I prefer to remove old unit tests only if the code they exercise, and the functionality it implements, completely ceases to exist. Otherwise, the knowledge encoded into these tests is still relevant, only the tests need to be refactored themselves.
Update
There may be some duplication between the tests of
calculateTax
(let's call them high level tests) and the tests for the individual calculation strategies (low level tests) - it depends on your implementation.I guess the original implementation of your tests asserts the result of the specific tax calculation, implicitly verifying that the specific calculation strategy was used to produce it. If you keep this schema, you will have duplication indeed. However, as @Kristof hinted, you may implement the high level tests using mocks too, to verify only that the right kind of (mock) strategy was selected and invoked by
calculateTax
. In this case there will be no duplication between high and low level tests.So if refactoring the affected tests isn't too costly, I would prefer the latter approach. However, in real life, when doing some massive refactoring, I do tolerate a small amount of test code duplication if it saves me enough time :-)
Again, it depends. Note that the tests of
calculateTax
effectively test the factory. So if the factory code is a trivialswitch
block like your code above, these tests may be all you need. But if the factory does some more tricky things, you may want to dedicate some tests specifically for it. It all boils down to how much tests you need to be confident that the code in question really works. If, upon reading the code - or analysing code coverage data - you see untested execution paths, dedicate some more tests to exercise these. Then repeat this until you are fully confident in your code.