Testing – How to Test Code That Depends on Complex APIs Like Amazon S3?

mocking, testing

I am struggling with testing a method that uploads documents to Amazon S3, but I think this question applies to any non-trivial API or external dependency. I've only come up with three potential solutions, and none seems satisfactory:

  1. Run the code, actually upload the document, check with AWS's API that it has been uploaded, and delete it at the end of the test. This will make the test very slow, will cost money every time the test is run, and won't always return the same result.

  2. Mock S3. This is super hairy because I have no idea about that object's internals and it feels wrong because it's way too complicated.

  3. Just make sure that MyObject.upload() is called with the right arguments and trust that I am using the S3 object correctly. This bothers me because there is no way to know for sure I used the S3 API correctly from the tests alone.

I checked how Amazon tests their own SDK, and they do mock everything. They have a 200-line helper that does the mocking. I don't feel it's practical for me to do the same.

How do I solve this?

Best Answer

There are two issues we have to look at here.

The first is that you seem to be looking at all of your tests from the unit test perspective. Unit tests are extremely valuable, but are not the only kinds of tests. Tests can actually be divided into several different layers, from very fast unit tests to less fast integration tests to even slower acceptance tests. (There can be even more layers broken out, like functional tests.)

The second is that you are mixing together calls to third-party code with your business logic, creating testing challenges and possibly making your code more brittle.

Unit tests should be fast and should be run often. Mocking dependencies helps to keep these tests running fast, but can potentially introduce holes in coverage if the dependency changes and the mock doesn't. Your code could be broken while your tests still run green. Some mocking libraries will alert you if the dependency's interface changes, others cannot.
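To illustrate the last point, here is a minimal sketch using Python's standard `unittest.mock`. The `Uploader` class is hypothetical and stands in for any dependency. A plain `Mock` accepts whatever call it is given, while a mock built with `create_autospec` copies the real method signatures and rejects a call that no longer matches them.

```python
from unittest.mock import Mock, create_autospec


class Uploader:
    """Hypothetical dependency, standing in for any class we might mock."""

    def upload(self, key: str, content: bytes) -> None:
        raise NotImplementedError("would talk to a real service")


# A plain Mock accepts any call, so a stale test keeps passing.
loose = Mock()
loose.upload("report.pdf")  # the missing argument goes unnoticed

# An autospecced mock copies the real signature and rejects the stale call.
strict = create_autospec(Uploader, instance=True)
try:
    strict.upload("report.pdf")  # the content argument is missing
    signature_checked = False
except TypeError:
    signature_checked = True
```

Note that this only checks signatures against whatever version of the dependency is installed when the tests run. It says nothing about changes in the dependency's behaviour.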

Integration tests, on the other hand, are designed to test the interactions between components, including third-party libraries. Mocks should not be used at this level of testing because we want to see how the actual objects interact. Because we are using real objects, these tests will be slower, and we will not run them nearly as often as our unit tests.

Acceptance tests look at an even higher level, testing that the requirements for the software are met. These tests run against the entire, complete system that would get deployed. Once again, no mocking should be used.

One guideline people have found valuable regarding mocks is to not mock types you don't own. Amazon owns the API to S3 so they can make sure it doesn't change beneath them. You, on the other hand, do not have these assurances. Therefore, if you mock out the S3 API in your tests, it could change and break your code, while your tests all show green. So how do we unit test code that uses third-party libraries?

Well, we don't. If we follow the guideline, we can't mock objects we don't own. But… if we own our direct dependencies, we can mock them out. But how? We create our own wrapper for the S3 API. We can make it look a lot like the S3 API, or we can make it fit our needs more closely (preferred). We can even make it a little more abstract, say a PersistenceService rather than an AmazonS3Bucket. PersistenceService would be an interface with methods like #save(Thing) and #fetch(ThingId), the types of methods we might like to see (these are examples; you might actually want different methods). We can now implement a PersistenceService around the S3 API (say an S3PersistenceService), encapsulating it away from our calling code.

Now to the code that calls the S3 API. We need to replace those calls with calls to a PersistenceService object. We use dependency injection to pass our PersistenceService into the object. It's important not to ask for an S3PersistenceService, but to ask for a PersistenceService. This allows us to swap out the implementation during our tests.
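As a rough sketch of what such a wrapper might look like in Python: the interface below is an abstract base class, and the implementation delegates to an S3 client that is passed in. The choice of string IDs and raw bytes is an assumption made for illustration, not something the design requires. The client is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), whose `put_object` and `get_object` calls are used here.

```python
from abc import ABC, abstractmethod
from typing import Any


class PersistenceService(ABC):
    """The storage interface our own code depends on (a type we own)."""

    @abstractmethod
    def save(self, thing_id: str, content: bytes) -> None:
        """Store content under the given ID."""

    @abstractmethod
    def fetch(self, thing_id: str) -> bytes:
        """Return the content previously stored under the given ID."""


class S3PersistenceService(PersistenceService):
    """Implements PersistenceService on top of S3.

    Assumption: s3_client is a boto3 S3 client, e.g. boto3.client("s3").
    All knowledge of the S3 API is confined to this one class.
    """

    def __init__(self, s3_client: Any, bucket: str) -> None:
        self._s3 = s3_client
        self._bucket = bucket

    def save(self, thing_id: str, content: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=thing_id, Body=content)

    def fetch(self, thing_id: str) -> bytes:
        response = self._s3.get_object(Bucket=self._bucket, Key=thing_id)
        # The response body is a stream; read it fully into memory.
        return response["Body"].read()
```

Keeping the class this thin is deliberate: it translates between our interface and S3 and contains no business rules.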

All the code that used to use the S3 API directly now uses our PersistenceService, and our S3PersistenceService now makes all the calls to the S3 API. In our tests, we can mock out PersistenceService, since we own it, and use the mock to make sure that our code makes the correct calls. But that leaves the question of how to test S3PersistenceService. It has the same problem as before: we can't unit test it without calling the external service. So… we don't unit test it. We could mock out the S3 API dependencies, but this would give us little-to-no additional confidence. Instead, we have to test it at a higher level: integration tests.

It may sound a little troubling to say that we shouldn't unit test a part of our code, but let's look at what we accomplished. We had a bunch of code all over the place that we couldn't unit test and that can now be unit tested through the PersistenceService. We have our third-party library mess confined to a single implementation class. That class should provide the necessary functionality to use the API, but does not have any external business logic attached to it. Therefore, once it is written, it should be very stable and should not change very much. We can rely on slower tests that we don't run that often because the code is stable.
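A minimal sketch of the unit-test side, again in Python. `DocumentArchiver` and its naming rule are hypothetical business logic invented for this example. The interface is repeated so the snippet runs on its own. The class under test receives a PersistenceService through its constructor, and the test hands it a mock of that interface instead of the S3-backed implementation.

```python
import unittest
from abc import ABC, abstractmethod
from unittest.mock import create_autospec


class PersistenceService(ABC):
    """The interface we own; repeated here so the sketch is self-contained."""

    @abstractmethod
    def save(self, thing_id: str, content: bytes) -> None: ...

    @abstractmethod
    def fetch(self, thing_id: str) -> bytes: ...


class DocumentArchiver:
    """Hypothetical business logic that used to call the S3 API directly."""

    def __init__(self, persistence: PersistenceService) -> None:
        # Ask for the interface, not for the S3 implementation.
        self._persistence = persistence

    def archive(self, name: str, text: str) -> str:
        thing_id = f"documents/{name.strip().lower()}"
        self._persistence.save(thing_id, text.encode("utf-8"))
        return thing_id


class DocumentArchiverTest(unittest.TestCase):
    def test_archive_saves_encoded_text_under_normalised_id(self) -> None:
        # Mocking is safe here because we own PersistenceService.
        persistence = create_autospec(PersistenceService, instance=True)
        archiver = DocumentArchiver(persistence)

        thing_id = archiver.archive(" Report.TXT ", "hello")

        self.assertEqual(thing_id, "documents/report.txt")
        persistence.save.assert_called_once_with("documents/report.txt", b"hello")
```

The test never touches the network, so it stays fast enough to run on every change.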

The next step is to write the integration tests for S3PersistenceService. These should be separated out by name or folder so we can run them separately from our fast unit tests. Integration tests can often use the same testing frameworks as unit tests if the code is sufficiently informative, so we don't need to learn a new tool. The actual code to the integration test is what you would write for your Option 1.
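One possible shape for such an integration test in Python is sketched below. The environment variable `S3_TEST_BUCKET`, the key prefix, and the compact `S3PersistenceService` are assumptions made for illustration. The test is skipped unless a test bucket is configured, which keeps it out of the fast unit-test run, and it deletes the object it uploaded even if an assertion fails.

```python
import os
import unittest
import uuid
from typing import Any

# Hypothetical environment variable naming a bucket reserved for tests.
BUCKET = os.environ.get("S3_TEST_BUCKET")


class S3PersistenceService:
    """Compact stand-in for the wrapper under test (assumes a boto3 client)."""

    def __init__(self, s3_client: Any, bucket: str) -> None:
        self._s3 = s3_client
        self._bucket = bucket

    def save(self, thing_id: str, content: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=thing_id, Body=content)

    def fetch(self, thing_id: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=thing_id)["Body"].read()


@unittest.skipUnless(BUCKET, "set S3_TEST_BUCKET to run S3 integration tests")
class S3PersistenceServiceIntegrationTest(unittest.TestCase):
    def setUp(self) -> None:
        import boto3  # imported lazily so unit-test runs do not need it

        self.s3 = boto3.client("s3")
        self.service = S3PersistenceService(self.s3, BUCKET)
        # A unique key avoids collisions between concurrent test runs.
        self.key = f"integration-tests/{uuid.uuid4()}"
        # Registered up front so the object is removed even on failure.
        self.addCleanup(self.s3.delete_object, Bucket=BUCKET, Key=self.key)

    def test_saved_document_can_be_fetched(self) -> None:
        self.service.save(self.key, b"hello")
        self.assertEqual(self.service.fetch(self.key), b"hello")
```

No mock appears anywhere in this test: it uses the real client against the real service, which is what gives it value and also what makes it slow.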