Java Builder Pattern – When to Fail?

design-patterns, java

When implementing the Builder Pattern, I am often unsure when to let building fail, and I even manage to take a different stand on the matter every few days.

First some explanation:

By failing early I mean that building an object should fail as soon as an invalid parameter is passed in, that is, inside the SomeObjectBuilder.
By failing late I mean that building an object can only fail on the build() call, which implicitly calls a constructor of the object to be built.

Then some arguments:

In favor of failing late: A builder class should be no more than a class that simply holds values. Moreover, it leads to less code duplication.
In favor of failing early: A general approach in software programming is that you want to detect issues as early as possible and therefore the most logical place to check would be in the builder class' constructor, 'setters' and ultimately in the build method.

What is the general consensus about this?

Best Answer

Let's look at the options for where we can place the validation code:

1. Inside the setters in the builder.
2. Inside the build() method.
3. Inside the constructed entity: it will be invoked in the build() method when the entity is being created.

Option 1 allows us to detect problems earlier, but there can be complicated cases where input can only be validated with the full context, so at least part of the validation has to happen in the build() method. Choosing option 1 therefore leads to inconsistent code, with part of the validation done in one place and the rest in another.

Option 2 isn't significantly worse than option 1, because setters in a builder are usually invoked right before build(), especially in fluent interfaces. Thus, it's still possible to detect a problem early enough in most cases. However, if the builder is not the only way to create an object, it leads to duplicated validation code, because you need it everywhere you create the object. The most logical solution in this case is to put validation as close to the created object as possible, that is, inside it. And this is option 3.

From a SOLID point of view, putting validation in the builder also violates SRP: the builder class already has the responsibility of aggregating the data needed to construct an object. Validation means establishing contracts on an object's own internal state; checking the state of another object is a new responsibility for the builder.

Thus, from my point of view, it's not only better to fail late from a design perspective, but also better to fail inside the constructed entity rather than in the builder itself.
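As a minimal sketch of option 3, assume a hypothetical Person entity with a name and an age (neither appears in the question). The builder only collects values, and the constructor is the single place that enforces the rules, so the same checks also run when the object is created without the builder.

```java
// Hypothetical entity: all validation lives in its constructor (option 3).
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        if (age < 0) {
            throw new IllegalArgumentException("age must not be negative: " + age);
        }
        this.name = name;
        this.age = age;
    }

    String name() { return name; }
    int age() { return age; }

    static PersonBuilder builder() { return new PersonBuilder(); }
}

// The builder only holds values; it performs no checks of its own.
final class PersonBuilder {
    private String name;
    private int age;

    PersonBuilder name(String name) { this.name = name; return this; }
    PersonBuilder age(int age) { this.age = age; return this; }

    // Fails late: the Person constructor rejects invalid values here.
    Person build() { return new Person(name, age); }
}
```

With this layout, Person.builder().name("Ada").age(-1) succeeds and only the final build() call throws, while new Person("Ada", -1) fails with the same exception.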

UPD: this comment reminded me of one more possibility, where validation inside the builder (option 1 or 2) makes sense. It makes sense if the builder has its own contracts on the objects it is creating. For example, assume that we have a builder that constructs a string with specific content, say, a list of number ranges 1-2,3-4,5-6. This builder may have a method like addRange(int min, int max). The resulting string does not know anything about these numbers, nor should it have to. The builder itself defines the format of the string and the constraints on the numbers. Thus, the method addRange(int,int) must validate the input numbers and throw an exception if max is less than min.

That said, the general rule is that the builder should validate only the contracts it defines itself.
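The range example could be sketched roughly as follows. The class name RangeListBuilder and the choice of IllegalArgumentException are assumptions; only addRange(int min, int max) and the 1-2,3-4,5-6 format come from the answer.

```java
// Hypothetical builder for strings such as "1-2,3-4,5-6".
final class RangeListBuilder {
    private final StringBuilder out = new StringBuilder();

    // The builder owns the format, so it checks its own contract immediately.
    RangeListBuilder addRange(int min, int max) {
        if (max < min) {
            throw new IllegalArgumentException(
                "max (" + max + ") is less than min (" + min + ")");
        }
        if (out.length() > 0) {
            out.append(',');
        }
        out.append(min).append('-').append(max);
        return this;
    }

    // The product is a plain String that knows nothing about ranges.
    String build() { return out.toString(); }
}
```

Here failing early is the only option: once the text has been appended, the resulting String has no way to tell that a range was invalid.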

Related Solutions

Builder Design Pattern – Is StringBuilder an Application?

A StringBuilder is similar to the Builder Pattern, but does not share much with the GoF description of this design pattern. The original point of the design pattern was:

Separate the construction of a complex object from its representation so that the same construction process can create different representations.

— from Design Patterns, by Gamma, Helm, Johnson, Vlissides.

(note: “complex” primarily means “composed of multiple parts”, not necessarily “complicated” or “difficult”)

The “different representations” is key here. E.g. assuming this construction process:

interface ArticleBuilder {
  void addTitle(String title);
  void addParagraph(String paragraph);
}

void createArticle(ArticleBuilder articleBuilder) {
  articleBuilder.addTitle("Is String Builder an application of ...");
  articleBuilder.addParagraph("Is the Builder Pattern restricted...");
  articleBuilder.addParagraph("The StringBuilder class ...");
}

we might end up with a HtmlDocument or a TexDocument or a MarkdownDocument depending on what concrete implementation is provided:

class HtmlDocumentBuilder implements ArticleBuilder {
  ...
  HtmlDocument getResult() { ... }
}

HtmlDocumentBuilder b = new HtmlDocumentBuilder();
createArticle(b);
HtmlDocument dom = b.getResult();
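To make the "different representations" point concrete, here is a sketch of a second builder that consumes the same construction steps. MarkdownArticleBuilder, ArticleDirector, and the plain String result are assumptions for illustration, and the interface is repeated only so that the example compiles on its own.

```java
interface ArticleBuilder {
    void addTitle(String title);
    void addParagraph(String paragraph);
}

// Hypothetical second representation: renders the same steps as Markdown text.
final class MarkdownArticleBuilder implements ArticleBuilder {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void addTitle(String title) {
        out.append("# ").append(title).append("\n\n");
    }

    @Override
    public void addParagraph(String paragraph) {
        out.append(paragraph).append("\n\n");
    }

    // Returns the product as the final step, without trailing blank lines.
    String getResult() { return out.toString().trim(); }
}

final class ArticleDirector {
    // The construction process knows only the ArticleBuilder interface.
    static void createArticle(ArticleBuilder articleBuilder) {
        articleBuilder.addTitle("Title");
        articleBuilder.addParagraph("First paragraph.");
        articleBuilder.addParagraph("Second paragraph.");
    }
}
```

Passing a different ArticleBuilder implementation to createArticle changes the product type without touching the construction code.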

So one central point of the Builder pattern is polymorphism. The Design Patterns book compares this pattern to the Abstract Factory:

Abstract Factory is similar to the Builder in that it too may construct complex objects. The primary difference is that the Builder pattern focuses on constructing a complex object step by step. […] Builder returns the product as a final step, but as far as the Abstract Factory is concerned, the product gets returned immediately.

— from Design Patterns, by Gamma, Helm, Johnson, Vlissides.

This step-by-step aspect has then become the more popular aspect of the Builder pattern, so that in common parlance the Builder pattern is understood like this:

Split construction of an object into multiple steps. This allows us to use named arguments or optional parameters even in languages that do not support these features.

Wikipedia defines the pattern like this:

The builder pattern is an object creation software design pattern. Unlike the abstract factory pattern and the factory method pattern whose intention is to enable polymorphism, the intention of the builder pattern is to find a solution to the telescoping constructor anti-pattern [citation needed]. […]

The builder pattern has another benefit. It can be used for objects that contain flat data (html code, SQL query, X.509 certificate...), that is to say, data that can't be easily edited. This type of data cannot be edited step by step and must be edited at once. The best way to construct such an object is to use a builder class. [citation needed]

— from Builder Pattern on Wikipedia, by various contributors.

So as we can see, there is no truly common understanding of which pattern this name refers to, and in some points different definitions even contradict one another (e.g. regarding the relevance of polymorphism for Builders).

The only common property of the StringBuilder with various interpretations of the pattern is that the product is created step by step rather than in one go. It does not meet a strict reading of the GoF definition of the design pattern, but please note that design patterns are malleable concepts meant to facilitate communication. I would continue to call StringBuilder an example of the Builder Pattern, albeit an atypical one – the main reason for that structure in Java is performant concatenation in the presence of immutable strings, but not some interesting object-oriented design.

Constructor with tons of parameters vs builder pattern

The Builder Pattern does not solve the “problem” of many arguments. But why are many arguments problematic?

They indicate your class might be doing too much. However, there are many types that legitimately contain many members that cannot be sensibly grouped.
Testing and understanding a function with many inputs gets exponentially more complicated – literally!
When the language does not offer named parameters, a function call is not self-documenting. Reading a function call with many arguments is quite difficult because you have no idea what the 7th parameter is supposed to do. You wouldn't even notice if the 5th and 6th argument were swapped accidentally, especially if you're in a dynamically typed language or everything happens to be a string, or when the last parameter is true for some reason.

Faking named parameters

The Builder Pattern addresses only one of these problems, namely the maintainability concerns of function calls with many arguments∗. So a function call like

MyClass o = new MyClass(a, b, c, d, e, f, g);

might become

MyClass o = MyClass.builder()
  .a(a).b(b).c(c).d(d).e(e).f(f).g(g)
  .build();

∗ The Builder pattern was originally intended as a representation-agnostic approach to assemble composite objects, which is a far greater aspiration than just named arguments for parameters. In particular, the builder pattern does not require a fluent interface.

This offers a bit of extra safety since it will blow up if you invoke a builder method that doesn't exist, but it otherwise does not bring you anything that a comment in the constructor call wouldn't have. Also, manually creating a builder requires code, and more code can always contain more bugs.
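To give a sense of how much code a hand-written builder needs, here is a minimal sketch for a hypothetical ServerConfig class with two required values and one optional value. All names are illustrative assumptions.

```java
// Hypothetical class with a hand-written builder that fakes named arguments.
final class ServerConfig {
    final String host;
    final int port;
    final boolean tls;

    private ServerConfig(String host, int port, boolean tls) {
        this.host = host;
        this.port = port;
        this.tls = tls;
    }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String host;
        private Integer port;        // boxed so that "not set" is detectable
        private boolean tls = false; // optional, with a default

        Builder host(String host) { this.host = host; return this; }
        Builder port(int port) { this.port = port; return this; }
        Builder tls(boolean tls) { this.tls = tls; return this; }

        ServerConfig build() {
            // Missing required arguments are only detected at run time.
            if (host == null || port == null) {
                throw new IllegalStateException("host and port are required");
            }
            return new ServerConfig(host, port, tls);
        }
    }
}
```

A call such as ServerConfig.builder().port(8080).host("example.org").build() reads like named arguments and is order-independent, but every field now appears four times in the class.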

In languages where it is easy to define a new value type, I've found that it's way better to use microtyping/tiny types to simulate named arguments. It is named so because the types are really small, but you end up typing a lot more ;-)

MyClass o = new MyClass(
  new MyClass.A(a), new MyClass.B(b), new MyClass.C(c),
  new MyClass.D(d), new MyClass.E(e), new MyClass.F(f),
  new MyClass.G(g));

Obviously, the type names A, B, C, … should be self-documenting names that illustrate the meaning of the parameter, often the same name as you'd give the parameter variable. Compared with the builder-for-named-arguments idiom, the required implementation is a lot simpler, and thus less likely to contain bugs. For example (with Java-ish syntax):

class MyClass {
  ...
  public static class A {
    public final int value;
    public A(int a) { value = a; }
  }
  ...
}

The compiler helps you guarantee that all arguments were provided; with a Builder you'd have to manually check for missing arguments, or encode a state machine into the host language type system – both would likely contain bugs.
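A complete version of the tiny-types idea might look like the sketch below. Rectangle, Width, and Height are hypothetical names chosen so that two int arguments would otherwise be easy to swap.

```java
// Hypothetical example: tiny types make each constructor argument self-describing.
final class Rectangle {
    static final class Width {
        final int value;
        Width(int value) { this.value = value; }
    }

    static final class Height {
        final int value;
        Height(int value) { this.value = value; }
    }

    private final int width;
    private final int height;

    // Both arguments are mandatory, and their order is enforced by their types.
    Rectangle(Width width, Height height) {
        this.width = width.value;
        this.height = height.value;
    }

    @Override
    public String toString() { return width + "x" + height; }
}
```

new Rectangle(new Rectangle.Width(3), new Rectangle.Height(4)) compiles, whereas swapping the two arguments or leaving one out is rejected by the compiler, because no constructor with that signature exists.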

There is another common approach to simulate named arguments: a single abstract parameter object that uses an inline class syntax to initialize all fields. In Java:

MyClass o = new MyClass(new MyClass.Arguments(){{ argA = a; argB = b; argC = c; ... }});

class MyClass {
  ...
  public static abstract class Arguments {
    public int argA;
    public String argB;
    ...
  }
}

However, it is possible to forget fields, and this is quite a language-specific solution (I've seen uses in JavaScript, C#, and C).

Fortunately, the constructor can still validate all arguments, which is not the case when your objects are created in a partially-constructed state, and require the user to provide further arguments via setters or an init() method – those require the least coding effort, but make it more difficult to write correct programs.

So while there are many approaches to address the “many unnamed parameters make code difficult to maintain” problem, other problems remain.

Approaching the root problem

For example, the testability problem. When I write unit tests, I need the ability to inject test data, and to provide test implementations to mock out dependencies and operations that have external side effects. I can't do that when you instantiate any classes within your constructor. Unless the responsibility of your class is the creation of other objects, it shouldn't instantiate any non-trivial classes. This goes hand in hand with the single responsibility problem. The more focussed the responsibility of a class, the easier it is to test (and often easier to use).

The easiest and often best approach is for the constructor to take fully-constructed dependencies as parameters, though this shoves the responsibility of managing dependencies to the caller – not ideal either, unless the dependencies are independent entities in your domain model.
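Taking dependencies through the constructor could look like the following sketch. MessageSender, WelcomeService, and RecordingSender are hypothetical names, and the recording class stands in for whatever test double a unit test would supply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency that would have external side effects in production.
interface MessageSender {
    void send(String recipient, String text);
}

// The class receives its dependency fully constructed instead of creating it.
final class WelcomeService {
    private final MessageSender sender;

    WelcomeService(MessageSender sender) {
        this.sender = sender;
    }

    void welcome(String user) {
        sender.send(user, "Welcome, " + user + "!");
    }
}

// Test implementation that records calls instead of sending anything.
final class RecordingSender implements MessageSender {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String recipient, String text) {
        sent.add(recipient + ": " + text);
    }
}
```

A test can construct new WelcomeService(new RecordingSender()) and inspect the recorded messages afterwards, which would not be possible if WelcomeService created its own sender internally.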

Sometimes (abstract) factories or full dependency injection frameworks are used instead, though these might be overkill in the majority of use cases. In particular, these only reduce the number of arguments if many of these arguments are quasi-global objects or configuration values that don't change between object instantiations. E.g. if parameters a and d were global-ish, we'd get

Dependencies deps = new Dependencies(a, d);
...
MyClass o = deps.newMyClass(b, c, e, f, g);

class MyClass {
  MyClass(Dependencies deps, B b, C c, E e, F f, G g) {
    this.depA = deps.newDepA(b, c);
    this.depB = deps.newDepB(e, f);
    this.g = g;
  }
  ...
}

class Dependencies {
  private A a;
  private D d;
  public Dependencies(A a, D d) { this.a = a; this.d = d; }
  public DepA newDepA(B b, C c) { return new DepA(a, b, c); }
  public DepB newDepB(E e, F f) { return new DepB(d, e, f); }
  public MyClass newMyClass(B b, C c, E e, F f, G g) {
    return new MyClass(this, b, c, e, f, g);
  }
}

Depending on the application, this might be a game-changer where the factory methods end up having nearly no arguments because all can be provided by the dependency manager, or it might be a large amount of code that complicates instantiation for no apparent benefit. Such factories are way more useful for mapping interfaces to concrete types than they are for managing parameters. However, this approach tries to address the root problem of too many parameters rather than just hiding it with a pretty fluent interface.