Java – How to handle “conditional fields” in Java

builder-pattern java patterns-and-practices

I've run into several situations where a POJO has a field whose value is meaningful only for certain values of another field. An example, using Lombok (which we try to use to avoid boilerplate):

@Data
@Builder
public class SomePojo {
    // an enum whose values are DEFAULT_LOCATION, STANDARD_INPUT, FILE
    private final SourceType sourceType;

    // meaningful only if sourceType == FILE
    private final String path;

    private final String attributes;  
}

(This is just an example; the case I'm now looking at has nothing to do with files. If it makes a difference, the class's purpose is to be the return type of a method that needs to return several pieces of information to the caller.)

Although the class could be used as is, what are the best practices for dealing with this situation? Specifically:

  1. Should I write a custom getter method for path that throws an exception if sourceType != FILE?

  2. Should I write a custom builder that throws an exception if the builder tries to set path when it sets sourceType to something other than FILE?

  3. Should I write custom equals() and hashCode() that don't look at path if sourceType != FILE?

  4. Should my builder or constructor set path to some fixed value if sourceType != FILE? Doing so would eliminate the need for special equals() and hashCode().

  5. Should I make path an Optional<String>? Would it be enough to do this without doing any of #1-4?

  6. Is it preferable to define a new class hierarchy to encapsulate the sourceType and path fields (so there would be three subclasses of some base class and only one of them would have a path)? Let's assume that there aren't any polymorphic methods in this hierarchy.

MORE: I agree with the comment that if a method returns a variant record, there's a good chance that the method is doing too much, and that needs to be checked. But this isn't always the case. A very simple example would be a method that searches a string to see if it contains a substring, and returns the position of the substring if present. (Java's method to do this returns a special value as the "position" if the substring isn't present, which I'd argue is a bad practice because it can introduce errors if a caller fails to check this case and treats the resulting position as a number. Optional would help in this case, but not in a case where there are more than two states to return.) This seems to be a natural use case for returning either (FOUND, position) or (NOT_FOUND) which wouldn't have a position. The case I'm working with isn't quite that simple, but it's been examined thoroughly and has already been torn up a couple of times by colleagues in design reviews, so I'm pretty sure that it's not a method that does too much.

Best Answer

tl;dr As others have said, #6 is the best approach design-wise, and likely the best on other axes as well. The rest of this answer presents various ways of doing #6 and how you might choose between them.

In a language with algebraic data types (or something similar), such as most popular statically typed functional languages (Haskell, SML, OCaml, F#) as well as Scala, Swift, and Rust, such a type is straightforward to represent. In Haskell syntax:

data Source
  = StdIn { attributes :: String }
  | DefaultLocation { attributes :: String }
  | File { attributes :: String, path :: String }

This can be encoded in virtually all object-oriented languages using dynamic dispatch:

interface Source {
    String getAttributes();
}
class StdInSource implements Source {
    private final String attributes;
    StdInSource(String attributes) { this.attributes = attributes; }
    @Override
    public String getAttributes() { return this.attributes; }
}
// Similarly for DefaultLocationSource and FileSource, the latter also taking the path

Of course, as is, the only thing you can do given a Source is getAttributes and you probably want to do more than that. With algebraic data types, you could pattern match to recover the information in each case and dispatch on which case. This can be captured in the Java rendition:

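import java.util.function.BiFunction;
import java.util.function.Function;

interface Source {
    // Exactly one of the three functions is called, with that case's fields.
    <A> A match(Function<String, A> stdInCase,
                Function<String, A> defaultLocationCase,
                BiFunction<String, String, A> fileCase);
}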

// With implementations like:
class StdInSource implements Source {
    // attributes field and constructor as before
    @Override
    public <A> A match(Function<String, A> stdInCase,
                       Function<String, A> defaultLocationCase,
                       BiFunction<String, String, A> fileCase) {
        return stdInCase.apply(this.attributes);
    }
}
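
To make the match encoding concrete, here is a minimal, self-contained sketch with all three cases filled in plus one consumer. The `describe` helper, the `MatchDemo` class, and the sample values are hypothetical and only there for illustration; the argument order of the file case (attributes, then path) follows the Haskell declaration above.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

interface Source {
    <A> A match(Function<String, A> stdInCase,
                Function<String, A> defaultLocationCase,
                BiFunction<String, String, A> fileCase);
}

final class StdInSource implements Source {
    private final String attributes;
    StdInSource(String attributes) { this.attributes = attributes; }
    @Override
    public <A> A match(Function<String, A> stdInCase,
                       Function<String, A> defaultLocationCase,
                       BiFunction<String, String, A> fileCase) {
        return stdInCase.apply(attributes);
    }
}

final class DefaultLocationSource implements Source {
    private final String attributes;
    DefaultLocationSource(String attributes) { this.attributes = attributes; }
    @Override
    public <A> A match(Function<String, A> stdInCase,
                       Function<String, A> defaultLocationCase,
                       BiFunction<String, String, A> fileCase) {
        return defaultLocationCase.apply(attributes);
    }
}

final class FileSource implements Source {
    private final String attributes;
    private final String path; // exists only in the one case where it is meaningful
    FileSource(String attributes, String path) {
        this.attributes = attributes;
        this.path = path;
    }
    @Override
    public <A> A match(Function<String, A> stdInCase,
                       Function<String, A> defaultLocationCase,
                       BiFunction<String, String, A> fileCase) {
        return fileCase.apply(attributes, path);
    }
}

final class MatchDemo {
    // Hypothetical consumer: it must handle every case, and it can only
    // see a path inside the branch that handles files.
    static String describe(Source source) {
        return source.match(
            attributes -> "stdin [" + attributes + "]",
            attributes -> "default location [" + attributes + "]",
            (attributes, path) -> "file " + path + " [" + attributes + "]");
    }

    public static void main(String[] args) {
        System.out.println(describe(new StdInSource("ro")));
        System.out.println(describe(new FileSource("ro", "/tmp/data.txt")));
    }
}
```

The caller cannot forget a case (the call will not compile with a missing argument) and cannot read a path from a source that has none, which is what questions #1 to #4 were trying to enforce at runtime.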

This approach more or less captures the behavior of the algebraic data type, but it also captures some of the downsides. The match method is basically the accept method of the Visitor pattern. If we bundled all those Functions together into a class instead of having a bunch of parameters, the resulting object would basically be a visitor. The Visitor pattern leads to some constraints on extensibility, roughly corresponding to the closed nature of algebraic data types.
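
As a rough sketch of that bundling, the three function parameters can become one visitor interface. All names here (`SourceVisitor`, `VisitableSource`, `PathOrDefault`, and so on) are made up for illustration, and only two of the three source classes are written out.

```java
// The three function parameters of match, bundled into one object.
interface SourceVisitor<A> {
    A visitStdIn(String attributes);
    A visitDefaultLocation(String attributes);
    A visitFile(String attributes, String path);
    // Adding a fourth kind of source means adding a method here, which
    // breaks every existing visitor: the "closed" nature mentioned above.
}

interface VisitableSource {
    <A> A accept(SourceVisitor<A> visitor);
}

final class StdInVisitable implements VisitableSource {
    private final String attributes;
    StdInVisitable(String attributes) { this.attributes = attributes; }
    @Override
    public <A> A accept(SourceVisitor<A> visitor) {
        return visitor.visitStdIn(attributes);
    }
}

final class FileVisitable implements VisitableSource {
    private final String attributes;
    private final String path;
    FileVisitable(String attributes, String path) {
        this.attributes = attributes;
        this.path = path;
    }
    @Override
    public <A> A accept(SourceVisitor<A> visitor) {
        return visitor.visitFile(attributes, path);
    }
}

// One operation over sources, written once and kept outside the data classes.
final class PathOrDefault implements SourceVisitor<String> {
    private final String fallback;
    PathOrDefault(String fallback) { this.fallback = fallback; }
    @Override public String visitStdIn(String attributes) { return fallback; }
    @Override public String visitDefaultLocation(String attributes) { return fallback; }
    @Override public String visitFile(String attributes, String path) { return path; }
}
```

New operations are cheap (write another visitor), while new cases are expensive (touch every visitor), which is the opposite trade-off from adding a polymorphic method to the class hierarchy.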

In the ideal case for object-oriented programming, there would be some general service you want provided by Source and any implementors of the interface would be implementation details that don't matter to you. Let's say you only care about getting an input stream from these sources, then you can easily make an interface like:

interface Source {
    InputStream getStream();
}

With such an interface, it's completely straightforward to add an additional option, say, for reading from an in-memory source. If you need to know whether a Source is a file so that you know to close it, that decision should be pushed into the objects by adding a close method to the Source interface and letting the implementors decide what to do with it. Standard OOP advice is to always offload these decisions onto objects, and this is often good advice. It can become absurd, though. Are you going to make a subclass of Boolean every time you do an if?
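
A minimal sketch of that service-style interface might look like the following. The name `StreamSource`, the three implementations, and the `firstByte` helper are assumptions for illustration, as is the choice to extend `AutoCloseable` so callers can use try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

interface StreamSource extends AutoCloseable {
    InputStream getStream() throws IOException;
    @Override
    void close() throws IOException;
}

final class FileStreamSource implements StreamSource {
    private final String path;
    private InputStream stream; // opened lazily on first use
    FileStreamSource(String path) { this.path = path; }
    @Override
    public InputStream getStream() throws IOException {
        if (stream == null) {
            stream = Files.newInputStream(Paths.get(path));
        }
        return stream;
    }
    @Override
    public void close() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null;
        }
    }
}

final class StdInStreamSource implements StreamSource {
    @Override public InputStream getStream() { return System.in; }
    @Override public void close() { /* deliberately leaves System.in open */ }
}

// The extra option mentioned above: no existing code has to change to add it.
final class InMemoryStreamSource implements StreamSource {
    private final byte[] bytes;
    InMemoryStreamSource(String text) { this.bytes = text.getBytes(StandardCharsets.UTF_8); }
    @Override public InputStream getStream() { return new ByteArrayInputStream(bytes); }
    @Override public void close() { /* nothing to release */ }
}

final class StreamDemo {
    // The caller never asks "are you a file?"; each source decides what close means.
    static int firstByte(StreamSource source) {
        try (StreamSource s = source) {
            return s.getStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```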

Roughly speaking, my viewpoint is that some objects are "data". These should be "value objects"; immutable, inert, and mostly transparent. Other objects represent "services" and should be active, encapsulated, and often will be stateful. In my opinion, OOP practice largely focuses on this latter perspective. The algebraic data type approach is more appropriate for "data"-y objects, and often inappropriate for "service"-y objects. On the other hand, pushing decisions (and behavior) onto "data"-y objects is often the opposite of what you want to do. In this vein, it's interesting to see what an FP approach to extensibility would be.

The key thing is "data" doesn't have behavior. How a piece of data is interpreted is up to the consumer. Just because I have File { attributes = "", path = "foo" } doesn't mean a consumer has to open file "foo". That may be the intended, "standard" interpretation but other interpretations can exist side-by-side. It is a common theme in FP practice to use data to describe what you want to do, and then only later interpret it. One benefit of this approach is it becomes very easy to do "global" transformations/analyses of "the plan". So the "plan" is "data"-y, but the interpreter of "the plan" is "service"-y. Indeed, it's easy to have a nice narrow, generic interface a la the getStream version of Source; namely, interpret(Plan thePlan).
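
To illustrate "data plus interpreters", here is a small sketch in which one plan is read by two unrelated interpreters, neither of which opens a file. `ReadStep`, `Plan`, `PlanInterpreter`, and both interpreter classes are hypothetical names, and the plan is cut down to two kinds of step to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Inert data: a step is either "read stdin" or "read the file at path".
abstract class ReadStep {
    abstract <A> A match(Supplier<A> stdInCase, Function<String, A> fileCase);

    static ReadStep stdIn() {
        return new ReadStep() {
            @Override
            <A> A match(Supplier<A> stdInCase, Function<String, A> fileCase) {
                return stdInCase.get();
            }
        };
    }

    static ReadStep file(String path) {
        return new ReadStep() {
            @Override
            <A> A match(Supplier<A> stdInCase, Function<String, A> fileCase) {
                return fileCase.apply(path);
            }
        };
    }
}

final class Plan {
    final List<ReadStep> steps;
    Plan(List<ReadStep> steps) {
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }
}

// The narrow, generic, service-style interface.
interface PlanInterpreter<R> {
    R interpret(Plan thePlan);
}

// Interpretation 1: a dry run that only describes what would happen.
final class DescribingInterpreter implements PlanInterpreter<List<String>> {
    @Override
    public List<String> interpret(Plan thePlan) {
        List<String> lines = new ArrayList<>();
        for (ReadStep step : thePlan.steps) {
            lines.add(step.<String>match(
                () -> "read standard input",
                path -> "read file " + path));
        }
        return lines;
    }
}

// Interpretation 2: a "global" analysis listing every file the plan mentions.
final class PathCollectingInterpreter implements PlanInterpreter<Set<String>> {
    @Override
    public Set<String> interpret(Plan thePlan) {
        Set<String> paths = new LinkedHashSet<>();
        for (ReadStep step : thePlan.steps) {
            paths.addAll(step.<Set<String>>match(
                () -> Collections.<String>emptySet(),
                path -> Collections.singleton(path)));
        }
        return paths;
    }
}
```

An interpreter that actually opens the files would be a third implementation of the same interface, added without touching the data classes.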

Data-oriented FP techniques are good for building/transforming/optimizing/analyzing a description of what to do, which can then be used to orchestrate the behavior of active objects implemented with service-oriented OOP techniques. This frees the active objects from having to learn about their context and react appropriately, which often requires tricky, modularity-destroying code. As an example, it's very easy to take a list of (descriptions of) transformations representing a pipeline and recognize where certain transformations can be combined or cancelled out. It's very difficult, on the other hand, for a particular (instance of a) transformation to tell that it actually cancels out the work of its upstream transformation, and that both it and the upstream transformation should be removed.
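
The pipeline example can be sketched as follows. The four operations in `TransformOp` and the two rewrite rules are invented for illustration: the sketch assumes that DECODE exactly undoes a preceding ENCODE and that TRIM is idempotent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Descriptions of transformations: plain data, no behavior attached.
enum TransformOp { TRIM, UPPERCASE, ENCODE, DECODE }

final class PipelineOptimizer {
    // Simplifies a pipeline by looking at neighbouring steps, something
    // no single step could do from the inside.
    static List<TransformOp> simplify(List<TransformOp> pipeline) {
        Deque<TransformOp> out = new ArrayDeque<>();
        for (TransformOp op : pipeline) {
            TransformOp prev = out.peekLast(); // null when nothing is kept yet
            if (prev == TransformOp.ENCODE && op == TransformOp.DECODE) {
                out.removeLast(); // cancelled out: drop both steps
            } else if (prev == TransformOp.TRIM && op == TransformOp.TRIM) {
                // combined: two trims in a row do the work of one
            } else {
                out.addLast(op);
            }
        }
        return new ArrayList<>(out);
    }
}
```

Because the kept steps act as a stack, cancelling one pair can expose another: ENCODE, ENCODE, DECODE, DECODE collapses to nothing. A separate interpreter would then run whatever steps remain.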