I've run into several situations where, in a POJO, whether a field value is meaningful depends on the value of another field. An example, using Lombok (which we try to use to avoid boilerplate):
    import lombok.Builder;
    import lombok.Data;

    @Data
    @Builder
    public class SomePojo {
        // an enum whose values are DEFAULT_LOCATION, STANDARD_INPUT, FILE
        private final SourceType sourceType;
        // meaningful only if sourceType == FILE
        private final String path;
        private final String attributes;
    }
(This is just an example; the case I'm now looking at has nothing to do with files. If it makes a difference, the class's purpose is to be the return type of a method that needs to return several pieces of information to the caller.)
Although the class could be used as is, what are the best practices for dealing with this situation? Specifically:
1. Should I write a custom getter method for `path` that throws an exception if `sourceType != FILE`?
2. Should I write a custom builder that throws an exception if the builder tries to set `path` when it sets `sourceType` to something other than `FILE`?
3. Should I write custom `equals()` and `hashCode()` that don't look at `path` if `sourceType != FILE`?
4. Should my builder or constructor set `path` to some fixed value if `sourceType != FILE`? Doing so would eliminate the need for special `equals()` and `hashCode()`.
5. Should I make `path` an `Optional<String>`? Would it be enough to do this without doing any of #1-4?
6. Is it preferable to define a new class hierarchy to encapsulate the `sourceType` and `path` fields (so there would be three subclasses of some base class and only one of them would have a `path`)? Let's assume that there aren't any polymorphic methods in this hierarchy.
MORE: I agree with the comment that if a method returns a variant record, there's a good chance that the method is doing too much, and that needs to be checked. But this isn't always the case. A very simple example would be a method that searches a string to see if it contains a substring, and returns the position of the substring if present. (Java's method to do this returns a special value as the "position" if the substring isn't present, which I'd argue is a bad practice because it can introduce errors if a caller fails to check this case and treats the resulting position as a number. `Optional` would help in this case, but not in a case where there are more than two states to return.) This seems to be a natural use case for returning either (FOUND, position) or (NOT_FOUND), which wouldn't have a position. The case I'm working with isn't quite that simple, but it's been examined thoroughly and has already been torn up a couple of times by colleagues in design reviews, so I'm pretty sure that it's not a method that does too much.
Best Answer
tl;dr As others have said, #6 is, design-wise, the best approach, and likely the best approach on other axes as well. The remainder of this answer presents various ways of doing #6 and how you might choose between them.
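In a language with algebraic data types (or similar), such as most popular statically typed functional languages (Haskell, SML, OCaml, F#) but also Scala, Swift, and Rust, it would be straightforward to represent such a type. In Haskell syntax, it would be a single `data Source` declaration with one constructor per source type, where every constructor carries the attributes and only the `File` constructor also carries a `path`.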
This can be encoded in virtually all object-oriented languages using dynamic dispatch: an interface with one implementing class per case.
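A minimal sketch of such an encoding, assuming the cases mirror the question's three source types. Only `Source` and `getAttributes` are named in the text; the class names `DefaultLocation`, `StandardInput`, and `FileSource` are hypothetical.

```java
// Sketch only: one interface, one implementing class per case.
interface Source {
    String getAttributes();
}

// Hypothetical case class: carries attributes and nothing else.
final class DefaultLocation implements Source {
    private final String attributes;

    DefaultLocation(String attributes) { this.attributes = attributes; }

    @Override public String getAttributes() { return attributes; }
}

// Hypothetical case class: carries attributes and nothing else.
final class StandardInput implements Source {
    private final String attributes;

    StandardInput(String attributes) { this.attributes = attributes; }

    @Override public String getAttributes() { return attributes; }
}

// Hypothetical case class: the only one that has a path at all.
final class FileSource implements Source {
    private final String attributes;
    private final String path;

    FileSource(String attributes, String path) {
        this.attributes = attributes;
        this.path = path;
    }

    @Override public String getAttributes() { return attributes; }

    public String getPath() { return path; }
}
```

With this shape, a meaningless `path` cannot be represented, so none of the special-casing in #1-4 is needed.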
Of course, as is, the only thing you can do given a `Source` is `getAttributes`, and you probably want to do more than that. With algebraic data types, you could pattern match to recover the information in each case and dispatch on which case it is. This can be captured in the Java rendition by adding a `match` method to `Source` that takes one `Function` per case and applies the one corresponding to the actual case.
This approach more or less captures the behavior of the algebraic data type, but it also captures some of the downsides. The `match` method is basically the `accept` method of the Visitor pattern. If we bundled all those `Function`s together into a class instead of having a bunch of parameters, the resulting object would basically be a visitor. The Visitor pattern leads to some constraints on extensibility, roughly corresponding to the closed nature of algebraic data types.
In the ideal case for object-oriented programming, there would be some general service you want provided by `Source`, and any implementors of the interface would be implementation details that don't matter to you. Let's say you only care about getting an input stream from these sources; then you can easily make an interface with a single `getStream` method.
With such an interface, it's completely straightforward to add an additional option, say, for reading from an in-memory source. If you need to know whether a `Source` is a file or not so you know to close the file or something, that decision should be pushed into the objects by adding a `close` method to the `Source` interface and letting the implementors decide what to do with it. Standard OOP advice is to always off-load these decisions onto objects, and this is often good advice. It can become absurd, though. Are you going to make a subclass of `Boolean` every time you do an `if`?
Roughly speaking, my viewpoint is that some objects are "data". These should be "value objects": immutable, inert, and mostly transparent. Other objects represent "services" and should be active, encapsulated, and often will be stateful. In my opinion, OOP practice largely focuses on this latter perspective. The algebraic data type approach is more appropriate for "data"-y objects, and often inappropriate for "service"-y objects. On the other hand, pushing decisions (and behavior) onto "data"-y objects is often the opposite of what you want to do. In this vein, it's interesting to see what an FP approach to extensibility would be.
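To make the `match` idea above concrete, here is a hedged sketch of what a "data"-style `Source` could look like in Java. The case class names, the parameter names, and the `SourceDescriptions.describe` consumer are assumptions made for illustration; only `Source`, `getAttributes`, `match`, and the use of `Function` come from the text.

```java
import java.util.function.Function;

// Sketch only: the case class names below are hypothetical.
interface Source {
    String getAttributes();

    // Takes one function per case and applies the one matching the receiver.
    <R> R match(Function<DefaultLocation, R> onDefault,
                Function<StandardInput, R> onStandardInput,
                Function<FileSource, R> onFile);
}

final class DefaultLocation implements Source {
    private final String attributes;
    DefaultLocation(String attributes) { this.attributes = attributes; }
    @Override public String getAttributes() { return attributes; }
    @Override public <R> R match(Function<DefaultLocation, R> onDefault,
                                 Function<StandardInput, R> onStandardInput,
                                 Function<FileSource, R> onFile) {
        return onDefault.apply(this);
    }
}

final class StandardInput implements Source {
    private final String attributes;
    StandardInput(String attributes) { this.attributes = attributes; }
    @Override public String getAttributes() { return attributes; }
    @Override public <R> R match(Function<DefaultLocation, R> onDefault,
                                 Function<StandardInput, R> onStandardInput,
                                 Function<FileSource, R> onFile) {
        return onStandardInput.apply(this);
    }
}

final class FileSource implements Source {
    private final String attributes;
    private final String path; // only this case has a path
    FileSource(String attributes, String path) {
        this.attributes = attributes;
        this.path = path;
    }
    @Override public String getAttributes() { return attributes; }
    public String getPath() { return path; }
    @Override public <R> R match(Function<DefaultLocation, R> onDefault,
                                 Function<StandardInput, R> onStandardInput,
                                 Function<FileSource, R> onFile) {
        return onFile.apply(this);
    }
}

// Hypothetical consumer: the interpretation lives outside the data.
final class SourceDescriptions {
    static String describe(Source source) {
        return source.match(
            d -> "default location",
            s -> "standard input",
            f -> "file at " + f.getPath());
    }
}
```

Adding a fourth case means changing `match` and every caller, which is the closed-world trade-off the text attributes to the Visitor pattern. In exchange, each consumer is forced to handle every case, and `getPath` is reachable only where a path actually exists.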
The key thing is "data" doesn't have behavior. How a piece of data is interpreted is up to the consumer. Just because I have `File { attributes = "", path = "foo" }` doesn't mean a consumer has to open file "foo". That may be the intended, "standard" interpretation, but other interpretations can exist side-by-side. It is a common theme in FP practice to use data to describe what you want to do, and then only later interpret it. One benefit of this approach is that it becomes very easy to do "global" transformations/analyses of "the plan". So the "plan" is "data"-y, but the interpreter of "the plan" is "service"-y. Indeed, it's easy to have a nice narrow, generic interface a la the `getStream` version of `Source`; namely, `interpret(Plan thePlan)`.
Data-oriented FP techniques are good for building/transforming/optimizing/analyzing a description of what to do, which can then be used to orchestrate the behavior of active objects implemented with service-oriented OOP techniques. This frees the active objects from having to learn about their context and react appropriately, which is often tricky and modularity-destroying code. As an example, it's very easy to take a list of (descriptions of) transformations representing a pipeline and recognize where certain transformations can be combined together or cancelled out. It's very difficult, on the other hand, for a particular (instance of a) transformation to tell that it actually cancels out the work of its upstream transformation, and that it and the upstream transformation should be removed.
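The pipeline example can be sketched as follows. This is only an illustration under stated assumptions: the `Step` enum, the `Plans` class, and the three string transformations are hypothetical, and the plan is represented as a plain `List<Step>` rather than the `Plan` type the text mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plan steps: pure descriptions with no behavior of their own.
enum Step { REVERSE, UPPERCASE, TRIM }

final class Plans {
    // A "global" rewrite over the description. No step has to know its neighbours.
    static List<Step> optimize(List<Step> plan) {
        List<Step> out = new ArrayList<>();
        for (Step step : plan) {
            Step last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (step == Step.REVERSE && last == Step.REVERSE) {
                // Two adjacent reversals cancel out: drop both.
                out.remove(out.size() - 1);
            } else if (step == last) {
                // UPPERCASE and TRIM are idempotent: adjacent repeats combine into one.
                continue;
            } else {
                out.add(step);
            }
        }
        return out;
    }

    // The "service"-y side: one possible interpretation of the plan.
    static String interpret(List<Step> plan, String input) {
        String result = input;
        for (Step step : plan) {
            switch (step) {
                case REVERSE:
                    result = new StringBuilder(result).reverse().toString();
                    break;
                case UPPERCASE:
                    result = result.toUpperCase(Locale.ROOT);
                    break;
                case TRIM:
                    result = result.trim();
                    break;
            }
        }
        return result;
    }
}
```

For instance, `Plans.optimize` turns the plan `UPPERCASE, REVERSE, REVERSE, UPPERCASE, TRIM` into `UPPERCASE, TRIM`, and `Plans.interpret` gives the same result for both. The rewrite is a few lines because the plan is inert data; the same logic spread across active transformation objects would require each one to inspect its upstream neighbour.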