Disadvantages of Self-Encapsulation in Java


Background

Tony Hoare called his invention of the null reference his "billion-dollar mistake." Since then, a great deal of code has been plagued by null pointer exceptions (or segfaults, in unmanaged languages) raised when software developers dereference uninitialized variables.

In 1989, Wirfs-Brock and Wilkerson wrote:

Direct references to variables severely limit the ability of programmers to refine existing classes. The programming conventions described here structure the use of variables to promote reusable designs. We encourage users of all object-oriented languages to follow these conventions. Additionally, we strongly urge designers of object-oriented languages to consider the effects of unrestricted variable references on reusability.

Problem

A lot of software, especially in Java but likely also in C# and C++, uses the following pattern:

public class SomeClass {
  private String someAttribute;

  public SomeClass() {
    this.someAttribute = "Some Value";
  }

  public void someMethod() {
    if( this.someAttribute.equals( "Some Value" ) ) {
      // do something...
    }
  }

  public void setAttribute( String s ) {
    this.someAttribute = s;
  }

  public String getAttribute() {
    return this.someAttribute;
  }
}

Sometimes a band-aid solution is used by checking for null throughout the code base:

  public void someMethod() {
    assert this.someAttribute != null;

    if( this.someAttribute.equals( "Some Value" ) ) {
      // do something...
    }
  }

  public void anotherMethod() {
    assert this.someAttribute != null;

    if( this.someAttribute.equals( "Some Default Value" ) ) {
      // do something...
    }
  }

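The band-aid does not always avoid the null pointer problem: a race condition exists, because another thread can call setAttribute( null ) between the check and the dereference. Copying the field into a local variable first mitigates the race: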

  public void anotherMethod() {
    String someAttribute = this.someAttribute;
    assert someAttribute != null;

    if( someAttribute.equals( "Some Default Value" ) ) {
      // do something...
    }
  }

Yet that requires two statements (copying the field to a local variable and checking the copy for null) every time an instance variable is used, just to ensure it is valid.

Self-Encapsulation

Ken Auer's "Reusability Through Self-Encapsulation" (Pattern Languages of Program Design, Addison-Wesley, New York, pp. 505-516, 1994) advocated self-encapsulation combined with lazy initialization. The result, in Java, would resemble:

public class SomeClass {
  private String someAttribute;

  public SomeClass() {
    setAttribute( "Some Value" );
  }

  public void someMethod() {
    if( getAttribute().equals( "Some Value" ) ) {
      // do something...
    }
  }

  public void setAttribute( String s ) {
    this.someAttribute = s;
  }

  public String getAttribute() {
    String someAttribute = this.someAttribute;

    if( someAttribute == null ) {
      someAttribute = createDefaultValue();
      setAttribute( someAttribute );
    }

    return someAttribute;
  }

  protected String createDefaultValue() { return "Some Default Value"; }
}

The duplicate null checks become superfluous: getAttribute() ensures, at a single location within the containing class, that the value is never null.

Efficiency arguments should be fairly moot: modern compilers and virtual machines can inline the accessor calls when possible.

As long as variables are never referenced directly, this also allows for proper application of the Open-Closed Principle.
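To make the Open-Closed point concrete, here is a minimal sketch. It assumes a hypothetical subclass named CustomDefaultClass and a small demo class, neither of which is part of the code above, and it omits the constructor so that the lazy default is actually reached. Because every read goes through getAttribute(), a subclass can change the default by overriding createDefaultValue() without touching SomeClass:

```java
// Sketch only: CustomDefaultClass and OpenClosedDemo are hypothetical names.
class SomeClass {
  private String someAttribute;

  public void setAttribute( String s ) {
    this.someAttribute = s;
  }

  public String getAttribute() {
    String someAttribute = this.someAttribute;

    // Lazily fall back to the default the first time the value is needed.
    if( someAttribute == null ) {
      someAttribute = createDefaultValue();
      setAttribute( someAttribute );
    }

    return someAttribute;
  }

  // Extension point: subclasses may supply a different default.
  protected String createDefaultValue() {
    return "Some Default Value";
  }
}

// Extends the behaviour without modifying SomeClass.
class CustomDefaultClass extends SomeClass {
  @Override
  protected String createDefaultValue() {
    return "Custom Default Value";
  }
}

class OpenClosedDemo {
  public static void main( String[] args ) {
    System.out.println( new SomeClass().getAttribute() );          // prints "Some Default Value"
    System.out.println( new CustomDefaultClass().getAttribute() ); // prints "Custom Default Value"
  }
}
```

This only holds while no method in SomeClass reads the field directly; a direct read would bypass the overridden default.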

Question

What are the disadvantages of self-encapsulation, if any?

(Ideally, I would like to see references to studies that contrast the robustness of similarly complex systems that use and don't use self-encapsulation, as this strikes me as a fairly straightforward testable hypothesis.)

Best Answer

The disadvantages are the inefficiency of the extra indirection, as you pointed out, and the fact that the compiler doesn't enforce it. All it takes is your worst programmer using one unencapsulated reference to destroy the benefits.
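As a rough illustration of the enforcement problem, consider the sketch below. The class LeakyClass and its two methods are hypothetical names invented for this example. Both methods compile without any warning, but only one follows the convention:

```java
// Sketch only: LeakyClass and its method names are hypothetical.
class LeakyClass {
  private String someAttribute; // stays null until the getter has run

  public String getAttribute() {
    if( this.someAttribute == null ) {
      this.someAttribute = "Some Default Value";
    }

    return this.someAttribute;
  }

  // Follows the convention: the getter guarantees a non-null value.
  public boolean isDefaultEncapsulated() {
    return getAttribute().equals( "Some Default Value" );
  }

  // Breaks the convention: reads the field directly. The compiler accepts
  // this, yet it throws NullPointerException if the getter has not run yet.
  public boolean isDefaultDirect() {
    return this.someAttribute.equals( "Some Default Value" );
  }
}
```

Calling isDefaultDirect() on a fresh instance fails, while isDefaultEncapsulated() succeeds. Nothing in the language distinguishes the two, so the guarantee depends entirely on code review or external tooling.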

Also, the right way to solve a null pointer problem isn't to replace it with a non-null default value with essentially the same characteristics. The problem with null pointer dereferences isn't that they cause a segfault. That's just a symptom. The problem is that the programmer might not always handle an unexpected default/uninitialized value. That problem still must be handled separately with your self-encapsulation pattern.

The right way to solve a null pointer problem is to not create the object until a semantically valid non-null value can be put into the attribute, and to destroy the object before it is necessary to set any of its attributes to null. If there is never the possibility for a pointer to be null, there is never a need to check it.
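One way this could look in Java is sketched below, assuming a hypothetical class named ValidatedClass. The field is final and the constructor rejects null, so no method ever needs to check it. The withAttribute method is likewise an illustrative assumption, showing how a "changed" value can be modelled as a new object rather than a mutation:

```java
import java.util.Objects;

// Sketch only: ValidatedClass, hasValue and withAttribute are hypothetical names.
final class ValidatedClass {
  // final: assigned exactly once, in the constructor, and never null.
  private final String someAttribute;

  public ValidatedClass( String someAttribute ) {
    // Fails at construction time, so an invalid object never exists.
    this.someAttribute =
      Objects.requireNonNull( someAttribute, "someAttribute must not be null" );
  }

  public boolean hasValue( String expected ) {
    // No null check needed: the constructor already guaranteed validity.
    return this.someAttribute.equals( expected );
  }

  // Instead of a setter, return a new, equally valid object.
  public ValidatedClass withAttribute( String s ) {
    return new ValidatedClass( s );
  }
}
```

The failure moves from an arbitrary later dereference to the point where the bad value was supplied, which is usually much easier to diagnose.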

Usually when people think an attribute must be null, they are trying to do too much in one class. It often makes the code much cleaner to split it into two classes. You can also split functions to avoid null assignments. Here's an example from another question where I refactored a function to avoid a problematic null assignment.