Class Design – Encapsulation vs Single Responsibility/Separation of Concerns

Tags: class-design, encapsulation, separation-of-concerns, single-responsibility

I'm working on a class that represents an object with multiple representations – one is an XML type representation used by an automatic ordering system, the other is a POJO-based representation used by a monitoring tool.

The problem I'm running into is that I seem to have to make a trade-off between encapsulating the object's internal data and designing my application according to the principle of "separation of concerns" or "single responsibility per class".

Let's look at the class in more detail:

public class MyObject {
    private Object field1;
    private Object field2;
    private Object field3;
    ...
    private Object field100;
}

(Yes it's quite a big ugly monster, something I have no control over).

If I follow the principle of separation of concerns (the way I understand it anyway), I will have to add getters for every field in this class and have a separate XMLBuilder class for the XML representation, and a PojoBuilder for the Monitoring tool representation:

public class XMLBuilder {

    // Reads every field through a getter, so MyObject must expose them all.
    public XML asXml(MyObject obj) {
        return "<tag1>" + obj.getField1() + "</tag1>"
             + "<tag2>" + obj.getField2() + "</tag2>"
             ...
             ;
    }
}

This seems to me to break the principle of encapsulation – I have to expose a lot of the internal data of my class just to provide a representation for it. If I pass my object to other systems, I have no control over whether these sub-systems become dependent on arbitrary fields (e.g. field45), which means that future changes to the structure of this object may become difficult, as other code may now depend on it.

An alternative approach, advocated by Allen Holub (http://www.javaworld.com/article/2073723/core-java/why-getter-and-setter-methods-are-evil.html) for example, is to do the representation inside the class:

public class MyObject {

    ...

    // The private fields are read directly; nothing is exposed to callers.
    public XML asXml() {
        return "<tag1>" + field1 + "</tag1>"
             + "<tag2>" + field2 + "</tag2>"
             ...
             ;
    }

    public Object asMonitor() {
        ...
    }

}

To me, this solution seems great, as the class never exposes any private fields. The representation generators have access to all the information they need, and the object itself can be safely passed to other sub-systems without the worry that another module would depend on its internals.

Now, when I implement this, my colleagues all raise the issue of "separation of concerns". The problem, they say, is that the MyObject class now has knowledge of things like XML representations and POJO representations, which means that if the XML representation changes, you have to make modifications inside the actual class itself, instead of just updating an "XML ruleset" or some other external configuration.

Holub's solution is to use a Builder pattern:

public class MyObject {

    ...

    public void buildContent(MyObjectBuilder builder) {
        builder.setField1(field1);
        builder.setField2(field2);
    }
}

public class XMLBuilder implements MyObjectBuilder {

    ...

    public void setField1(Object field1) {
        content += "<tag1>" + field1 + "</tag1>";
    }

    public XML asXML() {
        return content;
    }

    ...
}

but this seems hardly better than the first approach – instead of being exposed through getters, the data is now implicitly exposed via a builder. OK, I understand that with the builder I may have additional control over how my object represents itself, but is that iota of control worth the increased complexity in the code base? Changes to the structure of MyObject will still likely result in downstream dependencies requiring updates, so I'm not entirely sure that much has been gained here (except that there are technically no getters).
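To make the trade-off concrete, here is a minimal, self-contained sketch of the builder approach discussed above. It assumes a hypothetical two-field `MyObject` instead of the real 100-field class, and the `MonitorBuilder` class and its map-based output are invented purely for illustration; they are not taken from Holub's article. Like the snippets in the question, it does no XML escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder contract: one setter per field MyObject chooses to publish.
interface MyObjectBuilder {
    void setField1(Object field1);
    void setField2(Object field2);
}

// Hypothetical two-field stand-in for the class in the question.
class MyObject {
    private final Object field1;
    private final Object field2;

    MyObject(Object field1, Object field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    // The object pushes its data into the builder; it has no getters.
    void buildContent(MyObjectBuilder builder) {
        builder.setField1(field1);
        builder.setField2(field2);
    }
}

// Knows the XML layout; MyObject does not. No escaping is done here.
class XmlBuilder implements MyObjectBuilder {
    private final StringBuilder content = new StringBuilder();

    @Override
    public void setField1(Object field1) {
        content.append("<tag1>").append(field1).append("</tag1>");
    }

    @Override
    public void setField2(Object field2) {
        content.append("<tag2>").append(field2).append("</tag2>");
    }

    String asXml() {
        return content.toString();
    }
}

// Assumed shape of the monitoring representation: a simple name-to-value map.
class MonitorBuilder implements MyObjectBuilder {
    private final Map<String, Object> values = new LinkedHashMap<>();

    @Override
    public void setField1(Object field1) {
        values.put("field1", field1);
    }

    @Override
    public void setField2(Object field2) {
        values.put("field2", field2);
    }

    Map<String, Object> asMonitor() {
        return values;
    }
}
```

A caller would create a builder, pass it to `obj.buildContent(...)`, and then ask the builder for its result. Note that adding a field to `MyObject` still means touching `MyObjectBuilder` and every implementation, which is exactly the downstream coupling described above.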

Is this a case where you simply have to choose your poison? Or is there a magical middle ground that reduces all these issues? What are the common strategies for dealing with this?

Best Answer

Is this a case where you simply have to choose your poison?

Yes, basically.

One of a software developer's jobs is risk management when it comes to change, and there is a lot of theory about risk management. Generally, you should first identify the risks. You have already done that: a change to the object itself, a change to the XML, a change to the POJO, and so on. All of those are possible risks. The next step is to estimate the probability of each one occurring. You can get some idea by looking back at changes you had to make in the past, but only real experience can help you here. Then you have to identify the impact of each risk. You say that changes can propagate through the dependencies, so the more dependencies a risk affects, the more costly it is when it occurs. Taking all of this into account, you should choose a design that minimizes the risks that have both high probability and high cost.

I can't tell you which design that is, because the way you described your problem is far too generic, and different systems encounter different risks with different probabilities and costs. For example, if XML is used for communication between your systems, then updating its schema might not be a problem. But if it is used for persistence, then maintaining backwards compatibility is a must.

The only thing I can suggest is to get more experience. More experience will give you more ideas about possible risks, their properties, and ways to design around them.

Related Solutions

Single Responsibility Principle – Does Adding a Return Type to an Update Method Violate It?

As with any rule, I think the important thing here is to consider the purpose of the rule, the spirit, and not get mired in analyzing exactly how the rule was worded in some textbook and how to apply that to this case. We don't need to approach this like lawyers. The purpose of the rules is to help us write better programs. It's not like the purpose of writing programs is to uphold the rules.

The purpose of the single-responsibility rule is to make programs easier to understand and maintain by making each function do one self-contained, coherent thing.

For example, I once wrote a function that I called something like "checkOrderStatus", that determined if an order was pending, shipped, back-ordered, whatever, and returned a code indicating which. Then another programmer came along and modified this function to also update the quantity on hand when the order was shipped. This severely violated the single responsibility principle. Another programmer reading that code later would see the function name, see how the return value was used, and might well never suspect that it did a database update. Someone who needed to get the order status without updating the quantity on hand would be in an awkward position: should he write a new function that duplicates the order status part? Add a flag to tell it whether to do the db update? Etc. (Of course the right answer would be to break the function in two, but that might not be practical for many reasons.)
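As a rough illustration of "breaking the function in two", the sketch below separates the status query from the stock update. Everything in it (`OrderService`, `OrderStatus`, `reduceQuantityOnHand`, the in-memory maps standing in for database tables) is hypothetical and only loosely modelled on the anecdote above.

```java
import java.util.HashMap;
import java.util.Map;

enum OrderStatus { PENDING, SHIPPED, BACK_ORDERED }

// Hypothetical service; the maps stand in for the order and inventory tables.
class OrderService {
    private final Map<String, OrderStatus> statusByOrder = new HashMap<>();
    private final Map<String, Integer> quantityOnHand = new HashMap<>();

    void addOrder(String orderId, OrderStatus status) {
        statusByOrder.put(orderId, status);
    }

    void stock(String item, int quantity) {
        quantityOnHand.put(item, quantity);
    }

    int quantityOnHand(String item) {
        return quantityOnHand.getOrDefault(item, 0);
    }

    // Query only: reports the status and changes nothing.
    OrderStatus checkOrderStatus(String orderId) {
        return statusByOrder.get(orderId);
    }

    // Command only: the update is visible in the name and at the call site.
    void reduceQuantityOnHand(String item, int shippedQuantity) {
        quantityOnHand.merge(item, -shippedQuantity, Integer::sum);
    }
}
```

With this split, a caller who only needs the status calls `checkOrderStatus` and can rely on it having no side effects, while the shipping code calls both methods explicitly.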

On the other hand, I wouldn't nitpick what constitutes "two things". I just recently wrote a function that sends customer information from our system to our client's system. That function does some reformatting of the data to meet their requirements. For example, we have some fields that may be null in our database, but they don't allow nulls, so we have to fill in some dummy text ("not specified", or I forget the exact words). Arguably this function is doing two things: reformatting the data AND sending it. But I very deliberately put this in a single function rather than having "reformat" and "send", because I don't want to ever, ever send without reformatting. I don't want someone to write a new call and not realize he has to call reformat and then send.
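One way to get the same guarantee while still keeping the two steps readable is a single public method with private helpers. This is a variation on the single function described above, not the author's actual code: `Customer`, `CustomerExporter`, the `;`-separated payload and the `sent` list (a stand-in for the client's system) are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical customer record; email may be null in our database.
class Customer {
    final String name;
    final String email;

    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class CustomerExporter {
    // Stand-in for the client's system: it just records what was sent.
    final List<String> sent = new ArrayList<>();

    // The only public entry point, so nothing can be sent without reformatting.
    public void sendCustomer(Customer customer) {
        transmit(reformat(customer));
    }

    // Replaces nulls with dummy text because the receiver rejects nulls.
    private String reformat(Customer customer) {
        String email = (customer.email == null) ? "not specified" : customer.email;
        return customer.name + ";" + email;
    }

    private void transmit(String payload) {
        sent.add(payload);
    }
}
```

Because `reformat` and `transmit` are private, a new caller cannot accidentally call one without the other, which is the property the paragraph above is protecting.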

In your case, updating the database and returning an image of the record written seem like two things that might well go together logically and inevitably. I don't know the details of your application so I can't say definitively if this is a good idea or not, but it sounds plausible.

If you are creating an object in memory that holds all the data for the record, doing the database calls to write this, and then returning the object, this makes a lot of sense. You have the object in your hands. Why not just hand it back? If you didn't return the object, how would the caller get it? Would he have to read the database to get the object you just wrote? That seems rather inefficient. How would he find the record? Do you know the primary key? If someone declares that it's "legal" for the write function to return the primary key so that you can re-read the record, why not just return the whole record so you don't have to? What's the difference?
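The first scenario might look roughly like the sketch below, where the write method builds the record, stores it, and hands the same object back. `CustomerRecord`, `CustomerStore` and the in-memory map are hypothetical stand-ins for a real table and data-access layer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record; the id is only known once the record has been written.
class CustomerRecord {
    final long id;
    final String name;

    CustomerRecord(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// In-memory stand-in for the database.
class CustomerStore {
    private final Map<Long, CustomerRecord> rows = new HashMap<>();
    private long nextId = 1;

    // Writes the record and returns the object that was just stored,
    // so the caller never has to re-read it by primary key.
    CustomerRecord save(String name) {
        CustomerRecord record = new CustomerRecord(nextId++, name);
        rows.put(record.id, record);
        return record;
    }

    CustomerRecord findById(long id) {
        return rows.get(id);
    }
}
```

Returning only the generated key would force the caller into a `findById` round trip to obtain an object that `save` already had in hand.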

On the other hand, if creating the object is a bunch of work quite distinct from writing the database record, and a caller might well want to do the write but not create the object, then this could be wasteful. If a caller might want the object but not do the write, then you'd have to provide another way to get the object, which could mean writing redundant code.

But I think scenario 1 is more likely, so I'd say, probably no problem.

Separation of Concerns – When is it Too Much

Your various examples of splitting out concerns into separate functions all suffer from the same issue: you are still hard-coding the file dependency into get_last_appearance_of_keyword. This makes that function hard to test, as it has to rely on a file existing in the file system when the test is run. This leads to brittle tests.

So I'd simply change your original function to:

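def get_last_appearance_of_keyword(text, keyword):
    # `text` is any iterable of lines, e.g. an open file or some_string.splitlines().
    line_number = 0  # 0 means the keyword never appeared
    for number, line in enumerate(text, start=1):
        if keyword in line:
            line_number = number  # remember the 1-based number, not the line itself
    return line_number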

Now you have a function that has just one responsibility: find the last occurrence of a keyword in some text. If that text is to come from a file, that becomes the caller's responsibility to deal with. When testing, you can then just pass in a block of text. When using it with runtime code, first the file is read, then this function is called. That is real separation of concerns.
