Design Patterns – Importance of Not Exposing Internal Representation in Iterator Pattern


I am reading C# Design Pattern Essentials. I'm currently reading about the iterator pattern.

I fully understand how to implement it, but I don't understand its importance or see a use case. In the book an example is given where someone needs to get a list of objects. They could have done this by exposing a public property, such as IList<T> or an Array.

The book writes:

"The problem with this is that the internal representation in both of these classes has been exposed to outside projects."

What is the internal representation? The fact it's an array or IList<T>? I really don't understand why this is a bad thing for the consumer (the programmer calling this) to know…

The book then says this pattern works by exposing its GetEnumerator function, so we can call GetEnumerator() and expose the 'list' this way.

I assume this pattern has a place (like all patterns) in certain situations, but I fail to see where and when.

Best Answer

Software is a game of promises and privileges. It is never a good idea to promise more than you can deliver, or more than your collaborator needs.

This applies particularly to types. The point of writing an iterable collection is that its user can iterate over it - no more, no less. Exposing the concrete type Array usually creates many additional promises, e.g. that you can sort the collection by a function of your own choosing, not to mention the fact that a normal Array will probably allow the collaborator to change the data that's stored inside it.

Even if you think this is a good thing ("If the renderer notices that the new export option is missing, it can just patch it right in! Neat!"), overall this decreases the coherence of the code base, making it harder to reason about - and making code easy to reason about is the foremost goal of software engineering.
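To make the extra promises concrete, here is a minimal sketch of what a caller can do once an array is exposed. The Report class and its Options property are hypothetical names invented for this illustration; they do not come from the book.

```csharp
using System;

var report = new Report();

// The caller reorders and overwrites the report's own storage,
// because the property hands out the very array the class uses.
Array.Sort(report.Options);
report.Options[0] = "patched-in";

Console.WriteLine(string.Join(", ", report.Options)); // prints "patched-in, pdf"

// Hypothetical class that leaks its internal representation.
public class Report
{
    // Get-only stops callers from replacing the array,
    // but not from changing or reordering its elements.
    public string[] Options { get; } = new[] { "pdf", "csv" };
}
```

Nothing here is a misuse of the language: every line is legal precisely because the array type promises sorting and element assignment.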

Now, if your collaborator needs access to a number of thingies so that they are guaranteed not to miss any of them, you implement an Iterable interface and expose only those methods that this interface declares. That way, next year when a massively better and more efficient data structure appears in your standard library, you'll be able to switch out the underlying code and benefit from it without fixing your client code everywhere. There are other benefits to not promising more than is needed, but this one alone is so big that in practice, no others are needed.
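A minimal sketch of that idea in C#, assuming a hypothetical Catalog class (not taken from the book): it implements IEnumerable<string> and keeps its storage private, so the only promise made to callers is that they can iterate.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var catalog = new Catalog();
catalog.Add("export-pdf");
catalog.Add("export-csv");

// The caller can iterate, and that is all the type promises.
foreach (string option in catalog)
{
    Console.WriteLine(option);
}

// Hypothetical collection class; names and members are illustrative only.
public class Catalog : IEnumerable<string>
{
    // Internal representation. It could later become a LinkedList<string>
    // or a SortedSet<string> without any caller having to change.
    private readonly List<string> options = new List<string>();

    public void Add(string option) => options.Add(option);

    public IEnumerator<string> GetEnumerator()
    {
        // yield return hands out items one at a time; the list itself never escapes.
        foreach (string option in options)
        {
            yield return option;
        }
    }

    // Non-generic version required by IEnumerable; it forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Swapping the private List<string> for another structure changes only the field declaration and perhaps the Add method; the foreach loop in the calling code stays exactly as it is.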

Related Solutions

Design Patterns – What Makes the Iterator a Design Pattern?

Most of the patterns from the GoF book have the following things in common:

they solve basic design problems, using object-oriented means
people often face these kinds of problems in arbitrary programs, independently of the domain or business
they are recipes for making the code more reusable, often by making it more SOLID
they present canonic solutions to these problems

The problems solved by these patterns are so basic that many developers understand them mainly as workarounds for missing programming language features, which is IMHO a valid point of view (note that the GoF book is from 1995, when Java and C++ did not offer as many features as they do today).

The iterator pattern fits well into this description: it solves a basic problem which occurs very often, independently of any specific domain, and as you wrote yourself, it is a good example of "separation of concerns". As you surely know, direct iterator support is something you find in a lot of contemporary programming languages.

Now compare this to the problems you picked:

writing to a file - that is IMHO simply not "basic" enough. It is a very specific problem. Nor is there a good, canonic solution - there are lots of different approaches to writing to a file, and no clear "best practice".
Painter, Encoder: whatever you have in mind with that, those problems look even less basic to me, and not even domain independent.
having the "power" function available for different kinds of objects: at first glance, that could be worth being a pattern, but your proposed solution does not convince me - it looks more like an attempt to shoehorn the power function into something similar to the iterator pattern. I implemented a lot of code with engineering calculations, but I cannot remember a situation where an approach similar to your power function object would have helped me (however, iterators are something I have to deal with on a daily basis).

Moreover, I do not see anything in your power function example which could not be interpreted as an application of the strategy pattern or the command pattern, which means those basic parts are already in the GoF book. A better solution might use either operator overloading or extension methods, but those depend on language features, and that is exactly what the "OO means" used by the "Gang" could not provide.

C# – Should Constructor Parameters Be Exposed or Hidden?

"The calling code which instantiated a FooRepository object is passing an IDbConnection object and therefore has the right to access this information later on"

This is not true when you're dealing with things like the factory pattern, where the instantiator of the object is not the handler of the object. Factory patterns quite often exist specifically because the object's construction is an implementation detail that should be abstracted away.

This applies to more cases than just the factory pattern. Essentially, it applies to any object that gets passed around at least once.

"but can't modify it anymore (no set on the DbConnection property)"

This isn't true for reference types. It's true that you can't change which object is being referenced, but you can still alter its content. For example:

public class Foo
{
    public string Name { get; set; }
}

public class Baz
{
    public Foo Foo { get; } // allegedly: "can't modify it anymore"

    public Baz(Foo foo)
    {
        this.Foo = foo;
    }
}

var myFoo = new Foo() { Name = "Hello" };
var myBaz = new Baz(myFoo);

As per your claim, myBaz.Foo can no longer be modified. Yet this code is perfectly legal:

myBaz.Foo.Name = "a completely different name";

And that's still a risk you take.

"he told me that any class I write, it should expose just the minimum useful information."

"I don't want to think few minutes for each parameter to determine if this would be a good idea to expose it or not."

These two don't quite follow. It doesn't require you to think about it, it requires you to default to private instead of public like you currently do. Unless there is a valid reason to expose it, don't.

This is an oversimplification as there are cases where you shouldn't start out on private (e.g. DTO properties), but if you're still struggling with evaluating this, it's already better to default to private instead of public.

"In my opinion, there are some use cases we simply can't think of when we first write a new class."

In my opinion, this is indicative of not quite understanding the class' responsibility and how it fits in the existing codebase.

In fact, that's sort of what you state in the question: you don't want to think about it. But you really should. For your example, what would ever be the purpose of a repository exposing its database connection? I can't think of any answer here that does not immediately violate good practice rules, can you?
Exposing the database connection is not part of the repository's purpose, which is all about providing access to a persistent data store.

In part, this is a matter of experience which will come over time. Every time you have to change the access modifier on an existing property/method is an opportunity to learn why the previous choice was not the right one. Do it enough and you will improve at judging public contracts on the first design.

"In my opinion, there are some use cases we simply can't think of when we first write a new class."

Don't forget OCP: "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification".

If you are inherently accounting for needing to change the internals of classes as time passes, you're taking a stance orthogonal to OCP.

That's not to say internals can't be changed when e.g. bugs are found or breaking changes are implemented; but it does mean you should try to avoid it as best as you can. Changing existing (often central) logic is one of the most common sources of bugs, especially the crippling ones.

"Whether or not the class will finally be included in a Nuget package doesn't really matter."

It really does matter. If your library is only being used in the same solution file, you can change things very quickly to your needs and can confirm it's working with a simple build.

But Nuget compounds the issue. If you change the contracts of the classes exposed in your Nuget package, every Nuget consumer will have to deal with breaking changes.
From personal experience, the issue is further compounded by Nuget servers not keeping a record of who has consumed your Nuget package, which makes it hard to figure out who all your consumers are and warn them ahead of time that breaking changes are about to be released.

Had you defaulted to making things private, and then selectively exposed them, there would be less of a problem here. Adding to the contract without changing the existing parts does not break existing code.
Removing things from the contract, which is what would happen if you default to public, would always be liable to breaking code that depends on the thing you're now removing from the contract.

"Should I really think for each parameter if it makes sense to expose it?"

Yes. But it's not as complicated as you're making it out to be. Understanding what a certain class needs to expose or not is something you need to think about once per class. What is this class' purpose? How do I want this class to be used by its consumers?

After that, all properties/methods that you develop can easily be matched to the class' purpose, which is not a new evaluation but simply applying the decision you already made.

"Or is there a design pattern I can just instinctively apply without wasting too much time?"

If you were using interfaces on all your classes and using interface-based dependency injection, it would really help you in understanding how to separate a class' contract (things in the interface) from its implementation (things not in the interface).

Take for example:

public interface ISodaVendingMachine
{
    Soda GetDrink();
}

public class RegularVendingMachine : ISodaVendingMachine
{
    private Drinks drinks;

    public RegularVendingMachine(Drinks drinks)
    {
        this.drinks = drinks;
    }

    public Soda GetDrink()
    {
        return this.drinks.TakeOne();
    }
}

public class ConjuringVendingMachine : ISodaVendingMachine
{
    private PhilosophersStone philosophersStone;

    public ConjuringVendingMachine(PhilosophersStone philosophersStone)
    {
        this.philosophersStone = philosophersStone;
    }

    public Soda GetDrink()
    {
        return philosophersStone.PerformIncantation("Drinkum givum");
    }
}

The internals of each vending machine are up to that machine. It doesn't matter how it gets hold of a drink and dispenses it to the consumer. To the consumer, that's an irrelevant implementation detail. The customer doesn't want to know how the sausage gets made.

What matters for the public contracts is that they dispense a drink to the consumer, and thus the ISodaVendingMachine interface is built specifically for that purpose.

Notice how the interface doesn't care about anything other than what it was designed to ensure.

When you have that interface, you can already see that anything in your class that isn't part of that interface should most likely be private as it is an implementation detail, not a contract.
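As a rough sketch of the consumer side, assuming interface-based constructor injection: the Customer and StubVendingMachine classes below, and the shape of Soda, are hypothetical stand-ins added for illustration. The point is only that the consumer compiles against ISodaVendingMachine and never sees what is inside the machine.

```csharp
using System;

// Any ISodaVendingMachine can be handed to the consumer.
var customer = new Customer(new StubVendingMachine());
Console.WriteLine(customer.BuyDrink().Name); // prints "Cola"

// Hypothetical shape for Soda; the original snippet does not define it.
public class Soda
{
    public string Name { get; }

    public Soda(string name)
    {
        Name = name;
    }
}

// The contract: the only thing consumers are allowed to rely on.
public interface ISodaVendingMachine
{
    Soda GetDrink();
}

// Hypothetical implementation used only for this sketch.
public class StubVendingMachine : ISodaVendingMachine
{
    public Soda GetDrink() => new Soda("Cola");
}

// Hypothetical consumer: it knows the contract, not the machine's internals.
public class Customer
{
    private readonly ISodaVendingMachine machine;

    public Customer(ISodaVendingMachine machine)
    {
        this.machine = machine;
    }

    public Soda BuyDrink() => machine.GetDrink();
}
```

Because Customer depends only on the interface, replacing StubVendingMachine with any other implementation requires no change to Customer.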
